Brand Visibility in the AI Era: From Infrastructure to Content Understanding – Lessons from Paweł Sokołowski's Presentation at SEMKRK26
Paweł Sokołowski, CEO of CONTADU and NEURONwriter, recently had the pleasure of sharing some insights at the SEMKRK26 conference. The topic addressed – “Brand visibility. Technicalities in the AI era.” – sparked many questions and a lively discussion, which only confirms its importance in today’s rapidly changing world of SEO and content marketing.
Today I would like to expand on these key aspects that determine whether your brand will be noticed and understood by artificial intelligence.
The market often asks: “Are we visible in AI?” This is a natural question, but the answer is not as simple as it might seem. Visibility in AI is the end result, not the starting point. To achieve it, we must build solid foundations, layer by layer. In my presentation I highlighted three key pillars that determine your site’s readiness for the AI era: Accessibility (Infrastructure), Rendering (Content visibility), and Extraction and Understanding (Authority). In this article Authority gets its own section, so the pillars below are numbered one to four. Let’s go through each of them to understand how to give your brand a strong position in the new AI ecosystem.
Pillar 1: Accessibility and Infrastructure – Can AI Bots Even Knock on Our Door?
Before any language model (LLM) can quote your content, it must first have access to it. This seems obvious, but in practice it is the first and often most important barrier. Every AI crawler must pass through three gates before it reads a page: the Policy Gate (robots.txt), the Infrastructure Gate (WAF / CDN / Bot Management), and the Visibility Gate (Content Rendering).
Policy Gate: A Declaration of Good Will (robots.txt)
The robots.txt file is the first line of defense, but we must remember that it is merely a declaration of good will. It specifies which bots may crawl the site and which paths are available to them. If a bot’s User-Agent is not blocked by a Disallow rule, the bot proceeds further. However, as my research has shown, the market is full of bots that ignore these rules.
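The Policy Gate can be checked locally before looking at any logs. The sketch below uses Python's standard `urllib.robotparser` to ask whether a given bot may fetch a given URL. The robots.txt content and the example.com URLs are assumptions made up for illustration, not taken from a real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, used only for illustration.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("GPTBot", "https://example.com/blog/post"),
        ("PerplexityBot", "https://example.com/blog/post"),
        ("PerplexityBot", "https://example.com/private/report"),
    ]:
        print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

This only tells you what a well-behaved bot is asked to do. It says nothing about bots that ignore the file.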
Infrastructure Gate: Real Protection (WAF / CDN)
This is where real control begins. The server-side firewall (WAF) analyzes every request. It may return a 200 (OK) code with real content, but it may also return a 403 (Forbidden) or serve a page with a JavaScript challenge (JS challenge). It is at this level that real protection takes place against bots that do not respect robots.txt.
It is worth noting that 71.5% of web traffic is bots, and as much as 89.4% of AI crawler traffic is training or mixed traffic, not search traffic. Moreover, there are more than 200 undeclared crawlers detected by Cloudflare that rotate IP addresses and User-Agents, evading simple rules. This is why robots.txt alone is not enough. An accessibility audit must check the response at the WAF level.
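An audit at the WAF level comes down to classifying what the bot actually received: real content, a 403, or a challenge page delivered with a 200. Here is a minimal sketch of such a classifier. The marker strings are assumptions about what a challenge page might contain. A real audit would tune them to the WAF or CDN in use.

```python
# Lowercase phrases assumed to appear on challenge pages (illustrative only).
CHALLENGE_MARKERS = ("just a moment", "enable javascript and cookies")


def classify_response(status: int, body: str) -> str:
    """Classify a response as 'content', 'js_challenge', 'blocked', or 'other'."""
    if status == 403:
        return "blocked"
    if status == 200:
        lowered = body.lower()
        if any(marker in lowered for marker in CHALLENGE_MARKERS):
            # HTTP 200, but the body is a challenge rather than the page.
            return "js_challenge"
        return "content"
    return "other"


if __name__ == "__main__":
    print(classify_response(200, "<h1>Pricing</h1><p>Plans and features.</p>"))
    print(classify_response(200, "<title>Just a moment...</title>"))
    print(classify_response(403, ""))
```

The point of the heuristic is that the status code alone is not enough: two responses with code 200 can land in different categories.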
AI Accessibility Strategies
Companies adopt different strategies toward AI bot traffic:
- Fully Open: All AI bots have access. No blocks in robots.txt or WAF. Content served in raw HTML (SSR).
- Selective: robots.txt blocks only training crawlers (e.g., GPTBot, CCBot), while allowing access to search bots (e.g., ClaudeBot, PerplexityBot).
- Bot Management: robots.txt allows everything, but most bots receive HTTP 200 with a JS challenge. Bots may receive different content than humans.
- Total Block (WAF): WAF returns 403 for almost all AI bots. No rules in robots.txt. Bots receive no information.
Choosing the right strategy depends on company policy and whether the priority is data protection or maximizing visibility.
Pillar 2: Rendering – What Does an AI Bot Actually See?
Passing through the robots.txt and WAF gates is only half the battle. The next, equally critical stage is content rendering. The HTTP 200 response code only tells us that the bot received a response from the server. It does not, however, tell us what it actually saw. This is the key difference between “the bot entered” and “the bot saw”.
SSR vs CSR: The Key to Visibility
- SSR (Server-Side Rendering): With SSR, content is generated on the server and delivered to the browser (or bot) as complete, ready-to-read HTML code. This means that H1 headings, navigation, and structured data (schema.org) are present in the raw HTML. The bot sees everything immediately, without needing to execute JavaScript. This is the optimal solution for visibility in AI.
- CSR (Client-Side Rendering): With CSR, the server delivers minimal HTML to the browser (or bot), and most content is loaded and rendered dynamically using JavaScript. For many AI bots that execute JavaScript only partially, or not at all, a CSR page may look like a blank page, even though they received a 200 code. This leads to a situation where the bot “entered” but “saw nothing”.
Conclusion: Only server logs combined with rendering analysis provide a full answer to the question of what a given bot actually received and what was rendered. An audit must compare the raw HTML with the post-render version to ensure that content is accessible to AI. At NEURONwriter, our AI Readiness Audit helps identify these issues, checking whether your site is technically prepared for AI crawlers, including in terms of rendering.
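The raw-HTML side of that comparison can be approximated with the standard library alone. The sketch below parses HTML without executing JavaScript and reports whether an H1, JSON-LD structured data, and any visible text are present. The two sample documents are invented for illustration, and the class and function names are hypothetical, not part of any audit tool.

```python
from html.parser import HTMLParser


class RawHtmlAudit(HTMLParser):
    """Collects signals a bot can see without executing JavaScript."""

    def __init__(self):
        super().__init__()
        self.h1_texts = []
        self.has_json_ld = False
        self.text_chars = 0
        self._in_h1 = False
        self._skip = False  # True while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_h1 = True
        elif tag in ("script", "style"):
            self._skip = True
            if tag == "script" and dict(attrs).get("type") == "application/ld+json":
                self.has_json_ld = True

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False
        elif tag in ("script", "style"):
            self._skip = False

    def handle_data(self, data):
        if self._skip:
            return
        text = data.strip()
        if self._in_h1 and text:
            self.h1_texts.append(text)
        self.text_chars += len(text)


def audit_raw_html(html: str) -> dict:
    """Summarize what is present in the HTML as delivered by the server."""
    audit = RawHtmlAudit()
    audit.feed(html)
    return {"h1": audit.h1_texts, "json_ld": audit.has_json_ld,
            "text_chars": audit.text_chars}


# Invented examples: a server-rendered page and an empty client-side shell.
SSR_HTML = ('<html><body><h1>Pricing</h1><p>Plans and features.</p>'
            '<script type="application/ld+json">{"@type": "Article"}</script>'
            '</body></html>')
CSR_HTML = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'

if __name__ == "__main__":
    print(audit_raw_html(SSR_HTML))
    print(audit_raw_html(CSR_HTML))
```

A page whose raw HTML yields no H1 and no visible text, while the browser shows a full page, is a likely case of “the bot entered but saw nothing”.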
Pillar 3: Extraction and Understanding – How Does AI Interpret Your Content?
Even if an AI bot has full access to your site and renders it correctly, this still does not guarantee that it will understand your content in a way that allows it to use it effectively. This is where the matter of extraction and understanding comes in – that is, how AI interprets and processes information so that it can recommend it.
BLUF, FAQ and Schema.org: Making AI’s Job Easier
AI models, like people, value clarity and organization. To make it easier for them to extract key information, we should focus on a few elements:
- BLUF (Bottom Line Up Front): This principle assumes that the most important information should be presented at the beginning of the text. AI often aims to quickly deliver a concise answer, which is why key conclusions and answers to questions should be easily accessible and visible right at the beginning of an article or section.
- FAQ (Frequently Asked Questions): FAQ sections are extremely valuable for AI. They provide direct questions and answers that models can easily process and use in their generated responses. This is an excellent way to control which specific information AI “pulls out” of your site.
- Schema.org (Structured Data): Implementing structured data such as Article, FAQPage, Product, or Review is a direct way of communicating with AI. Schema.org helps models understand the context and relationships between elements on a page, which significantly improves their ability to extract and interpret content. As a result, AI can quote your information more precisely and present it in a more organized way.
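As an illustration of the FAQ and Schema.org points above, FAQPage markup can be generated from plain question and answer pairs. This is a minimal sketch using the schema.org FAQPage, Question, and Answer types. The helper name `faq_page_jsonld` is hypothetical, and the sample pair is paraphrased from the Q&A later in this article.

```python
import json


def faq_page_jsonld(pairs):
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


if __name__ == "__main__":
    data = faq_page_jsonld([
        ("Which blocking method is more effective, robots.txt or Cloudflare?",
         "Blocking at the CDN/WAF level is more effective; "
         "robots.txt is only a suggestion for robots."),
    ])
    # The output goes inside <script type="application/ld+json"> in the page HTML.
    print(json.dumps(data, indent=2, ensure_ascii=False))
```

For the markup to help, it has to be present in the raw HTML the server returns, not injected later by JavaScript.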
At NEURONwriter, our AI Score evaluates content against these dimensions, helping to create materials that are not only readable for humans but also easily digestible and understandable for artificial intelligence. This is the key to making AI not only see your content, but also understand it and be able to recommend it.
Pillar 4: Authority and E-E-A-T – Building Trust in the Eyes of AI
The final, but equally important, pillar of visibility in AI is authority and alignment with the principles of E-E-A-T (Experience, Expertise, Authoritativeness, Trustworthiness). Even if your content is accessible, correctly rendered, and easy to extract, AI must consider it a credible and valuable source in order to recommend it.
Trust Signals for AI
AI, like people, looks for signals that confirm the quality and credibility of information. Here are the key elements that AI models pay attention to:
- Content Quality: Is the content substantive, comprehensive, and free of errors? AI is getting better and better at assessing linguistic and factual quality.
- Authorship and Expertise: Is the content created by experts in the given field? Are the authors clearly identified and do they possess documented knowledge? This is key for Expertise and Authoritativeness.
- Topic and Context: Is the site thematically consistent? Is the content embedded in the appropriate context, which builds authority in a given niche?
- Citations and Links: Is your site cited by other credible sources? Does it have valuable backlinks? These are traditional authority signals that still matter for AI.
- Freshness and Relevance: Is the content up to date and does it respond to users’ current needs? AI prefers fresh and relevant information.
Building authority is a long-term process that requires consistency and attention to every detail. At NEURONwriter, through our content optimization tools, we help create materials that naturally meet E-E-A-T criteria, increasing the chances that your brand will be recognized as an authoritative source by AI models.
Summary and the Future of Visibility in AI
Brand visibility in the AI era is a complex process that requires a strategic approach on many levels. It is no longer enough to optimize only for traditional search engine algorithms. We must think about how our content is accessible to bots, how it is rendered, and finally – how it is understood and assessed in terms of authority by AI models.
Tools such as NEURONwriter are designed to support you at each of these stages, from the AI readiness audit, through content optimization for AI Score, to monitoring visibility in LLMs. The future of SEO is “SEO for AI”, and those who understand and adapt to these changes earliest will gain a significant advantage. Remember that in the world of AI, just as in traditional SEO, quality, context, and trust matter. Take care of these elements, and your brand will be ready for the challenges and opportunities that the era of artificial intelligence brings.
Below are the answers to your questions that came up during the presentation.
Frequently Asked Questions (Q&A from SEMKRK26)
Which tool should you use to check positions and mentions in LLMs?
To check positions and mentions in LLMs, the key is a tool that offers AI visibility monitoring. At NEURONwriter we developed the AI Visibility module, which is built specifically for this. It includes features such as: Overview (a dashboard with visibility metrics), Opportunities (identifying tasks that improve visibility), Competitors (analysis of who is winning visibility in AI answers), AI Readiness Audit (checking the site’s technical readiness for AI crawlers), and Monitored Discussions (tracking specific questions and prompts in the context of a brand). Additionally, to verify brand mentions, especially in the AI context, we have features for monitoring mentions/citations in results generated by LLMs.
Do you think the Contadu tool will not lose value as AI develops?
No, I do not think it will lose value. I believe that Contadu, just like NEURONwriter, will gain in value as AI develops. Our approach has always focused on delivering quality. Content Intelligence and Semantic SEO are foundations that become even more important in the AI era. The development of AI does not reduce the need for a deep understanding of content, user intent, and optimization for complex algorithms. AI creates new opportunities for tools like Contadu to deliver even more precise analyses, recommendations, and automations that help content creators stand out from the digital noise. The key is constant adaptation and integration of new AI technologies, which is our priority.
Which factors do you look at first when auditing content for AI? Do you weight factors in terms of AI and SEO, or is AI more important to you?
When auditing content for AI, I pay attention to several key factors that are at the same time strongly linked to SEO. I do not weight AI as more important than SEO, because in my philosophy good SEO equals good AI visibility. They are interconnected. Here are the most important factors:
Topic Coverage: Does the content exhaust the topic comprehensively? AI values depth and authoritativeness.
Structure & Clarity: Is the content logically organized, easy to scan and understand? Headings, lists, paragraphs – all of this helps AI extract information.
BLUF (Bottom Line Up Front): Is the most important information presented at the beginning? AI often prefers quick and concise answers, so key conclusions should be easily accessible.
Fact & Source Density: Is the content backed by credible data and sources? AI, like users, values reliability and the ability to verify information.
User Intent: Does the content respond to users’ real needs and questions? AI strives to deliver the most relevant answers.
Uniqueness & Added Value: Does the content bring something new, or is it merely a duplication of existing information? AI, although it can synthesize, rewards originality.
NEURONwriter has a built-in AI Score that evaluates these dimensions, helping to create content optimized for both traditional SEO and visibility in AI.
Brand24 or another tool – which do you recommend for verifying a brand and visibility in AI? How do you handle AI bot traffic on a site when analyzing data?
For verifying a brand and visibility in AI, besides the AI Visibility module in NEURONwriter that I mentioned, I definitely recommend Brand24. Their Chatbeat module is a great example of how media monitoring tools are adapting to the new reality, tracking brand mentions in results generated by LLMs. It is crucial to know where and in what context your brand is cited by AI. It is equally important to know what to improve so that it appears there, and that is where NEURONwriter and CONTADU already help.
As for AI bot traffic on a site, this is a challenge many people face. In data analysis, especially in Google Analytics 4, I try to filter out bot traffic so that it does not distort the real data on user engagement. If I see that some bots generate enormous data transfer (e.g., 10 GB per day) and it does not translate into any value (e.g., visibility in LLMs with source citation), I treat it as a signal to intervene. In such cases I consider blocking those specific bots at the server level or using tools such as Cloudflare.
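To see which bots generate the transfer, the bytes column of the access log can be summed per bot. The sketch below assumes logs in the common “combined” format and matches on bot name tokens mentioned in this article. The sample log lines, IP addresses, and timestamps are invented for illustration.

```python
import re
from collections import Counter

# Matches the request, status, bytes, referrer, and user agent fields
# of a combined-format access log line.
LOG_LINE = re.compile(
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"[^"]*" "(?P<ua>[^"]*)"'
)

# Bot name tokens to look for in the User-Agent header.
AI_BOT_TOKENS = ("GPTBot", "CCBot", "ClaudeBot", "PerplexityBot")


def bytes_per_bot(lines):
    """Sum response bytes per AI bot token; other traffic is ignored."""
    totals = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue
        size = 0 if match["bytes"] == "-" else int(match["bytes"])
        for token in AI_BOT_TOKENS:
            if token in match["ua"]:
                totals[token] += size
                break
    return totals


# Invented sample lines in combined log format.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Mar/2026:10:00:00 +0000] "GET /blog/a HTTP/1.1" 200 5000 "-" "Mozilla/5.0 (compatible; GPTBot/1.2)"',
    '203.0.113.5 - - [10/Mar/2026:10:00:05 +0000] "GET /blog/b HTTP/1.1" 200 3000 "-" "Mozilla/5.0 (compatible; GPTBot/1.2)"',
    '198.51.100.7 - - [10/Mar/2026:10:01:00 +0000] "GET /blog/a HTTP/1.1" 200 7000 "-" "CCBot/2.0"',
    '192.0.2.9 - - [10/Mar/2026:10:02:00 +0000] "GET / HTTP/1.1" 200 900 "-" "Mozilla/5.0 (Windows NT 10.0)"',
]

if __name__ == "__main__":
    for bot, total in bytes_per_bot(SAMPLE_LOG).most_common():
        print(f"{bot}: {total} bytes")
```

Matching on the User-Agent string only catches bots that identify themselves. Crawlers that rotate User-Agents need detection at the WAF or CDN level.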
Will blocking AI robots cause our results to disappear from LLMs?
Generally, yes. Blocking AI robots can cause your results to disappear from LLMs. Many LLMs, especially those that aim to deliver current and reliable information, rely on indexing content from the web. If you block their crawlers (e.g., GPTBot, CCBot, Google-Extended), you prevent them from accessing your site, and therefore your content cannot be used to generate answers. This is the dilemma we must face: do we want to be part of the AI ecosystem and potentially gain visibility, or do we protect our content at all costs and risk losing that visibility?
Are you in favor of blocking selected agents/bots as a means of protecting copyright, or do you prefer to let “everything in” to the site?
I am a proponent of selective blocking and access management, rather than letting “everything in”. Protecting copyright and controlling how our content is used is extremely important. Not all bots are equal. Some, like CCBot, are known for collecting data for training purposes without any guarantee of source citation, which can be problematic. Others, like GPTBot or Google-Extended, are more transparent and can contribute to increased visibility by citing your site in LLM answers. My approach is: let in the bots that offer potential value in the form of visibility or traffic, and block those that merely consume resources without clear benefits or that violate our rights. The right choice depends on the industry, on company policy regarding data protection against AI, and on digital policy in general.
Which blocking method do you consider more effective? Robots.txt or blocking via Cloudflare?
Blocking via Cloudflare (or similar solutions at the CDN/WAF/server level) is definitely more effective than robots.txt. Robots.txt is merely a suggestion for robots, not a hard rule. Malicious bots, and others that do not respect directives, will simply ignore it. Blocking at the Cloudflare level allows you to actually cut off access to the site based on IP addresses, user agents, or other signatures before the traffic reaches your server. This gives much greater control and effectiveness in managing bot traffic, especially for bots that generate large data transfers.
What evidence is there that AI bots honor the directives in robots.txt?
The evidence that AI bots honor the directives in robots.txt is mixed and depends on the specific bot and its operator. Large technology companies, such as Google (for Google-Extended) or OpenAI (for GPTBot), declare that their bots honor robots.txt. This is consistent with their policies and ethical standards. We can verify this through server log analysis, checking whether these specific bots still request sections of the site that are blocked for them. However, there are also bots, especially the less transparent ones or those operating in a gray area, that may ignore these directives. That is exactly why I recommend a multi-layered approach, where robots.txt is the first signal, but the ultimate control takes place at the server/CDN level.
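The log-based verification mentioned above can be sketched as a comparison between what robots.txt disallows and what a bot actually requested. The function name `find_violations`, the robots.txt content, and the list of hits are hypothetical. In practice the hits would be extracted from server logs.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, used only for illustration.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /private/
"""


def find_violations(robots_txt, hits):
    """Return the (bot, path) hits that robots.txt disallows for that bot.

    Each hit uses the bot's product token (e.g. "GPTBot"), not the full
    User-Agent header, because that is what robots.txt rules are matched on.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [(bot, path) for bot, path in hits if not parser.can_fetch(bot, path)]


# Invented hits, as they might be extracted from an access log.
HITS = [
    ("GPTBot", "/blog/post"),
    ("GPTBot", "/private/report"),
    ("CCBot", "/private/report"),
]

if __name__ == "__main__":
    for bot, path in find_violations(ROBOTS_TXT, HITS):
        print(f"{bot} requested disallowed path {path}")
```

An empty result over a long enough period supports the claim that a bot honors the file. A non-empty result is a reason to move enforcement to the server or CDN.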
Which AI-related features can you recommend in your NEURONwriter tool?
In NEURONwriter we have several key features that are extremely useful in the AI context:
AI Visibility module: As I already mentioned, this is a comprehensive tool for monitoring and improving the visibility of your content in answers generated by LLMs. It allows you to understand how AI perceives your brand and content.
AI Score: A metric that evaluates your content in terms of optimization for AI. It analyzes aspects such as topic coverage, structure, clarity, the BLUF principle, and fact and source density, helping to create content that is “friendly” to AI.
AI Readiness Audit: This feature lets you check whether your site is technically prepared for AI crawlers. It helps identify potential issues that may hinder AI’s access to and processing of your content.
AI Profile: Helps generate consistent project summaries, which is key to maintaining uniformity in AI-generated content.
Integration with AI models: NEURONwriter uses advanced AI models from leading providers, such as OpenAI (GPT), Anthropic (Claude), and Google (Gemini), to generate and optimize content, which ensures access to the latest AI capabilities.
Together, these features help not only create better content, but also strategically manage its visibility in the new AI ecosystem.
Do you implement the llms.txt file? What is your opinion on this topic?
Standards for data exchange between companies and crawlers are still being formed. The llms.txt file is one example of global corporations trying to standardize and structure the exchange of information. We can think of llms.txt as the equivalent of sitemap.xml and robots.txt for LLMs. I assume that some crawlers may use it, but due to the lack of standardization it is not a basis for indexing a site. The more publishers implement it, the faster it will become a commonly honored standard, which will make it easier to manage visibility in AI and protect copyright.
If a site has referral traffic from LLMs, does that guarantee it is accessible to them? Can a referral appear in LLMs from other sources (e.g., backlinks)?
If a site has referral traffic from LLMs, this is a strong indicator that it is accessible to them and that your content is being used in generated answers. It means that the bots crawling on behalf of those LLMs had access to your site and considered it a valuable source of information. This is a very positive signal.
Referral from LLMs usually means that your domain was directly cited or recommended in an answer generated by the model. This can be the result of direct indexing of your site by an LLM bot. However, a referral can also come from other sources, although that is less direct. For example, if your site is frequently cited on other high-authority sites (backlinks), and those sites are indexed by LLMs, then this can indirectly increase the chances that your site will be considered a credible source and mentioned in AI answers, even if the LLM bot did not directly index your site at that specific moment. Nevertheless, a direct referral from an LLM is the best evidence of direct visibility.
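Referral traffic from LLMs can be picked out of analytics or log data by looking at the referrer hostname. The sketch below shows one way to do it. The hostname list is an assumption for illustration and would need to be maintained as assistants change their domains.

```python
from urllib.parse import urlsplit

# Assumed referrer hostnames of AI assistants (illustrative, not exhaustive).
LLM_REFERRER_HOSTS = {
    "chatgpt.com": "ChatGPT",
    "perplexity.ai": "Perplexity",
    "gemini.google.com": "Gemini",
    "copilot.microsoft.com": "Copilot",
}


def llm_source(referrer: str):
    """Return the assistant name for a referrer URL, or None if not an LLM."""
    host = (urlsplit(referrer).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return LLM_REFERRER_HOSTS.get(host)


if __name__ == "__main__":
    for ref in ["https://chatgpt.com/", "https://www.perplexity.ai/search?q=x",
                "https://www.google.com/", "-"]:
        print(ref, "->", llm_source(ref))
```

Counting such visits per landing page shows which pages are actually being cited, which is the value signal used when deciding whether a bot is worth allowing.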
Looking at “bytes transferred”, some bots consume 12 GB of transfer per day! Block them? Not block them? Both options seem bad.
This is a classic dilemma that many site owners face. My approach is pragmatic: if a bot consumes 12 GB of transfer per day and you see no value from it in the form of visibility in LLMs with source citation or valuable traffic, then yes – block it. Neither option is bad if you approach it strategically. Uncontrolled transfer means real costs and server load, which can affect the site’s performance for real users. If a bot does not honor robots.txt and brings no benefit, then blocking it is justified. As I mentioned earlier, I prefer a selective approach. Monitor, analyze, and make decisions based on data, not on fear of losing hypothetical visibility that does not materialize anyway. Here it is good to create an AI communication policy that defines how data is made available to and processed by AI.
Won’t allowing CCBot to crawl result in AI not indicating the source? That is, the domain will not appear in the results?
There is such a risk. CCBot is known for collecting data for training purposes, and models trained on Common Crawl data do not necessarily cite sources when they generate answers. This means that your content may be used to train an AI model, but your domain will not appear in the results as a source. For me, this is a crucial difference. If the goal is visibility and referral traffic, then allowing CCBot without clear guarantees of source citation is problematic. I prefer bots that are more transparent about citation and that actually contribute to the visibility of your brand in LLM answers. In the case of CCBot, if you are not sure about the benefits, I would consider blocking it or limiting its access.
Is it worth fighting for visibility in AI now, when the conversion % from this source is very, very low compared to other channels?
It is true that currently the conversion percentage directly from LLMs may be low compared to traditional channels. However, I believe it is worth fighting for visibility in AI, but from the perspective of a long-term strategy and brand building. AI is the future of information search and interaction with content.
Ignoring this trend is like ignoring SEO at the dawn of Google. Conversions may be low now, but:
Building authority and trust: Being cited by AI builds your brand’s authority as a credible source of information.
Early adoption: Those who understand and optimize for AI early will gain an advantage when the AI ecosystem matures.
Changing user behavior: Users will increasingly rely on AI answers. Being in those answers means being where your potential customers are.
Indirect benefits: Even if direct conversion is low, visibility in AI can lead to increased brand awareness, direct searches, and the strengthening of other channels.
So yes, it is worth investing in visibility in AI, but with an open mind and the awareness that it is an investment in the future, not always in immediate, direct conversions.
So which bots is it worth allowing, and which not?
My approach is as follows:
Worth allowing:
GPTBot (OpenAI): Usually honors robots.txt and is used to improve OpenAI’s models, which can lead to your content being cited in ChatGPT answers.
Google-Extended (Google): Google’s bot, which is used to train AI models and generate answers in AI Overviews. It honors robots.txt and is key to visibility in the Google ecosystem.
Other transparent bots: Any bots that clearly declare their intentions, honor robots.txt, and offer potential benefits in the form of visibility or referral traffic.
Worth considering blocking/limiting:
CCBot (Common Crawl): As I mentioned, it collects data for training purposes without any guarantee of source citation. If you do not want your content used this way without compensation or visibility, consider blocking it.
Bots generating excessive transfer without value: Any bot that loads the server and generates large data transfer but brings no measurable benefits (no visibility, no traffic) should be blocked.
Malicious bots and scrapers: Of course, any bots that breach security, attempt to steal data, or otherwise act harmfully should be blocked immediately.
The key is monitoring server logs, analyzing user agents, and making informed decisions based on the bots’ behavior and their impact on your site.
And what about AI bots consuming resources?
AI bots consuming resources is a real problem that directly translates into hosting costs and site performance. Large data transfer, server CPU load – all of this can slow down the site for real users and generate unnecessary expenses. That is why active management of bot traffic is so important. We cannot simply ignore this problem.
My strategies are:
Monitoring: Regularly checking server logs and analytics tools for unusual bot traffic and resource consumption.
Identification: Determining which specific bots generate the greatest load.
Value assessment: Do these bots bring any value (visibility, traffic, citations)? If not, they are candidates for blocking.
Selective blocking: Using Cloudflare or server-level rules to block unwanted bots. Robots.txt is only the first step, but it is not enough.
Site optimization: Making sure the site is optimized for performance, so that even with increased bot traffic, real users do not experience slowdowns.
This is an ongoing process that requires attention and proactive action.
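The monitoring, value assessment, and selective blocking steps above can be reduced to a simple decision rule. The sketch below is one possible form of it. The `BotStats` fields, the 1 GB default threshold, and the three outcomes are assumptions for illustration. Each site would set its own limits.

```python
from dataclasses import dataclass

GB = 1024 ** 3


@dataclass
class BotStats:
    name: str
    bytes_per_day: int     # transfer generated by the bot
    llm_citations: int     # observed citations of the site in LLM answers
    referral_visits: int   # visits referred by the bot's ecosystem


def decide(stats: BotStats, max_bytes_per_day: int = 1 * GB) -> str:
    """Return 'allow', 'block', or 'monitor' for a bot.

    The threshold is a hypothetical default, not a recommendation.
    """
    brings_value = stats.llm_citations > 0 or stats.referral_visits > 0
    if brings_value:
        return "allow"
    if stats.bytes_per_day > max_bytes_per_day:
        return "block"
    return "monitor"


if __name__ == "__main__":
    bots = [
        BotStats("HeavyBot", 12 * GB, 0, 0),   # invented bot names
        BotStats("CitingBot", 2 * GB, 14, 120),
        BotStats("QuietBot", 50 * 1024 ** 2, 0, 0),
    ]
    for bot in bots:
        print(bot.name, "->", decide(bot))
```

The rule encodes the pragmatic position from the answers above: value first, cost second, and no blocking based only on fear of losing hypothetical visibility.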
Why don’t you discuss token costs for AI models in the presentation? Don’t you take them into account when working with them?
That is a very apt question! Indeed, in many presentations, especially those focused on strategy and visibility, I do not always discuss token costs in detail. This does not mean, however, that I do not take them into account – on the contrary, token costs are an absolutely key element in working with AI models, especially in the context of scaling operations and building products such as NEURONwriter or Contadu.
The reasons I may not discuss them in presentations are usually the following:
Focus on strategy and business value: Presentations often aim to present a broader vision, strategy, and the value that AI brings to SEO and content marketing. Technical and cost details, although important, can distract from the main message.
Variability and complexity: Token costs are very dynamic – they change depending on the model, the provider, the length of prompts, and the generated answers. Discussing them in detail in a presentation could quickly become outdated or too complicated for a broad audience.
Product-level optimization: In NEURONwriter and Contadu, token cost optimization is built into the product architecture. We constantly work on efficiently using the API, caching answers, selecting the most cost-effective models for a given task, and minimizing unnecessary queries. This is our work “under the hood”, so that the end user can focus on value rather than on managing tokens.
Different models, different uses: We use different AI models (GPT, Claude, Gemini) depending on the task. Each of them has a different cost structure and is optimal for different applications. In a presentation it would be difficult to discuss all of this in detail without going into very technical specifics.
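To make the dependence on model, prompt length, and answer length concrete, here is the basic arithmetic behind a per-request estimate. The prices in the example are placeholders, not real provider prices, and the function name is hypothetical.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_million: float,
                 output_price_per_million: float) -> float:
    """Estimate the cost of one request from token counts and per-million prices."""
    total = (input_tokens * input_price_per_million
             + output_tokens * output_price_per_million)
    return total / 1_000_000


if __name__ == "__main__":
    # Placeholder prices per million tokens, chosen only to show the arithmetic.
    cost = request_cost(input_tokens=2000, output_tokens=500,
                        input_price_per_million=1.0,
                        output_price_per_million=4.0)
    print(f"Cost per request: {cost:.4f}")
    print(f"Cost for 10,000 requests: {cost * 10_000:.2f}")
```

Because output tokens are typically priced differently from input tokens, the same task can cost very different amounts depending on the model and on how long the generated answer is.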
