Why 81% of Your AI Traffic Comes From ChatGPT (And How to Show Up There)
GPTBot accounts for 81% of all AI crawler activity on the web. If your site is not optimized for it, you are invisible to four out of five AI searches. Here is what GPTBot looks for and the five changes to make today.
The 81% number, and why it matters
In February 2026, website platform Duda analyzed 858,457 sites and tracked 68.9 million AI crawler visits across them. The headline finding, covered by Roger Montti in Search Engine Journal, was stark: GPTBot alone accounted for 81.0% of all AI crawler activity, or 55.8 million of those 68.9 million visits. ClaudeBot was next at 16.6% (11.5M). PerplexityBot (1.8%) and Google's Gemini-associated crawlers (0.6%) were rounding errors by comparison.
If you are planning your AI search strategy around "every engine equally", the data says otherwise. ChatGPT is where your AI-visible audience already is. Your site either shows up in GPTBot's pool or it does not, and that decision is mostly made by a handful of technical signals you can control this week.
This post covers three things: why GPTBot dominates, what it actually looks for, and the five specific changes to make today.
Why GPTBot dominates AI crawler traffic
The 81% share is not a fluke of one month. Duda's year-over-year numbers show total LLM referral traffic up 72.7%, ChatGPT referrals specifically up 66.7%, and Claude's referrals up 23x off a small base. ChatGPT is the largest AI destination and it is growing the crawler fleet that feeds it.
There is also a structural reason. GPTBot does two jobs: it crawls the web for ChatGPT's training and retrieval index, and it fetches live pages when a ChatGPT user asks a question that the model wants to ground in current web content. That second behavior, often called "user-fetch" or query-time retrieval, is a big fraction of the total and happens whether or not you rank in traditional search. Every time a ChatGPT user asks a question on your topic, there is a real chance GPTBot is about to request a page on your site. If it cannot, someone else's page gets cited.
What GPTBot looks for on your site
Most GPTBot fetches succeed or fail on four signals. The Duda data quantifies three of them directly.
Access: GPTBot checks your robots.txt before fetching, and it honors explicit directives for its user-agent. If your robots.txt disallows it, or if your WAF blocks it at the network layer before it can read robots.txt at all, you are out. Fifty-nine percent of sites in the Duda study received at least one AI crawler visit, which means 41% received zero. Many of those zeros are WAF or robots.txt configuration problems, not traffic problems.
Content depth: the single biggest multiplier in the Duda data is blog content. Sites with 50 or more blog posts averaged 1,373.7 AI crawler visits compared to 41.6 visits for sites with no blog. That is a 33x gap. GPTBot and the models behind it extract short fragments to cite, and more pages means more fragments.
Structured entity signals: Duda found 92.8% crawl rate for sites with Google Business Profile sync versus 58.9% without, and 97.1% for sites with Yext integration. Both of those are ways of publishing an authoritative, machine-readable entity graph about your business. GPTBot uses that as an anchor when deciding whether to trust and cite your pages.
Third-party corroboration: sites with review integrations hit an 89.8% crawl rate and averaged 376.9 crawler visits. When a live user asks ChatGPT "is [your company] any good", the model wants a citation it can stand behind. Review platforms are the most common source of that corroboration.
The 3.2x traffic multiplier is the headline result
One number from the Duda study is worth internalizing because it is the business case for everything else in this post. Sites that allowed AI crawling averaged 527.7 sessions in the month, versus 164.9 for sites that did not. That is a 3.2x traffic advantage. Form completions were 4.17 versus 1.57, a 2.7x conversion advantage on top of the traffic.
This is not a forecast. This is measured behavior across 858,457 sites. If your site is in the 41% that receives zero AI crawler visits today, you are paying a 3.2x traffic tax and a 2.7x conversion tax. Fixing the access and depth signals is the most concrete ROI move in SEO right now.
Five technical changes to make today
None of the five require a dev sprint. Two require an afternoon.
1. Allow GPTBot explicitly in robots.txt
A blanket User-agent: * with no crawler-specific blocks is not enough. Add an explicit block:
User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /
Cover all three. GPTBot is the indexing crawler, OAI-SearchBot is the user-facing search crawler, and ChatGPT-User appears when a ChatGPT user pastes a URL or the model does a targeted retrieval. Each is a distinct fetch path and you want all three. For the broader list covering Claude, Perplexity, and Google's AI crawlers, see our 2026 AI crawler user-agent reference.
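Before deploying a robots.txt change, you can sanity-check that it parses the way you intend with Python's standard-library robots parser. A quick local sketch; the file contents below mirror the three-block example above:

```python
from urllib.robotparser import RobotFileParser

def agent_allowed(robots_txt: str, agent: str, path: str = "/") -> bool:
    """Return True if `agent` may fetch `path` under this robots.txt."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(agent, path)

robots = """\
User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /
"""

for bot in ("GPTBot", "OAI-SearchBot", "ChatGPT-User"):
    print(bot, agent_allowed(robots, bot))  # each prints True
```

The same helper catches the opposite mistake: feed it a file with `Disallow: /` under the GPTBot block and `agent_allowed` returns False, which is exactly the misconfiguration you are trying to rule out.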
2. Allowlist GPTBot at your WAF
Cloudflare, AWS WAF, and similar services often challenge or block unfamiliar user agents before robots.txt even loads. A 403 or CAPTCHA at the WAF layer looks, from the crawler's side, like an unreachable site. In Cloudflare, add GPTBot and OAI-SearchBot to the list of verified bots you allow. In AWS WAF, add an explicit allow rule for the published GPTBot IP ranges at https://openai.com/gptbot.json.
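The flip side of allowlisting by user-agent is that anyone can spoof the GPTBot string, so WAF rules keyed on the published IP ranges are more reliable. A minimal sketch of that check using Python's stdlib ipaddress module; the CIDR ranges below are illustrative placeholders, not the live values, which you should pull from the gptbot.json URL above:

```python
import ipaddress

# Placeholder CIDRs for illustration only; fetch the real list from
# OpenAI's published GPTBot IP range file before using this in a rule.
GPTBOT_RANGES = ["192.0.2.0/24", "198.51.100.0/24"]

def is_gptbot_ip(ip: str, ranges=GPTBOT_RANGES) -> bool:
    """True if `ip` falls inside any of the published GPTBot CIDR ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in ranges)

print(is_gptbot_ip("192.0.2.77"))   # inside the sample range -> True
print(is_gptbot_ip("203.0.113.5"))  # outside every sample range -> False
```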
3. Publish Organization schema with sameAs
GPTBot wants to know who you are as an entity, not just what your pages say. At minimum, put a JSON-LD Organization block on your homepage with:
name, url, logo, and sameAs (linking to your LinkedIn, Crunchbase, Wikidata, and Google Business Profile)
The sameAs array is how the model links your site to the external entity graph. For more detail on schema patterns AI engines prefer, see Schema.org for AI engines.
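A minimal version of that block, embedded in a script tag of type application/ld+json on the homepage, looks like the following. Every value here is a placeholder; swap in your own entity URLs:

```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Example Co",
  "url": "https://www.example.com",
  "logo": "https://www.example.com/logo.png",
  "sameAs": [
    "https://www.linkedin.com/company/example-co",
    "https://www.crunchbase.com/organization/example-co",
    "https://www.wikidata.org/wiki/Q000000",
    "https://maps.google.com/?cid=0000000000000000000"
  ]
}
```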
4. Ship or expand a blog
The 33x content-depth multiplier from Duda is not a nice-to-have, it is the largest single effect in the dataset. If you have no blog, publish one. If you have fewer than 10 posts, put expanding it on this quarter's roadmap. The posts should be short, factual, and easy to extract from. Dense marketing prose is not what models cite; specific answers to specific questions are. Our post on the anatomy of a cited blog post walks through the structural patterns that actually get pulled.
5. Add third-party review and listing signals
Sync your Google Business Profile. Claim your listings on the two or three review platforms that matter in your vertical (G2, Capterra, Clutch, Trustpilot, or industry-specific equivalents). Link them from your site, and embed schema Review and AggregateRating markup on your product and pricing pages. This is the corroboration layer GPTBot leans on when a model is deciding whether to cite you in a trust-sensitive answer.
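For the on-site half of that corroboration, a JSON-LD fragment like the one below is a reasonable starting shape. The product name, rating value, and review count are illustrative placeholders; the markup should reflect real, verifiable reviews from your integrated platforms:

```json
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Example Product",
  "aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": "4.6",
    "reviewCount": "128"
  },
  "review": [
    {
      "@type": "Review",
      "author": { "@type": "Person", "name": "Pat Example" },
      "reviewRating": { "@type": "Rating", "ratingValue": "5" }
    }
  ]
}
```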
Verify, then scan
Once you have shipped the five changes, check your server logs over the following 24 to 72 hours. You should see GPTBot fetches appear. If you do not, something is still blocking the crawler: usually the WAF, sometimes a misordered robots.txt block, occasionally a case-sensitive user-agent rule upstream.
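A quick way to run that check is to count user-agent matches in your access logs. A minimal Python sketch; the sample log lines are made up for illustration, and real GPTBot user-agent strings may differ in version and detail:

```python
from collections import Counter

AI_BOTS = ("GPTBot", "OAI-SearchBot", "ChatGPT-User")

def count_ai_bot_hits(log_lines):
    """Count hits per AI crawler by matching the user-agent field."""
    hits = Counter()
    for line in log_lines:
        for bot in AI_BOTS:
            if bot in line:
                hits[bot] += 1
                break
    return hits

# Hypothetical combined-format log lines for demonstration.
sample = [
    '203.0.113.10 - - [01/Mar/2026:10:01:02 +0000] "GET /blog/post HTTP/1.1" 200 5123 "-" "Mozilla/5.0; compatible; GPTBot/1.2; +https://openai.com/gptbot"',
    '203.0.113.11 - - [01/Mar/2026:10:05:40 +0000] "GET /pricing HTTP/1.1" 200 2210 "-" "OAI-SearchBot/1.0; +https://openai.com/searchbot"',
    '198.51.100.7 - - [01/Mar/2026:10:06:12 +0000] "GET / HTTP/1.1" 200 1800 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
]

print(count_ai_bot_hits(sample))  # GPTBot: 1, OAI-SearchBot: 1
```

In practice you would stream the real log file into the same function instead of a hardcoded list; zero GPTBot hits after 72 hours is the signal to go back to steps 1 and 2.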
If you would rather not diff robots.txt and WAF rules by hand, Citevera audits each of the five signals above and tells you which are failing and which are passing. We run the same access and schema checks on every site, grade them against the Duda benchmark findings, and generate a fix list ordered by impact.
Scan your site for GPTBot readiness in 60 seconds.
Frequently asked questions about GPTBot
Does GPTBot use the same rules as Googlebot?
No. GPTBot is a distinct crawler with its own user-agent string and its own rules. A robots.txt block that allows Googlebot does not automatically allow GPTBot. You have to name it.
Will allowing GPTBot hurt my SEO?
No. GPTBot does not affect your Google rankings. The two are independent systems. Disallowing GPTBot does not help your Google position; it just removes you from ChatGPT's citation pool.
What if I only want GPTBot to crawl some pages?
Use a targeted Disallow inside the GPTBot block: Disallow: /admin/, for example. This keeps staging, internal docs, or gated content out of the AI training pool while still allowing your marketing site through.
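Put together, a scoped block might look like this (the /admin/ and /internal-docs/ paths are examples; most robots.txt parsers treat the more specific Disallow as overriding the blanket Allow):

```
User-agent: GPTBot
Allow: /
Disallow: /admin/
Disallow: /internal-docs/
```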
How often does GPTBot recrawl?
For most sites, every few days to a few weeks for the indexing crawler, and within seconds to minutes for user-initiated fetches through OAI-SearchBot or ChatGPT-User. If you publish a new page, expect it to appear in ChatGPT answers within days once access is clean.
