The 33x Rule: Why Sites With 50+ Blog Posts Dominate AI Search
Content depth is the single strongest predictor of AI crawler attention. Sites with 50+ blog posts average 1,373.7 AI crawler visits a month versus 41.6 for sites with no blog. Here is why depth wins, what qualifies as depth, and how to audit your own gap.
The single biggest multiplier in the Duda data
In February 2026, the website platform Duda published a study spanning 858,457 sites and 68.9 million AI crawler visits. Roger Montti covered it at Search Engine Journal. The paper reports dozens of correlations between on-site signals and AI crawler attention. Schema adoption. Google Business Profile sync. Review platform integrations. llms.txt coverage.
One number dwarfs the rest.
Sites with 50 or more blog posts averaged 1,373.7 AI crawler visits in the month. Sites with no blog averaged 41.6. That is a 33x gap. It is the largest single effect in the entire dataset, and it is not close.
If you take only one signal from the Duda study and act on it, it should be this one.
Why depth matters more than quality
The instinct when you see a content gap is to write better posts. That is the wrong lever. AI engines do not grade articles end-to-end the way a human reader does. They break your site into extractable fragments and cite the ones that match the query they are answering.
More pages means more fragments. More fragments means more chances to be the source the model pulls when a user asks about your category. A site with 50 above-average posts gets cited more often than a site with 5 exceptional posts, because the 5-post site simply does not cover the surface of questions that buyers actually ask.
This is the mechanic behind Dan Taylor's observation, covered in Great Content Is No Longer Enough, that exceptional content in isolation underperforms. Isolation is the word that matters. A great article surrounded by 49 other competent articles on the same theme is not the same asset as a great article standing alone. The depth is what turns individual pages into a citation surface.
What the Duda numbers actually say
Four numbers from the study, worth internalizing:
- Sites with 0 blog posts: 41.6 average AI crawler visits.
- Sites with 1-9 posts (thin): not broken out in the coverage, but by inference sits well above the zero-blog floor and far below the deep-site average.
- Sites with 10-49 posts (growing): meaningful uplift over zero-blog sites, still well below the 50+ cohort.
- Sites with 50+ posts (deep): 1,373.7 average AI crawler visits, a 33x multiplier.
The dataset spans 858,457 sites, so the effect is not a small-sample artifact. Crawler attention scales roughly with content surface, with a sharp inflection somewhere around the 50-article mark that separates "has a blog" from "is a content-driven site."
There is a second number from the same study that explains why crawler attention matters, not just as a vanity metric. Sites that received any AI crawler traffic averaged 527.7 sessions in the month, versus 164.9 for sites that did not. That is a 3.2x traffic multiplier on top of the 33x crawler multiplier. Form completions were 4.17 versus 1.57, a 2.7x conversion lift. Crawler attention compounds into user-facing traffic and conversions. It is the leading indicator.
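The multipliers above are straight division over the study's reported averages; a quick check:

```python
# The multipliers quoted above, derived from the study's reported averages.
print(round(1373.7 / 41.6, 1))  # 33.0 -> the 33x crawler gap
print(round(527.7 / 164.9, 1))  # 3.2  -> the session multiplier
print(round(4.17 / 1.57, 1))    # 2.7  -> the form-completion lift
```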
What qualifies as AI-friendly content depth
Hitting 50 posts is necessary but not sufficient. The 50+ cohort in the Duda study presumably contains some sites that built a shallow content mill and some that built a genuine authority surface. The second group does better. Three qualifiers:
Structural clarity. Posts should have a clear H2/H3 hierarchy, short paragraphs, and explicit questions as headings where natural. The model is looking for extractable fragments; structure signals where they are before the words are parsed. We unpack the specifics in the anatomy of a cited blog post.
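For a rough self-check, you can pull a post's heading skeleton and flag the obvious structural gaps. A minimal sketch, assuming BeautifulSoup; the heuristics here are ours, not Duda's:

```python
# A minimal structure check, not Citevera's scanner.
# Assumes beautifulsoup4 is installed: pip install beautifulsoup4
from bs4 import BeautifulSoup

def heading_outline(html: str) -> list[tuple[str, str]]:
    """Extract the H2/H3 skeleton a crawler sees before parsing the prose."""
    soup = BeautifulSoup(html, "html.parser")
    return [(h.name, h.get_text(strip=True)) for h in soup.find_all(["h2", "h3"])]

def structure_flags(html: str) -> list[str]:
    outline = heading_outline(html)
    issues = []
    if not outline:
        issues.append("no H2/H3 headings: nothing marks where fragments start")
    elif outline[0][0] == "h3":
        issues.append("outline opens with an H3: the hierarchy skips a level")
    if outline and not any(text.endswith("?") for _, text in outline):
        issues.append("no question-form headings: add one where it reads naturally")
    return issues
```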
Entity density. Each post should reference specific named entities - the actual companies, tools, authors, products, or places it discusses. Generic nouns ("users", "the industry", "platforms") produce fewer citations than named references because the model cannot resolve them to an entity graph.
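The study does not publish an entity-density metric, but you can approximate one: named entities per 100 tokens, counted with an off-the-shelf NER model. A sketch, assuming spaCy:

```python
# A rough proxy for entity density: named entities per 100 tokens.
# The metric and any threshold are our assumption, not a Duda finding.
# Setup: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

def entity_density(text: str) -> float:
    doc = nlp(text)
    return 100 * len(doc.ents) / max(len(doc), 1)

# Compare a generic sentence with a named one: the second gives the
# model concrete entities (Duda, Search Engine Journal) to resolve.
print(entity_density("Users say platforms in the industry are shifting."))
print(entity_density("Duda's 2026 study, covered at Search Engine Journal, says otherwise."))
```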
Attribution. When you state a number, cite the source. "32% of top results lose clicks to AI Overviews" without a source does not register as a fact the model will carry forward. "32% of top results lose clicks to AI Overviews (Search Engine Journal, 2026)" does. Attribution converts assertion into quotable fact.
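You can lint your own drafts for this with a crude heuristic: flag any sentence that contains a statistic but no parenthetical source. A sketch using the (Source, Year) convention from the example above:

```python
import re

# Crude attribution lint: flag sentences with a statistic but no source.
# The "(Name, Year)" pattern mirrors the example above; adapt to your house style.
STAT = re.compile(r"\d+(?:\.\d+)?%|\b\d{2,}(?:\.\d+)?\b")
SOURCE = re.compile(r"\([^)]*\d{4}\)")

def unattributed_stats(text: str) -> list[str]:
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return [s for s in sentences if STAT.search(s) and not SOURCE.search(s)]

print(unattributed_stats(
    "32% of top results lose clicks to AI Overviews. "
    "Sites with 50+ posts average 1,373.7 visits (Duda, 2026)."
))  # flags the first sentence only; the second carries its source
```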
Depth plus these three qualifiers compounds. Fifty posts that get all three right is the shape of a site that AI engines want to cite.
How to audit your own depth
Citevera's audit now measures content depth directly. The scanner parses your sitemap (including sitemap-index structures that many WordPress and Shopify installs publish), counts article URLs under conventional content-hub paths (/blog, /articles, /posts, /insights, /resources, /news, /stories, /guides), and categorizes your site into one of four buckets:
- None (0 articles): no blog content detected. High-severity audit item.
- Thin (1-9 articles): medium-severity item. You have a start but are orders of magnitude below the citation threshold.
- Growing (10-49 articles): low-severity item. Progress, but the 33x inflection is still ahead of you.
- Deep (50+ articles): no audit item emitted. You cross the threshold where AI crawlers start to pay sustained attention.
If your sitemap is a sitemap-index (common on WordPress multisite and on sites with more than 50,000 URLs), Citevera recurses into child sitemaps to reach the actual article URLs. Sites that previously got a misleading "thin" read because their top-level sitemap was just an index of indexes now get the accurate "deep" bucket.
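For intuition, here is a simplified sketch of those mechanics: fetch the sitemap, recurse if it is an index, count URLs under hub paths, bucket the count. The paths and cutoffs mirror this article; the code itself is illustrative, not Citevera's implementation:

```python
# Simplified depth audit: sitemap -> article count -> bucket.
# Illustrative only; error handling and crawl politeness are omitted.
import urllib.request
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
HUB_PATHS = ("/blog", "/articles", "/posts", "/insights",
             "/resources", "/news", "/stories", "/guides")

def fetch_locs(sitemap_url: str, depth: int = 0) -> list[str]:
    """Collect page URLs, recursing into sitemap-index children."""
    with urllib.request.urlopen(sitemap_url, timeout=10) as resp:
        root = ET.fromstring(resp.read())
    if root.tag.endswith("sitemapindex") and depth < 3:  # cap the recursion
        urls: list[str] = []
        for loc in root.findall("sm:sitemap/sm:loc", NS):
            urls.extend(fetch_locs(loc.text.strip(), depth + 1))
        return urls
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)]

def depth_bucket(article_count: int) -> str:
    if article_count == 0:
        return "none"      # high severity
    if article_count < 10:
        return "thin"      # medium severity
    if article_count < 50:
        return "growing"   # low severity
    return "deep"          # past the 33x inflection

def audit(sitemap_url: str) -> tuple[int, str]:
    articles = [u for u in fetch_locs(sitemap_url)
                if any(p in u for p in HUB_PATHS)]
    return len(articles), depth_bucket(len(articles))

# count, bucket = audit("https://example.com/sitemap.xml")
```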
The fix is editorial, not technical
If your audit says none, thin, or growing, the fix is not a plugin to install. It is a content roadmap. Three heuristics:
1. Publish to a question. Each post should answer a specific query someone would type into ChatGPT or Perplexity. "Best [category] for [audience]" and "How do I [task]" queries produce the highest citation rates because the model is actively looking for a ranked answer it can quote.
2. Ship one per week. At one post per week, you cross the 50-post threshold in a year. At two per week, six months. The Duda study measured state, not velocity - but velocity determines when you cross the inflection.
3. Refresh the top 20% quarterly. Content freshness is a separate Duda finding. Old posts lose citation weight over time, so the depth you built last year needs maintenance. See our 30-day rule for content freshness for the specific pattern.
Scan your site for content depth and the other 34 AI search signals.
The audit tells you your current bucket, how many articles we counted, and what the fix list looks like ranked by impact. Free, 60 seconds, no signup for the headline numbers.
Frequently asked questions about content depth
Does article length matter as much as count?
Length matters, but count dominates. A 600-word post that directly answers a specific question is more citable than a 2,500-word essay that buries the answer. Aim for posts that are long enough to be self-contained and short enough that every paragraph earns its place. Quantity with basic quality beats scarcity with polish.
What about documentation pages? Do those count?
They count if they live under a content-hub path the scanner can identify. Docs under /docs are picked up by the audit. Docs buried under /app/help/[id] or gated behind a login are not. Good documentation is citable content; the question is whether the AI crawler can reach it.
What if I have 50+ posts but they are all AI-generated filler?
The Duda correlation is count-based, so the crawler visits still register. The conversion effect downstream depends on whether the posts are worth citing. A content mill shows up in the "deep" bucket in Citevera's audit, but the follow-on traffic tends to be weaker than an intentional content program. The measurement is a floor, not a ceiling.
How does this interact with the GPTBot access layer?
They compound. No amount of content depth matters if GPTBot cannot reach your site; see Why 81% of your AI traffic comes from ChatGPT for the access layer. With access open and depth built, the two signals multiply. With either one missing, the other is wasted.
