How it works - Citevera

Inside a scan

The four-stage pipeline.

Every audit runs the same sequence. Total wall time per scan is 30 to 90 seconds.

1
Crawl
We fetch robots.txt, sitemap.xml, llms.txt, and the homepage, then pick up to 9 additional internal pages (product, pricing, docs, blog, about, FAQ, how-to) based on your sitemap structure. Each page is rendered in headless Chromium with a 10-second timeout and a 5 MB response cap.
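The page-selection step can be illustrated with a minimal sketch. It assumes the sitemap has already been parsed into a list of URLs; the path hints and the function names (guess_page_type, pick_pages) are hypothetical and chosen for illustration, not taken from the actual crawler.

```python
from typing import Optional
from urllib.parse import urlparse

# Page types come from the text; the path hints are illustrative assumptions.
PAGE_TYPE_HINTS = {
    "product": ("product", "features"),
    "pricing": ("pricing", "plans"),
    "docs": ("docs", "documentation"),
    "blog": ("blog",),
    "about": ("about",),
    "faq": ("faq",),
    "how-to": ("how-to", "guides", "tutorial"),
}
MAX_EXTRA_PAGES = 9  # "up to 9 additional internal pages"


def guess_page_type(url: str) -> Optional[str]:
    """Return the first page type whose hint appears in the URL path."""
    path = urlparse(url).path.lower()
    for page_type, hints in PAGE_TYPE_HINTS.items():
        if any(hint in path for hint in hints):
            return page_type
    return None


def pick_pages(sitemap_urls: list[str], limit: int = MAX_EXTRA_PAGES) -> list[str]:
    """Take one URL per page type first, then fill the remaining slots."""
    picked: list[str] = []
    seen_types: set[str] = set()
    for url in sitemap_urls:
        page_type = guess_page_type(url)
        if page_type and page_type not in seen_types:
            picked.append(url)
            seen_types.add(page_type)
    for url in sitemap_urls:
        if len(picked) >= limit:
            break
        if url not in picked and guess_page_type(url):
            picked.append(url)
    return picked[:limit]
```

Taking one page per type before filling the rest keeps a large blog from crowding out the pricing or docs page.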
2
Classify
A fast classifier pass identifies each page type and extracts structured facts: visible CTAs, FAQ blocks, price points, stats with sources, and JSON-LD types. The prompt is cached, and the classification budget stays under $0.03 per scan.
3
Audit
Our audit model runs the full 35-point AI search readiness checklist against the classified page data. The system prompt is the entire ai-seo skill (prompt cached), which covers the AEO, GEO, and crawlability patterns that AI answer engines currently cite. The output is a strict JSON report with a score, axis scores, audit items, generated fix artifacts, and a priority plan.
4
Render
The report is validated against a strict Zod schema (reports that violate the schema are retried, not returned), typography is normalized, and the result is stored and rendered at scan.citevera.com/r/[id]. The rendered report shows a 5-stage AI decision funnel (detection, understanding, trust, coverage, conversion) derived from the audit items, the top 3 moves ranked by leverage, a 15-item priority plan, and per-axis breakdowns. Paid reports also get a PDF export.
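The validate-then-retry behavior can be sketched as follows. This is a plain-Python stand-in, not Zod: the key names, the 0 to 100 score range, and the retry count are all assumptions made for illustration.

```python
from typing import Any, Callable

# Hypothetical field names modeled on the report parts listed in the Audit stage.
REQUIRED_KEYS = {"score", "axis_scores", "audit_items", "fix_artifacts", "priority_plan"}


def is_valid_report(report: Any) -> bool:
    """Strict check: exact key set, and a numeric score in 0..100 (assumed range)."""
    if not isinstance(report, dict) or set(report) != REQUIRED_KEYS:
        return False
    score = report["score"]
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        return False
    return 0 <= score <= 100


def generate_validated(generate: Callable[[], Any], max_attempts: int = 3) -> dict:
    """Call generate() until it yields a valid report; never return an invalid one."""
    for _ in range(max_attempts):
        report = generate()
        if is_valid_report(report):
            return report
    raise RuntimeError("no schema-valid report after retries")
```

Raising after the final attempt, rather than returning the last candidate, is what keeps a malformed report from ever reaching the rendered page.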

What we audit

And why each item matters.

Each check maps to a measured signal AI engines use to decide whether to cite your site. The numbers come from independent studies, not our own claims.

AI crawler access: GPTBot, ClaudeBot, PerplexityBot, Gemini
GPTBot accounts for 81% of AI crawler traffic, ClaudeBot for 16.6%, PerplexityBot for 1.8%, and Gemini for 0.6% (Duda, Feb 2026). We verify that each is explicitly allowed in robots.txt and reachable through your WAF. Blocked access is the most common reason a site is not cited.
Read: Why 81% of your AI traffic comes from ChatGPT
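A rough version of the robots.txt half of this check can be written with Python's standard urllib.robotparser. The sketch below only reports whether each agent may fetch a URL under the parsed rules; it does not distinguish an explicit allow from a default one, and it cannot test WAF reachability. The agent names are copied from the list above, and the token a given crawler actually sends may differ, so treat them as assumptions.

```python
from urllib.robotparser import RobotFileParser

# Names as listed in the text; real user-agent tokens may differ (assumption).
AI_CRAWLERS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Gemini"]


def crawler_access(robots_txt: str, url: str) -> dict[str, bool]:
    """Map each AI crawler name to whether robots.txt permits fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AI_CRAWLERS}


if __name__ == "__main__":
    sample = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"
    for agent, allowed in crawler_access(sample, "https://example.com/").items():
        print(agent, "allowed" if allowed else "blocked")
```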
llms.txt and llms-full.txt
We check for the presence and correctness of llms.txt because GPTBot and ClaudeBot respect it as a hint for structured summarization. If you do not have one, we generate it.
Read: How to generate llms.txt
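For illustration, a generated llms.txt could be assembled like this. The layout (a title line, a one-line summary in a blockquote, then sections of annotated links) follows the publicly proposed llms.txt convention as we understand it; the function name and input shape are hypothetical and not the generator Citevera ships.

```python
def build_llms_txt(
    site_name: str,
    summary: str,
    sections: dict[str, list[tuple[str, str, str]]],
) -> str:
    """Render a title, a blockquote summary, and one link list per section.

    Each link is a (title, url, note) tuple.
    """
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    print(build_llms_txt(
        "Example Co",
        "Example Co sells example widgets.",
        {"Docs": [("Quickstart", "https://example.com/docs/quickstart", "Setup steps")]},
    ))
```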
Schema.org structure and the entity graph
AI engines extract structured entities, not prose. We audit Organization, Person, FAQPage, HowTo, Article, and Product JSON-LD for presence, completeness, and sameAs linkage to LinkedIn, Crunchbase, Wikidata, and review platforms.
Read: Schema.org for AI engines
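A presence and sameAs check along these lines might look like the sketch below. It assumes the JSON-LD script bodies have already been extracted from the page as strings, and it covers only presence and Organization sameAs links, not the completeness rules. The function name audit_json_ld and the report keys are hypothetical.

```python
import json

# The six types named in the text.
EXPECTED_TYPES = {"Organization", "Person", "FAQPage", "HowTo", "Article", "Product"}


def audit_json_ld(blocks: list[str]) -> dict:
    """Report which expected types are present and collect Organization sameAs links."""
    found: set[str] = set()
    same_as: list[str] = []
    for raw in blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks in this sketch
        if isinstance(data, dict):
            nodes = data.get("@graph", [data])
        elif isinstance(data, list):
            nodes = data
        else:
            continue
        for node in nodes:
            if not isinstance(node, dict):
                continue
            types = node.get("@type", [])
            if isinstance(types, str):
                types = [types]
            if not isinstance(types, list):
                continue
            found.update(t for t in types if t in EXPECTED_TYPES)
            if "Organization" in types:
                links = node.get("sameAs", [])
                same_as.extend([links] if isinstance(links, str) else links)
    return {
        "found": sorted(found),
        "missing": sorted(EXPECTED_TYPES - found),
        "same_as": same_as,
    }
```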
Content depth (the 33x rule)
Sites with 50 or more blog posts average 1,373.7 AI crawler visits vs 41.6 with no blog - a 33x gap (Duda, Feb 2026). We categorize every site into none / thin (1-9) / growing (10-49) / deep (50+) buckets with recursive sitemap parsing for large content surfaces.
Read: The anatomy of a cited blog post
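The bucket boundaries and the 33x figure above can be checked with a few lines. The function name depth_bucket is hypothetical; the thresholds and the two averages are the ones quoted in the text.

```python
def depth_bucket(post_count: int) -> str:
    """Map a blog post count to the none / thin / growing / deep buckets."""
    if post_count <= 0:
        return "none"
    if post_count <= 9:
        return "thin"
    if post_count <= 49:
        return "growing"
    return "deep"


# Ratio of the two averages quoted above (Duda, Feb 2026).
visits_deep = 1373.7
visits_no_blog = 41.6
gap = visits_deep / visits_no_blog  # roughly 33

if __name__ == "__main__":
    print(depth_bucket(12), f"{gap:.1f}x")
```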
Google Business Profile sync
Sites with GBP sync see a 92.8% AI crawler rate versus 58.9% without (Duda, Feb 2026). We detect GBP signals via Maps embeds, outbound business.google.com links, and schema.org LocalBusiness.hasMap / Organization.sameAs references.
Read: Great content is no longer enough
Third-party review platform presence
Sites with review-platform integrations averaged an 89.8% crawl rate (Duda, Feb 2026). We scan for G2, Capterra, Trustpilot, Clutch, Yelp, BBB, Gartner, Forrester, Product Hunt, and TrustRadius via outbound links, embedded widgets, and schema.org AggregateRating URLs.
Read: The anatomy of a cited blog post
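The outbound-link part of the review-platform scan can be approximated by matching link hostnames against known platform domains. This is a sketch only: the domain table is our assumption about where each platform lives, and it ignores embedded widgets and AggregateRating URLs.

```python
from urllib.parse import urlparse

# Assumed primary domains for the platforms named in the text.
REVIEW_DOMAINS = {
    "g2.com": "G2",
    "capterra.com": "Capterra",
    "trustpilot.com": "Trustpilot",
    "clutch.co": "Clutch",
    "yelp.com": "Yelp",
    "bbb.org": "BBB",
    "gartner.com": "Gartner",
    "forrester.com": "Forrester",
    "producthunt.com": "Product Hunt",
    "trustradius.com": "TrustRadius",
}


def review_platforms(outbound_links: list[str]) -> set[str]:
    """Return the platforms whose domain (or a subdomain) appears in the links."""
    found: set[str] = set()
    for link in outbound_links:
        host = (urlparse(link).hostname or "").lower()
        for domain, name in REVIEW_DOMAINS.items():
            if host == domain or host.endswith("." + domain):
                found.add(name)
    return found
```

Matching on the exact domain or a dot-prefixed suffix avoids false positives from lookalike hosts such as notg2.com.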
Citation readiness score
Per Dan Taylor, extractability matters more than eloquence for AI citation. We score each page 0-100 on paragraph shape, attributed numerical claims, named entity density, and question-style H2/H3 headings so you can see exactly which axis of extractability is weakest.
Read: Great content is no longer enough
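To make one of these axes concrete, here is a minimal sketch of scoring question-style headings on a 0 to 100 scale. The heuristic (a trailing question mark or a leading question word) and the word list are assumptions for illustration; the real scoring rules and the weighting across the four axes are not described in the text.

```python
# Illustrative word list; not the production heuristic.
QUESTION_WORDS = (
    "how", "what", "why", "when", "where", "which", "who",
    "can", "does", "is", "should",
)


def is_question_heading(heading: str) -> bool:
    """True if the heading ends with '?' or starts with a question word."""
    text = heading.strip().lower()
    if not text:
        return False
    return text.endswith("?") or text.split()[0] in QUESTION_WORDS


def question_heading_score(headings: list[str]) -> int:
    """Share of question-style H2/H3 headings, scaled to 0..100."""
    if not headings:
        return 0
    hits = sum(is_question_heading(h) for h in headings)
    return round(100 * hits / len(headings))
```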

Time ceiling + rate limits

Predictable on every scan.

Per-page fetch

10s timeout, 5MB max, 10 pages max, 60s total wall time.

Free tier

3 scans per IP / 24h, 1 per email / 24h, 10 per domain / 7d.
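These three limits can be modeled as a sliding-window limiter keyed by IP, email, and domain. The sketch below is in-memory and single-process, and whether the real service uses sliding or fixed windows is not stated, so the windowing choice and the class name ScanRateLimiter are assumptions.

```python
from collections import defaultdict, deque

DAY = 24 * 60 * 60  # seconds

# (max scans, window in seconds) per key kind, from the free-tier limits above.
LIMITS = {"ip": (3, DAY), "email": (1, DAY), "domain": (10, 7 * DAY)}


class ScanRateLimiter:
    """In-memory sliding-window limiter; timestamps are seconds."""

    def __init__(self) -> None:
        self._hits: dict = defaultdict(deque)

    def allow(self, ip: str, email: str, domain: str, now: float) -> bool:
        keys = {"ip": ip, "email": email, "domain": domain}
        # Check every limit before recording, so a rejected scan costs nothing.
        for kind, value in keys.items():
            limit, window = LIMITS[kind]
            hits = self._hits[(kind, value)]
            while hits and now - hits[0] >= window:
                hits.popleft()  # drop timestamps that left the window
            if len(hits) >= limit:
                return False
        for kind, value in keys.items():
            self._hits[(kind, value)].append(now)
        return True
```

A scan is counted only when all three checks pass, so hitting the email limit does not also consume one of the three per-IP slots.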

Validation

Zod-strict report schema. Malformed reports are retried, never returned.

Run a free audit
