Inside a scan

The four-stage pipeline.

Every audit runs the same sequence. Total wall time per scan is 30 to 90 seconds.

  1. Crawl

    We fetch robots.txt, sitemap.xml, llms.txt, and the homepage, then pick up to 9 additional internal pages (product, pricing, docs, blog, about, FAQ, how-to) based on your sitemap structure. Each page is rendered with headless Chromium at a 10-second timeout and 5MB response cap.

  2. Classify

    A fast classifier pass identifies each page type and extracts structured facts: visible CTAs, FAQ blocks, price points, stats with sources, and JSON-LD types. The prompt is cached, and the budget stays under $0.03 per scan.

  3. Audit

    Our audit model runs the full 35-point AI search readiness checklist against the classified page data. The system prompt is the entire ai-seo skill (prompt cached), covering the AEO, GEO, and crawlability patterns that AI answer engines currently cite. The output is a strict JSON report with a score, axis scores, audit items, generated fix artifacts, and a priority plan.

  4. Render

    The report is validated against a strict Zod schema (schema-violation reports are retried, not returned), typography is normalized, and the result is stored and rendered on scan.citevera.com/r/[id]. The rendered report shows a 5-stage AI decision funnel (detection, understanding, trust, coverage, conversion) derived from the audit items, the top 3 moves ranked by leverage, a 15-item priority plan, and per-axis breakdowns. Paid reports also get a PDF export.
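
The retry-on-violation behavior described in the Render stage can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the production implementation: the `Report` fields are a hypothetical subset of the outputs named in the Audit stage, `parseReport` is a hand-written stand-in for the Zod schema, `produce` stands in for one audit model call, and `maxAttempts` is an assumed safety bound that the text above does not specify.

```typescript
// Hypothetical report shape: a subset of the fields named in the Audit stage.
interface Report {
  score: number;
  axisScores: Record<string, number>;
  items: string[];
}

// Stand-in for a strict schema check. Returns the typed report,
// or null when a field is missing, mistyped, or unexpected.
function parseReport(raw: unknown): Report | null {
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) return null;
  const obj = raw as Record<string, unknown>;
  const allowed = ["score", "axisScores", "items"];
  // Strict mode: an unknown key is a violation, not something to ignore.
  if (Object.keys(obj).some((k) => allowed.indexOf(k) === -1)) return null;

  const { score, axisScores, items } = obj;
  if (typeof score !== "number" || !isFinite(score)) return null;
  if (typeof axisScores !== "object" || axisScores === null || Array.isArray(axisScores)) return null;
  const axes = axisScores as Record<string, unknown>;
  if (!Object.keys(axes).every((k) => typeof axes[k] === "number")) return null;
  if (!Array.isArray(items) || !items.every((i) => typeof i === "string")) return null;

  return { score, axisScores: axes as Record<string, number>, items: items as string[] };
}

// Re-run the producer until its output validates. A schema-violating
// report is never returned; after maxAttempts (assumed bound) we throw.
function auditWithRetry(produce: () => unknown, maxAttempts = 3): Report {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const report = parseReport(produce());
    if (report !== null) return report;
  }
  throw new Error("no schema-valid report after " + maxAttempts + " attempts");
}
```

The point of the design is that the caller only ever receives a `Report` or an error, so a malformed payload cannot reach the rendering step.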

What we audit

And why each item matters.

Each check maps to a measured signal AI engines use to decide whether to cite your site. The numbers come from independent studies, not our own claims.

  • AI crawler access: GPTBot, ClaudeBot, PerplexityBot, Gemini

    GPTBot accounts for 81% of AI crawler traffic, ClaudeBot for 16.6%, PerplexityBot for 1.8%, and Gemini for 0.6% (Duda, Feb 2026). We verify that each is explicitly allowed in robots.txt and reachable through your WAF. Blocked access is the most common reason a site is not cited.

    Read: Why 81% of your AI traffic comes from ChatGPT
  • llms.txt and llms-full.txt

    We check for the presence and correctness of llms.txt because GPTBot and ClaudeBot respect it as a hint for structured summarization. If you do not have one, we generate it.

    Read: How to generate llms.txt
  • Schema.org structure and the entity graph

    AI engines extract structured entities, not prose. We audit Organization, Person, FAQPage, HowTo, Article, and Product JSON-LD for presence, completeness, and sameAs linkage to LinkedIn, Crunchbase, Wikidata, and review platforms.

    Read: Schema.org for AI engines
  • Content depth (the 33x rule)

    Sites with 50 or more blog posts average 1,373.7 AI crawler visits, versus 41.6 for sites with no blog: a 33x gap (Duda, Feb 2026). We sort every site into one of four buckets, none / thin (1-9) / growing (10-49) / deep (50+), using recursive sitemap parsing to cover large content surfaces.

    Read: The anatomy of a cited blog post
  • Google Business Profile sync

    Sites with GBP sync see a 92.8% AI crawler rate versus 58.9% without (Duda, Feb 2026). We detect GBP signals via Maps embeds, outbound business.google.com links, and schema.org LocalBusiness.hasMap / Organization.sameAs references.

    Read: Great content is no longer enough
  • Third-party review platform presence

    Sites with review-platform integrations averaged an 89.8% crawl rate (Duda, Feb 2026). We scan for G2, Capterra, Trustpilot, Clutch, Yelp, BBB, Gartner, Forrester, Product Hunt, and TrustRadius via outbound links, embedded widgets, and schema.org AggregateRating URLs.

    Read: The anatomy of a cited blog post
  • Citation readiness score

    Per Dan Taylor, extractability matters more than eloquence for AI citation. We score each page 0-100 on paragraph shape, attributed numerical claims, named entity density, and question-style H2/H3 headings so you can see exactly which axis of extractability is weakest.

    Read: Great content is no longer enough
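
As an illustration of the content depth check in the list above, the bucket boundaries and the 33x arithmetic can be written out as follows. This is a sketch under assumptions: `SitemapNode` is a hypothetical in-memory form of an already-parsed sitemap, and the `/blog/` path test is a guess at how a blog post might be recognized, not the scanner's actual rule.

```typescript
type DepthBucket = "none" | "thin" | "growing" | "deep";

// Bucket boundaries as listed above: none (0), thin (1-9),
// growing (10-49), deep (50+).
function depthBucket(postCount: number): DepthBucket {
  if (postCount >= 50) return "deep";
  if (postCount >= 10) return "growing";
  if (postCount >= 1) return "thin";
  return "none";
}

// Hypothetical parsed sitemap: a sitemap index nests child sitemaps,
// while a plain sitemap only lists URLs.
interface SitemapNode {
  urls: string[];
  children: SitemapNode[];
}

// Recursively count blog posts across nested sitemaps.
// Assumption: a blog post is any URL containing "/blog/".
function countBlogPosts(node: SitemapNode): number {
  let count = 0;
  for (const url of node.urls) {
    if (url.indexOf("/blog/") !== -1) count++;
  }
  for (const child of node.children) {
    count += countBlogPosts(child);
  }
  return count;
}

// Ratio behind the "33x" figure (Duda, Feb 2026): about 33.02.
const visitGap = 1373.7 / 41.6;
```

A site would then be bucketed with `depthBucket(countBlogPosts(root))`, where `root` is the top-level sitemap.
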
Time ceiling + rate limits

Predictable on every scan.

  • Per-page fetch: 10s timeout, 5MB max, 10 pages max, 60s total wall time.
  • Free tier: 3 scans per IP / 24h, 1 per email / 24h, 10 per domain / 7d.
  • Validation: Zod-strict report schema. Malformed reports are retried, never returned.
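
The free-tier limits above could be enforced with a check along these lines. This is a sketch only: it assumes sliding windows measured back from the incoming request (the limits above do not say whether windows are sliding or fixed), and `ScanRequest`, `FREE_TIER_LIMITS`, and `blockedBy` are hypothetical names introduced for illustration.

```typescript
// Hypothetical record of one scan request; `at` is a millisecond timestamp.
interface ScanRequest {
  ip: string;
  email: string;
  domain: string;
  at: number;
}

interface Limit {
  key: "ip" | "email" | "domain";
  max: number;
  windowMs: number;
}

const HOUR_MS = 60 * 60 * 1000;

// Limits as stated above: 3 per IP / 24h, 1 per email / 24h, 10 per domain / 7d.
const FREE_TIER_LIMITS: Limit[] = [
  { key: "ip", max: 3, windowMs: 24 * HOUR_MS },
  { key: "email", max: 1, windowMs: 24 * HOUR_MS },
  { key: "domain", max: 10, windowMs: 7 * 24 * HOUR_MS },
];

// Return the name of the first limit the next request would exceed,
// or null if it is allowed. Assumes `history` holds only earlier requests.
function blockedBy(history: ScanRequest[], next: ScanRequest): string | null {
  for (const limit of FREE_TIER_LIMITS) {
    const recent = history.filter(
      (past) =>
        past[limit.key] === next[limit.key] && next.at - past.at < limit.windowMs
    );
    if (recent.length >= limit.max) return limit.key;
  }
  return null;
}
```

Returning the name of the exceeded limit, rather than a bare boolean, lets the caller tell the user which quota was hit.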