AI Search · Framework · Published 4 June 2026 · 18 min read

The 120-Point AI Readiness Framework: How I Actually Audit Sites for AI Search

Most SEO audits were built to answer one question: can Google rank this site? It's no longer sufficient. This is the framework I assembled to close the gap: a 120-point rubric across six categories, and the same process I run on every client site before I make a single recommendation.

Most traditional SEO audits were designed to answer one question: can Google rank this site? It's a good question, and for two decades it was very nearly the only one that mattered. But it is no longer sufficient. A page can rank perfectly well in the ten blue links of Google or Bing and still be entirely absent from the answers ChatGPT or Claude serve up, the summary that sits above Google's results, or the recommendation Perplexity reads aloud. The mechanics of being found and the mechanics of being cited have diverged, and the audit tools most agencies rely on have not caught up. I have spent a lot of time in recent months on this, it's needing an update several times a week, but here is where I have got to in early June 2026 (I will endeavour to keep this post updated as it evolves). 

This is the framework I built to close that gap. It is a 120-point rubric across six categories, and it is the same process I run on every client site before I make a single recommendation. This article sets out the whole of it: what each category measures, why it is weighted the way it is, and, just as importantly, what I deliberately refuse to score. If you run audits yourself, you are welcome to take the structure and adapt it. If you would rather I ran it on your site, there is a way to do that at the end.

A note before I begin: I will be naming sources throughout, including where they contradict one another and including where I disagree with them. That is deliberate. The single most important principle in this framework is that no vendor (not Google, not OpenAI, not anyone) gets to be the final word simply because it is their product. I will come back to that at length. To be clear, I am not the master AI engineer, but being a process and strategy-driven ADHD head, I am sharing the framework I have assembled and much of it comes from far more brilliant people than myself. I will take credit for the build and utilisation and of this and how it ends-up in clients hands, delivery of recommendations is where I see most agencies fail IMHO.

The problem: SEO frameworks don't account for AI search

Traditional SEO is built around ranking. Crawl the page, index it, position it in a results list, earn the click. Every mature audit framework, and most of the tooling, optimises for that chain. It is a sound model for a world in which the search engine returns a list and the user chooses from it.

AI search does not work that way. Google's AI Overviews, AI Mode, ChatGPT Search, Perplexity and the rest do not hand the user a list to choose from. They compose an answer, and in composing it they excerpt: they pull sentences, statistics and definitions out of pages and stitch them into a response, citing some sources and silently ignoring the rest. The unit of value is no longer the ranked position; it is the cited passage.

The unit of value is no longer the ranked position; it is the cited passage.

That shift breaks several assumptions baked into conventional audits. Ranking-era tools check whether a page is indexable, but they do not check whether it is excerptable. A page can be indexed and still carry a signal that forbids it from being quoted. They score schema on whether it is present and valid, but not on whether it tells the truth about the visible page, which is precisely the thing an AI system cross-checks. They measure crawlability against robots.txt, but they do not test whether the firewall in front of the site quietly returns a 403 to AI crawlers regardless of what robots.txt says. None of these are exotic edge cases. They are the everyday reasons a perfectly competent site is invisible to AI, and a ranking audit will not surface a single one of them.

So I needed a different instrument. Not a replacement for technical SEO (the fundamentals still hold), but a framework built around the question AI search actually asks: is this page eligible to be cited, is its content worth citing, and can it be extracted cleanly when it is?

Why I built my own rubric

I did not start from a blank page. There is good open work in this area, and I want to credit it plainly, because what I have assembled is a synthesis rather than an invention. I have curated highly reputable resources and folded them into my process, with my own layer of knowledge baked in. It's an Apple Pie - I did not invent the dish, it's my recipe, hopefully one of the better ones out there!

Three open-source skills informed the foundation. The first, goog-geo (by vishalmdi), had the most rigorous scoring rubric I had seen, a clean hundred points across five categories, but it treated Google's official AI Optimization Guide as the source of truth, and I do not. The second, ultimate-seo-geo (by mykpono), had the cleanest operating structure, a sensible Audit → Plan → Execute flow, but it tried to score everything, including emerging signals I would flag as experimental rather than gate a client's grade on. The third, seo-geo (by zebbern), had the strongest platform-data table, a genuinely useful catalogue of AI crawlers and 2026 reach figures, but read more like a reference document than a working framework.

Each captured part of the picture. None, on its own, was right for client work. So I merged the best of all three, then layered in three further sources that the open skills predated or omitted: Google's own Lighthouse Agentic Browsing category (shipped 2026), the Common Crawl Foundation's training-data access framework, and Aleyda Solis's gap-diagnosis taxonomy. The result is a 120-point rubric across six categories, with explicit, defended zones for forward-looking items kept out of the score.

The reason it runs to 120 rather than 100 is that the sixth category, Agentic Browsing Readiness, only applies when Google's Lighthouse audit can be run against the site. When it cannot, the framework reverts cleanly to 100 points across five categories. Because the grade scale is percentage-based, the same letter grades apply either way; a site is never advantaged or penalised by whether that sixth category was in scope.

The truth-source posture

Before the categories themselves, one matter of principle, because it shapes every judgement the framework makes.

The framework does not defer to vendor positions, including Google's. Vendors have commercial bias. Their stated positions tend to favour behaviours that benefit their products, and that is true of Google, OpenAI, Perplexity and Anthropic alike when they describe their own crawlers. When sources conflict, I do three things: I surface the conflict in the client report, naming the sources and citing the data on each side; I apply calibrated practitioner judgement, weighting cross-platform evidence and field experience over any single vendor's claim, and where a genuinely new conflict arises that I have not yet formed a view on, I say so rather than pretend to certainty.

I will give the four corroborating cases for this posture later in the article. For now, simply read every category below knowing that where I take a position against a vendor's stated guidance, it is a considered position and not an oversight.

The six categories at a glance

Here is the whole rubric in one view.

25 ptsCategory 1

Crawl & Index Eligibility

Hard gates: crawler access, indexability, snippet eligibility.

25 ptsCategory 2

Helpful Non-Commodity Content

Whether the content is uniquely useful, not a generic summary.

18 ptsCategory 3

Organisation & Extractability

Heading hierarchy, answer-first structure, FAQ and comparison structure, query fan-out.

12 ptsCategory 4

Technical Structure & Page Experience

Semantic HTML, ARIA, title and description, Open Graph.

20 ptsCategory 5

Schema, Entity & AI Access

JSON-LD types, schema-content honesty, authorship and dates, cross-platform AI bots, llms.txt.

20 ptsCategory 6

Agentic Browsing Readiness

CLS, accessibility tree, labelled forms, WebMCP. Lighthouse-driven.

Conditional

Two things are worth noticing immediately. First, the two heaviest categories, Crawl Eligibility and Content Quality at 25 points each, are the two that genuinely determine whether a page can be cited at all. Everything else improves the odds; these two set the ceiling. Second, the categories are ordered roughly by how binding they are. The early ones are gates: fail them and content quality is irrelevant because no one will ever read it. The later ones are multipliers: they make good content more extractable and more credible, but they cannot rescue a page that the model was never allowed to see.

Let me take each in turn.

Category 1: Crawl & Index Eligibility (25 points)

These are hard gates. A page that fails any of them cannot appear in Google's AI Overviews or AI Mode regardless of how good the content is. There is no partial credit in the real world for a page the system is forbidden from reading, and the scoring reflects that.

  • robots.txt accessible2 pts
  • Googlebot not blocked in robots.txt5 pts
  • HTTP 200 response2 pts
  • No noindex signal: meta tag or X-Robots-Tag header6 pts
  • No snippet-blocking signal: nosnippet, max-snippet:0, or data-nosnippet over core content5 pts
  • Canonical set and self-referencing2 pts
  • Informational items reported: Google-Extended status and similarnot scored

The two highest individual weights in this category, noindex at six points and the snippet-blocking signals at five, are there because they are complete-invisibility signals. It is worth understanding why nosnippet in particular is so damaging in an AI context. AI Overviews work by excerpting. A page marked nosnippet cannot be excerpted even when it is fully indexed and ranking. You have, in effect, granted permission to be listed but forbidden the one operation the AI answer depends on. I have seen this exact configuration on sites whose owners were mystified as to why they never appeared in AI summaries despite ranking on page one. The snippet directive, often applied years ago for an unrelated reason, was quietly excluding them the entire time.

The reason Google-Extended sits in the informational column rather than the scored one is covered later under the calibrated positions; in short, it governs training and grounding rather than AI Overviews eligibility, so blocking it is a legitimate business decision rather than an audit failure.

Category 2: Helpful Non-Commodity Content (25 points)

This is the only category scored by human judgement rather than DOM extraction, and it is the most important in the rubric. Over any meaningful time horizon, content quality is the binding constraint on AI citation. You can fix every technical gate on this list and still never be cited, because the model had nothing worth quoting.

The method is deliberately simple and deliberately strict. I read the first 800 words of the page from a real browser snapshot (not a curl of the raw HTML, because so much is injected by JavaScript), and score against four questions:

  • Does it show a unique perspective or first-hand expertise?8 pts
  • Does it satisfy the visitor's intent without leaving them to search again?7 pts
  • Does it go beyond commodity: specific details, examples, trade-offs, failure modes?6 pts
  • If AI-generated content is present, is the quality acceptable? Full marks by default where there are no AI tells.4 pts

I score this category conservatively, and I would encourage anyone adapting the framework to do the same. A page that "covers the topic" without offering anything a reader could not get from a generic summary scores low, by design. Full marks are reserved for content that clearly could not be replaced by an AI's own paraphrase of the open web, because that is precisely the content an AI has a reason to cite rather than absorb and discard.

This is uncomfortable for a lot of sites, including some that perform well in conventional SEO. Thin-but-optimised content has been a viable ranking strategy for years. It is not a viable citation strategy. The model already knows the commodity facts; what it does not have, and what it will reach out to a source for, is the specific number, the named trade-off, the failure mode someone learned the hard way.

Category 3: Content Organisation & Extractability (18 points)

If Category 2 asks whether the content is worth citing, Category 3 asks whether it can be extracted cleanly when it is. Good substance buried in poor structure gets passed over for a competitor who said something less insightful but said it where the model could lift it.

  • Single H1 matching query intent3 pts
  • Logical heading hierarchy2 pts
  • Direct answer in the first paragraph, 80 words or fewer4 pts
  • FAQ or structured question-and-answer block3 pts
  • Tables or ordered lists1 pt
  • At least three internal links2 pts
  • Query fan-out coverage: related sub-topics appearing naturally3 pts

The four-point line, a direct answer in the first paragraph, carries the most weight here for a concrete, evidenced reason. Analysis of citation placement in 2025 found that 44.2% of all AI citations are drawn from the first 30% of a page's content, with 31.1% from the middle and just 24.7% from the final third. The implication is blunt: lead with the answer, not the context. The marketing throat-clearing that opens so many articles ("in today's fast-moving environment…") is not merely weak writing, it actively positions your answer outside the zone the model reads most. Top-loading is, in my experience, the single highest-leverage writing change most sites can make.

Category 4: Technical Structure & Page Experience (12 points)

This category is lighter for a reason: these are hygiene factors. They rarely make a page citable on their own, but their absence introduces friction for the systems trying to parse the page, and that friction compounds.

  • <main> element present2 pts
  • <article> or semantic sectioning2 pts
  • <title> of 50–60 characters plus a meta description3 pts
  • All images carry non-empty alt text2 pts
  • Interactive elements have ARIA labels1 pt
  • Open Graph tags present2 pts

Semantic HTML is doing quiet work here. When the primary content is wrapped in <main> and self-contained sections sit in <article>, an extraction system can identify the substance and discard the navigation, the cookie banner and the footer with far greater confidence. It is the difference between handing someone a document with the body clearly marked and handing them a page of undifferentiated text and asking them to guess which part is the point.

Category 5: Schema, Entity & AI Access (20 points)

This category does the most work in separating credible sites from the rest, and it carries one of my firmer positions: schema is treated here as a hard requirement, not a nice-to-have. I will defend that against Google's stated guidance shortly. The short version is that sites lacking structured data almost invariably show systemic content-discipline gaps elsewhere, regardless of what any specific platform vendor says it formally requires.

  • JSON-LD present, parseable and meaningful: subject to the schema honesty principle; schema that contradicts visible content scores zero4 pts
  • High-value schema types matching the page type3 pts
  • Author attribution visible, with credentials3 pts
  • Publication or "last updated" date visible3 pts
  • Statistics with source attribution2 pts
  • Cross-platform AI bots not blocked in robots.txt, including CCBot3 pts
  • llms.txt present and well-formed2 pts

The schema honesty principle is worth dwelling on because it inverts how most tools score structured data. Conventional auditors reward schema for being present and valid. I reward it for being true. If Article schema declares an author of "Jane Smith" but no byline is visible anywhere on the rendered page, that schema does not earn its four points; it earns a liability flag, and I recommend removing the misleading fields. The same applies to Product schema with prices the page does not show, AggregateRating with star counts absent from the DOM, or Review schema naming reviewers invented for the markup. AI systems increasingly cross-check declared structured data against visible content at retrieval time, and a mismatch does not merely fail to help; it gets the page de-prioritised.

Schema implemented as a trick is worse than no schema at all.

For reference, the required schema types by page class are as follows:

Page typeRequired schema
Blog post / article / newsArticle / BlogPosting / NewsArticle with author and dates
FAQ sectionFAQPage
How-toHowTo
ProductProduct with offers
Local businessLocalBusiness
Organisation rootOrganization with sameAs
Author bioPerson with credentials and sameAs

The inclusion of CCBot in the AI-bot access check matters more than its three points suggest, and it connects to a problem most audits never test for. CCBot is Common Crawl's crawler, and the Common Crawl archive is a primary training source for most major language models. Block CCBot and you are not merely missing from one platform; you risk being absent from the training corpus that feeds nearly all of them. More on that in the discussion of the "invisible 403" below.

Category 6: Agentic Browsing Readiness (20 points, conditional)

This category runs only when Google's Lighthouse Agentic Browsing audit is available for the site, which in practice means sites with transactional flows worth measuring: e-commerce, booking, SaaS with forms. It measures something genuinely distinct from everything above it: not whether an AI search engine will cite the page, but whether an AI agent can navigate, label and interact with it.

  • CLS within threshold: 0.1 or below is Good5 pts
  • Accessibility tree integrity4 pts
  • Interactive elements properly labelled: a deeper check than Category 44 pts
  • Forms agent-navigable: programmatic labels, no JavaScript-only submission4 pts
  • WebMCP tools registered3 pts

The distinction this category draws is the one I expect to matter most over the next few years. AI citation answers the question "will an AI mention you when answering a query?" Agentic browsing answers a different question: "can an AI agent actually book a flight on your site, sign up for your trial, or submit your contact form?" As agentic products move from demonstration to production (ChatGPT's browse mode, Claude's computer use, and the rest), the second question stops being hypothetical.

The WebMCP line carries a deliberate piece of framing. Its presence is rewarded with three points; its absence is framed as an "emerging-specification opportunity, not a deficiency", and never scored as a failure. Adoption of WebMCP in 2026 is close to zero. To penalise its absence would be to mark virtually every site on the web as failing for not having implemented an unproven standard, which would tell a client nothing useful. This is an instance of a broader discipline that I think is the most distinctive thing about the framework, so let me address it directly.

What I deliberately keep out of scoring

There are three things that sit inside my process as forward-looking guidance and are deliberately not scored. I want to explain why, because the restraint is the point.

The first is Site-as-MCP: the emerging practice of a site declaring agent-callable tools through Model Context Protocol discovery hooks. The second is EntityMap: a proposed standard for publishing an entitymap.json at the site root that declares the site's entities and their relationships. The third is Common Crawl edge-access testing: the "invisible 403" diagnostic, which I run as a qualitative section for publisher and top-tier sites but never as a universal line item.

Each of these is treated the same way in my process: trigger criteria for when to recommend it, an explicit statement of what it solves and what it does not, the auditor checks to verify its presence or absence, paste-in recommendation language for the client, and, crucially, an honesty flag that surfaces the experimental status and the absence of platform-side evidence.

The principle

Things with no confirmed consumer do not go in the score, even when they are plausible and even when I recommend them. Neither Site-as-MCP nor EntityMap has a single confirmed AI-platform consumer today. Both are plausibly valuable. Both are cheap to implement. I do recommend them to the right clients as first-mover bets. But to fold them into a client's grade would be to penalise every site that has not adopted an unproven standard, and that is neither fair nor honest.

This restraint cuts against the grain of a lot of AI-search commentary, which tends either to ignore emerging standards entirely or to pitch them as the new must-have. The honest middle ground (recommend cautiously, label the experimental status plainly, and watch the server logs) is the one I have found lands best with the kind of client who has been burned before by consultants selling hype.

The four cases for not deferring to vendors

I promised earlier to defend the truth-source posture, because it is the spine of the whole framework. Here are the four cases that, taken together, convince me that practitioner judgement must sit above vendor deference.

Case one: Google contradicts itself on llms.txt

Google's AI Optimization Guide states plainly that llms.txt has "no effect on AI Search." Google's own Lighthouse Agentic Browsing category, shipped roughly eighteen months later, audits the presence of llms.txt as a positive discoverability signal. Same vendor, opposite positions, in two of their own documents. This is the single cleanest piece of evidence I have for the argument: if even Google's documentation cannot agree with itself, no single vendor statement can be treated as settled fact.

Case two: Common Crawl's edge-block research shows robots.txt is incomplete

The Common Crawl Foundation's field guide documents what every honest practitioner eventually discovers: that a large share of top properties block AI bots at the CDN or WAF layer even when robots.txt is entirely clean. Vendors describe robots.txt as the access mechanism. The empirical reality is that the edge layer silently overrides it. Common Crawl is the most credible source on this precisely because they run the bot that gets blocked.

Case three: Aleyda Solis's anti-overclaim discipline

Her Impact Layer framework explicitly rejects blending observed AI traffic with modelled and proxy signals into one tidy "AI visibility" number, which is exactly what most vendor visibility tools do. The honest discipline is not "trust the score"; it is "separate the layers, label confidence, document assumptions."

Case four: Princeton's GEO research

The academic work behind the citability mechanics (self-contained answer blocks, top-loaded answers, named statistics with sources) is independent of any vendor's commercial position. I use it as the academic anchor for the rubric's content-extraction checks precisely because it has no product to sell.

Notice that three of these four cases are not about Google at all. The argument is not anti-Google; it is anti-deference. When all four sources align on a position, that position is durable and I treat it as such. When they diverge, I surface the conflict transparently rather than pick a side and present it to a client as fact. I never quote a vendor's claim as settled in a deliverable; I attribute it ("Google states X…") and present the framework's stance alongside.

How I use it: Audit, Plan, Execute

A rubric is only as good as the process it sits inside. Mine runs in three modes.

Audit produces the scored report: every check evidenced, a gap diagnosis, and a citation demonstration: a genuine before-and-after rewrite of one of the client's own passages, so they can see what extractable content looks like rather than take my word for it.

Plan turns findings into a four-phase roadmap: Foundation, Expansion, Scale, Authority. But the roadmap opens with a step that I think matters more than the phasing: gap diagnosis. Every finding is tagged against one of nine characteristics drawn from Aleyda Solis's framework: Accessible, Extractable, Useful, Differentiated, Credible, Recognisable, Consistent, Corroborated, Transactable. The phasing tells the client when the work happens. The gap diagnosis tells them what kind of work this really is. A typical report reads less like a flat list and more like this: "Of your eighteen findings, six are Accessibility problems (fix the technical crawl layer); eight are Extractability problems (restructure and rewrite pages for AI extraction); three are Credibility problems (add authorship, sourcing and dates); one is Transactability (a Lighthouse-flagged form-labelling issue). The Accessibility group is the highest-leverage and the fastest, so we lead there." That framing names what the client is actually buying: sometimes a technical engagement, sometimes a content engagement, sometimes a brand engagement.

Execute ships the fixes. Safe, self-contained changes (meta tags, alt text, content, JSON-LD, Open Graph, llms.txt) go in directly. Higher-impact changes (robots.txt, canonicals, redirect maps, noindex removal, hreflang, CMS template changes) are described in plain language and confirmed before any code is written, because those are the changes that can do real damage if a misunderstanding has crept in.

Findings can be pushed to a client's Trello board as cards, one card per finding rather than per page, labelled by severity, time-to-fix, effort and category. One deliberate omission: I never use "Risk" as a label. "Critical Issue, Low Risk" reads as a contradiction to a client, conflating the severity of a problem with the safety of its fix. The same information is carried by Severity, Time to Fix and Effort, without the muddle.

A word on honest expectations

I will close the substance with the thing I am most careful to say to every client, because it is the thing the market most often gets wrong.

AI search is not replacing SEO. Across the same datasets that produce the citation figures above, organic search remains roughly 108 times larger than AI in measured visits. AI is influential, particularly in the discovery and evaluation moments where someone is weighing options rather than navigating to a brand they already know, but the raw click volume is not there yet, and anyone telling you "AI is the new SEO" is selling you something. The honest framing is narrower and more useful: AI is the discovery layer for high-intent decisions, it is growing very quickly, and on that layer being cited matters more than being clicked. That is worth getting ahead of. It is not worth abandoning everything that still drives the other 99% of your traffic.

That same discipline runs through how I report. I do not blend observed traffic, proxy signals and modelled estimates into a single "AI visibility score", because each of those means something different and blending them is overclaim. I report what I can observe as observed, label what is modelled as modelled, and never tell a client "AI search drove £X" when X was inferred from proxies. If that makes my reports look less impressive than a competitor's single confident number, so be it. The number that is honest is the one worth having.

Run it on your site

I'll tell you honestly where you stand, including the parts a conventional audit would never find.

The full scored audit, the gap diagnosis, a before-and-after rewrite of your own content, and a phased plan to close the gaps. A page must be eligible to be cited before content quality matters at all. Once it is eligible, the content has to be worth quoting and structured so it can be lifted cleanly. That's exactly what Local SEO Ltd does.

Request an audit
No pitch, no agenda. Stephen reads everything himself.