Changelog
User-visible changes to the AI Readiness Check, newest first.
1.4
- Sitemap discovery now follows the spec: the scanner reads Sitemap: directives from robots.txt and tries those URLs before falling back to /sitemap.xml. Sites that publish sitemap_index.xml or use locale-prefixed paths (a common real-world pattern) are now correctly detected as having a valid sitemap.
- Reject soft 404s in every origin-scoped file check (robots.txt, ai.txt, llms.txt, llms-full.txt, agent-card.json, mcp.json, tdmrep.json). Many sites — especially single-page apps — return HTTP 200 with the SPA HTML shell at any unknown path, so the previous "file exists" check was getting fooled. Each checker now requires a non-HTML response that also carries the structural markers of the file (User-Agent: lines, an H1, valid JSON, etc.) before counting it as present. A site that previously got partial credit for "file exists but malformed" will now correctly report "file not found."
- Response details panel on each result card now consistently shows the URL the scanner actually fetched, the HTTP status, the content-type, and check-specific structural facts (H1 text for llms.txt, agent name and skill count for agent-card.json, server name and tool count for mcp.json, etc.) instead of just a sizeBytes field. Makes it possible to spot misconfigurations (e.g. served as text/html when it should be application/json) and verify the verdict yourself.
- Top-line score is now a whole number (e.g. "28" instead of "28.2"). Each per-check value is already an integer or half (1.5 / 3, etc.), so the rolled-up percentage's fractional digits read like an arithmetic bug. They aren't one, but the extra precision is an artifact rather than a signal. Grade thresholds (A ≥ 90, B ≥ 80, C ≥ 70, D ≥ 60) are unchanged and are now applied to the rounded integer for consistency with what's displayed.
- Score system rebaselined: weights now sum to exactly 100, so the top-line score reads as the straight sum of per-check values shown on each card (no more divide-by-possible × 100 math). Two categories of checks are now "not scored" — they're shown for visibility but don't move the score in either direction: (a) niche files that almost always return info because the spec doesn't apply to most sites (ai.txt, tdmrep.json, agent-card.json, mcp.json), and (b) opt-out signals where the *absence* is what you want for any public site that wants to be discoverable: x-robots-tag, link-header, ai-hint-div, and ai-meta-tags (noai/noimageai are informally honored, not a formal standard). Penalizing for the absence of opt-outs would put a permanent ceiling on every clean public site's score. The freed weight was redistributed to the eight remaining scored checks: robots-txt, llms-txt, and schema-jsonld at 20 each; content-negotiation, md-route, markdown-link, sitemap, and llms-full-txt at 8 each. A site that passes all eight earns exactly 100.
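The 1.4 scoring arithmetic can be sketched roughly as follows. The weights and grade thresholds come from the entries above; the status names, the full/half/none credit levels, the "F" grade below 60, and the function names are assumptions made for illustration, not the scanner's actual code.

```python
# Weights as listed in the 1.4 entry; they sum to exactly 100.
WEIGHTS = {
    "robots-txt": 20,
    "llms-txt": 20,
    "schema-jsonld": 20,
    "content-negotiation": 8,
    "md-route": 8,
    "markdown-link": 8,
    "sitemap": 8,
    "llms-full-txt": 8,
}

# Hypothetical credit levels: the changelog does not say how partial
# credit is awarded, so full / half / none is an assumption.
CREDIT = {"pass": 1.0, "warn": 0.5, "fail": 0.0}


def top_line_score(results: dict) -> int:
    """Sum the per-check values; checks without a weight are not scored."""
    total = sum(
        WEIGHTS[name] * CREDIT[status]
        for name, status in results.items()
        if name in WEIGHTS
    )
    return round(total)


def grade(score: int) -> str:
    """Apply the unchanged thresholds to the rounded integer."""
    for letter, floor in (("A", 90), ("B", 80), ("C", 70), ("D", 60)):
        if score >= floor:
            return letter
    return "F"  # assumed label for scores below 60
```

Under these assumptions, a site that passes robots-txt, gets half credit on sitemap, and also publishes ai.txt scores 20 + 4 = 24: the ai.txt result is shown but moves nothing.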
1.3
- Added a free public MCP (Model Context Protocol) server at /api/mcp/v1. Add it to Claude Desktop, Claude Code, or Cursor and ask your agent to scan a URL — you'll get back a slim graded report with one-line fixes and copy-paste prompt links for each failing or warning check. Setup instructions and rate limits at /mcp.
- Published a /.well-known/mcp.json manifest at the site root so MCP-aware clients can discover the server automatically.
- Tool cap: the slim MCP response omits passing checks (they're not actionable for an agent and they cost tokens) and caps free-form text at 200 characters with a placeholder substitution to limit prompt-injection risk in the agent's context window.
- Result strings returned by /api/check are now capped at 200 characters and include a one-line fixSummary per check, matching what the MCP server returns. The on-page UI is unchanged.
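One plausible reading of the 200-character cap is sketched below. The 200 limit is from the entries above; the placeholder text, the choice to append it after a cut, and the function name are hypothetical, since the changelog does not document how the substitution works.

```python
MAX_LEN = 200                # cap stated in the changelog
PLACEHOLDER = "[truncated]"  # hypothetical marker; the real one is not documented


def cap_text(text: str, max_len: int = MAX_LEN) -> str:
    """Return text unchanged if it fits; otherwise cut it and append the
    placeholder so the result is exactly max_len characters long."""
    if len(text) <= max_len:
        return text
    if max_len <= len(PLACEHOLDER):
        # No room for the marker: fall back to a plain cut.
        return text[:max_len]
    keep = max_len - len(PLACEHOLDER)
    return text[:keep] + PLACEHOLDER
```

Counting the marker inside the limit keeps the promise simple for consumers: no string in the response is ever longer than the cap.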
1.2
- Added a separate Reachability indicator that measures how a single AI fetcher would experience your URL: time to first byte plus whether the response was served from an edge cache. Reachability sits alongside (not inside) the main score, since fast files and existing files are different questions.
- When the platform is detectable (WordPress, Shopify, Squarespace, Wix, Webflow, Ghost, Vercel, Netlify), the Reachability card surfaces platform-specific suggestions and links to authoritative resources for speeding things up.
- When Cloudflare is in front of the site but the HTML response isn't cached, the card calls that out specifically and links to Cloudflare's Cache Rules docs.
- Result cards now lead with a pithy headline and place the supporting context on a second, smaller line so the verdict is scannable.
- Tightened SSRF protections: every fetched URL (and every redirect hop) is now DNS-resolved and rejected if it points at private, loopback, or link-local space. Catches the case where a public hostname resolves to an internal IP, or a public URL redirects to localhost.
- Detect Cloudflare's managed challenge and Akamai bot blocks. Sites behind those now get a specific, accurate message instead of a misleading low score or generic 403.
- Raised the serverless function timeout from 10s to 30s. With 16 checks running in parallel, a site whose PHP workers or database get inundated (common on WordPress and other dynamic CMSes) could push the run past 10s and return a 504 mid-check.
- Raised the homepage fetch timeout from 5s to 10s so sites with a cold cache or slow first byte can still be checked.
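The SSRF check described in the 1.2 notes could look something like this minimal sketch, which uses only the Python standard library. The function names are hypothetical and the scanner's real implementation may differ; the rule itself (resolve, then reject private, loopback, or link-local addresses) is taken from the entry above.

```python
import ipaddress
import socket


def is_disallowed_address(ip: str) -> bool:
    """True if ip falls in private, loopback, or link-local space."""
    # Drop any IPv6 zone id (e.g. "fe80::1%eth0") before parsing.
    addr = ipaddress.ip_address(ip.split("%", 1)[0])
    return addr.is_private or addr.is_loopback or addr.is_link_local


def host_is_safe(hostname: str) -> bool:
    """Resolve hostname and require every returned address to be public."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return False  # unresolvable hosts are rejected
    # info[4] is the socket address tuple; its first item is the IP string.
    return bool(infos) and all(
        not is_disallowed_address(info[4][0]) for info in infos
    )
```

To match the entry, `host_is_safe` would be called for the initial URL and again for the target of every redirect hop, so a public URL that redirects to localhost is caught on the second call.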
1.1
- Added an awesome-ai-website-files accordion to "How this scorecard works" so the prompts are discoverable from the results page.
- Each result card now links directly to the relevant section of the AI Files post via a "Learn more" anchor.
- Dropped the security-txt check. It isn't an AI-readiness signal and was muddying the score.
- Surfaced a content-negotiation prompt link so sites without /llms.txt can still bootstrap one with a single prompt.
1.0
- Initial release.