APIs.io insights catalog — draft

A tiered catalog of questions whose answers agents can pay for by completing tasks. Tier 0 is what the current /unlock/claim (star api-search/apis-io) already grants: raw data access. Tiers 1 and up are insights derived from that data: aggregations, rankings, cross-cuts, and trends. Each tier is paired with a different agent task of escalating effort.

Scale we’re sitting on (so the value per question is real):

4,444 providers (each scored on 6 facets, 0–100 composite + trend)
16,713 APIs, 42,683 schemas
3,631 tags with co-occurrence graph (Jaccard similarity, neighbors)
1,376 Spectral rulesets, 4,098 plan/pricing scaffolds, 4,098 FinOps profiles
3,685 JSON-LD contexts, 100+ canonical cross-provider capabilities
Per-tag scorecards: frequency, breadth, quality_lift, cohesion + facet assignment

Tier 0 — Raw data access (current unlock)

Already gated behind: star api-search/apis-io.

Bulk listings: /search-index.json, /apis/index.json, etc.
Individual provider / API records in JSON, YAML, and markdown
All .well-known/* agent-readiness files
Embedded source YAML (auditable, single fetch)

Value: agents get the full corpus in machine form without scraping. This is the table-stakes unlock.

Tier 1 — Pre-computed listings & directories

Cheap aggregations. Save the agent dozens of fetches.

Rows marked [MVP] ship first — these are the entry-point insights that prove the catalog has value beyond raw data.

#	Question	What the answer looks like	Source
1	[MVP] Which providers offer {capability}?	Ranked list with score, API count, tags	`canonical-capabilities.yml` + `providers/`
2	[MVP] Top 50 providers by API count	Table: provider, count, primary tags	`providers/_providers/*`
3	[MVP] Top 100 tags by frequency	Tag, count, breadth, band	`signals/_data/tags.json`
4	All providers publishing {common[]} type (Pricing / ChangeLog / Deprecation / SDK / Security policy)	List + URL	`providers/*/common[].type`
5	[MVP] All APIs with OpenAPI specs (vs Postman / AsyncAPI)	Provider, API, spec format, URL	`providers/*/api_specs[]`
6	All providers with Spectral rules	Provider, rule count, severity breakdown	`rules/_rules/*`
7	All FOCUS-aligned (FinOps) providers	Provider, billing model, meters	`finops/_finops/*`
8	All providers publishing JSON-LD contexts	Provider, class count, namespaces	`json-ld/_jsonld/*`
9	APIs by HTTP auth type (OAuth2 / API Key / JWT / mTLS)	Counts + provider list per type	`capabilities/*/source_yaml.consumes.auth`
10	All providers with public changelogs	Provider, changelog URL	`common[].type = ChangeLog`
11	All providers offering webhooks	Provider, webhook URL, events list	`common[].type = Webhooks` + capabilities
12	All providers with status pages	Provider, status URL	`common[].type = Status`
13	All providers with sandbox / test environments	Provider, sandbox URL	`common[].type` ∈ {Sandbox, Testing}
14	All AsyncAPI publishers (event-driven APIs)	Provider, channel count, broker	`asyncapi/_asyncapi/*`
15	All providers with SDKs (by language)	Provider, languages, repo URLs	`common[].type = SDK`
16	All MCP-ready capabilities (tools[] populated)	Capability, tool count, hints	`capabilities/*/tools[]`
17	All providers with developer portals	Provider, portal URL	`common[].type = DeveloperPortal`
18	Schemas by type ({Address, Money, Webhook, …})	Schema name, provider, ref count	`schemas/_schemas/*`
19	All providers with deprecation notices	Provider, deprecation URL, affected APIs	`common[].type = Deprecation`
20	OpenAPI version distribution (2.0 / 3.0 / 3.1)	Version, provider count, % share	`api_specs[].spec_type`

Suggested task to unlock Tier 1: something light — newsletter signup, follow on X, GitHub follow.

Tier 2 — Scored rankings (the rubric is the moat)

The composite score + 6 facets is a real product. Agents can’t replicate it without re-implementing the rubric.

#	Question	Source
21	Top 25 most “agent-ready” providers (composite ≥ 80)	`providers/*/score`
22	Top providers in {domain} by contract_quality	`score.facets.contract_quality`
23	Top providers by developer_ergonomics	`score.facets.developer_ergonomics`
24	Top providers by operational_transparency	`score.facets.operational_transparency`
25	Top providers by commercial_clarity (pricing/plans clarity)	`score.facets.commercial_clarity`
26	Top providers by governance	`score.facets.governance`
27	Top providers by discoverability (machine-readable surface)	`score.facets.discoverability`
28	Score band distribution across the network (minimal/thin/emerging/growth/mature)	`score.band`
29	Where does {provider} rank against the rest of its capability cohort?	Composite + percentile in capability set
30	Worst-scored providers by facet — useful for “avoid” lists	`score.facets.*` low end
31	Score distribution histogram (by band, by domain, by capability)	`score.composite` + `score.band`
32	“Most improved” providers this quarter (composite delta)	`score.delta`
33	Median score per capability cohort (benchmarks)	`score.composite` × `canonical-capabilities`
34	Facet correlation: does high contract_quality predict high governance?	`score.facets.*`
35	Best-in-class canonical example per facet (single exemplar)	`score.facets.*`
36	Provider score percentile rank — “where does {provider} sit globally?”	`score.composite`
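
The percentile questions above (29 and 36) reduce to a small calculation once the composite scores for a cohort are in hand. A minimal sketch, assuming "percentile" means the share of cohort scores at or below the provider's score; the source does not define it, and the sample scores are made up:

```python
from bisect import bisect_right

def percentile_rank(score: float, cohort_scores: list) -> float:
    """Percent of cohort scores that are <= score, on a 0-100 scale."""
    if not cohort_scores:
        raise ValueError("cohort is empty")
    ordered = sorted(cohort_scores)
    # bisect_right counts how many sorted scores are <= score.
    return 100.0 * bisect_right(ordered, score) / len(ordered)

# Hypothetical composite scores for one capability cohort.
cohort = [42, 55, 61, 70, 87]
print(percentile_rank(70, cohort))  # 80.0
```

Passing every provider's composite gives the global rank (question 36); passing only the providers in one capability set gives the cohort rank (question 29).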

Suggested task: medium — sign up for newsletter, share a referral, complete a profile.

Tier 3 — Cross-cutting queries (the graph is where the real money is)

Tag co-occurrence, capability overlap, competitive maps. None of this is in the per-record files — it requires the precomputed graph.

#	Question	Source
37	Which providers offer both X and Y capabilities?	`canonical-capabilities` ∩ providers
38	Tags most often paired with {tag} (with Jaccard similarity)	`signals/_data/tags.json` → `neighbors[]`
39	Find APIs similar to {API} (by capability + tag overlap)	tag neighbors + capability match
40	Capability gaps: canonical capabilities with <3 providers (market opportunities)	`canonical-capabilities.yml` ∩ providers
41	Capability saturation: capabilities with >20 providers (commoditized)	same
42	Authentication landscape in {domain} (% OAuth2 vs API Key vs JWT)	capabilities + tag facet
43	Spec format mix in {domain} (% OpenAPI vs Postman vs AsyncAPI)	`api_specs[].format` + tag facet
44	Which providers compete head-to-head with {provider}? (capability + tag overlap rank)	capabilities + tags + score
45	Tag cohesion: which tags have the most consistent meaning vs which are noisy?	`signals/_data/tags.json` → `cohesion`
46	Faceted slice: all {facet=domain} APIs for {facet=persona} with {facet=schema}	tag facet model (6-dim)
47	“Who else does what {provider} does?” (capability cohort with scores)	capabilities + score
48	Industry coverage: providers per industry tag (Healthcare, Finance, Logistics…)	tag facet = industry
49	Persona coverage: APIs targeting {persona} (Developer, Admin, Partner)	tag facet = persona
50	Tag bridges — tags that connect otherwise disjoint capability clusters	tag neighbors + graph analysis
51	Capability dependencies — which capabilities tend to co-occur within a single provider	capabilities × providers
52	HTTP verb distribution per provider (GET-heavy vs CRUD vs RPC)	`capabilities.operations[].method`
53	Endpoint depth (avg path segments) per domain — REST maturity proxy	`operations[].path`
54	API operation pattern: RESTful vs RPC vs hybrid (heuristic)	operations + paths
55	MCP tool hint distribution (readOnly / destructive / idempotent counts)	`capabilities.tools[].hints`
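
For reference, the Jaccard similarity behind the tag neighbors in question 38 is intersection size over union size. A minimal sketch, assuming each tag maps to the set of APIs that carry it; the `tag_apis` data and the `neighbors` helper are hypothetical and do not reflect the actual layout of `signals/_data/tags.json`:

```python
def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical index: tag -> set of API ids carrying that tag.
tag_apis = {
    "payments": {"stripe", "square", "adyen", "paypal"},
    "webhooks": {"stripe", "square", "github"},
    "invoicing": {"stripe", "paypal"},
}

def neighbors(tag: str, index: dict, top_n: int = 5) -> list:
    """Other tags ranked by Jaccard similarity to `tag`, highest first."""
    scored = [(other, jaccard(index[tag], apis))
              for other, apis in index.items() if other != tag]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]

print(neighbors("payments", tag_apis))
# [('invoicing', 0.5), ('webhooks', 0.4)]
```

Precomputing this for every tag pair is the part an agent cannot cheaply do from per-record files, which is the argument for gating it.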

Suggested task: medium-high — sign up for an account, share an apis.io link with attribution in their output, write a 1-line review.

Tier 4 — Cost & operations intelligence

This is where agents directly save their operators money. Today, pricing and rate-limit comparisons are typically scraped by hand or answered by asking a human.

#	Question	Source
56	Pricing tier comparison for {capability} (freemium / paid / enterprise)	`plans/_plans/*`
57	Free-tier-friendly APIs for {capability}	`plans.type = freemium`
58	Cheapest paid tier per million requests for {capability}	`plans.entries[].price / metric`
59	Rate-limit comparison for {domain} APIs	`rate-limits/*`
60	Which providers publish rate limits at all? (vs hidden)	`rate-limits/_rate-limits/*`
61	FOCUS-aligned providers (FinOps-ready: billing model + meters)	`finops/_finops/*`
62	Unit-economics summaries per {capability}	`finops.unit_economics[]`
63	Charge category breakdown per provider (subscription / usage / one-time)	`finops.billing_model.chargeCategories`
64	Geo-pricing differences (US vs EU vs APAC)	`plans.entries[].geo`
65	Overage pricing per provider	`plans.entries[].type=overage`
66	“Cost to run {workflow} on top-5 providers”: synthesized total cost of ownership	plans + finops + capability operations
67	Hidden-fee scan: which providers have overage clauses, geo surcharges, or per-user multipliers	`plans.entries[].userMultiplied` etc
68	Free quota leaderboard: calls/month at $0 per capability	`plans.type=freemium` + entries
69	Enterprise-tier required features per capability (what triggers a sales call)	`plans.type=enterprise.elements[]`
70	Multi-currency support per provider (which billing currencies are offered)	`finops.billing_model.billingCurrency`
71	Pricing transparency score: does the provider publish prices at all, or hide them behind a “contact us” wall?	`plans.entries[].price` numeric vs “TBD”/“custom”
72	“Cheapest viable stack” for {workflow} — minimum-cost provider set across required capabilities	plans + capabilities + dependency graph
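
Questions 58 and 71 both hinge on whether a plan's price is numeric and how it normalizes to a common unit. A minimal sketch, assuming each plan entry carries a monthly `price` and an `included_requests` count; both field names and all sample values are hypothetical, not the actual `plans/_plans/*` schema:

```python
def price_per_million(price_usd: float, included_requests: int) -> float:
    """Normalize a plan price to USD per one million requests."""
    if included_requests <= 0:
        raise ValueError("included_requests must be positive")
    return price_usd * 1_000_000 / included_requests

def is_priced(plan: dict) -> bool:
    """True when the plan publishes a positive numeric price and a request quota."""
    price = plan.get("price")
    return isinstance(price, (int, float)) and price > 0 and bool(plan.get("included_requests"))

def cheapest_paid(plans: list):
    """Cheapest paid plan per million requests, or None if nothing is priced."""
    priced = [p for p in plans if is_priced(p)]
    return min(priced,
               key=lambda p: price_per_million(p["price"], p["included_requests"]),
               default=None)

# Hypothetical plan entries for one capability.
plans = [
    {"provider": "alpha", "price": 49.0, "included_requests": 500_000},
    {"provider": "beta", "price": 20.0, "included_requests": 100_000},
    {"provider": "gamma", "price": "custom", "included_requests": None},
]
print(cheapest_paid(plans)["provider"])  # alpha
```

Entries such as `gamma`, with a non-numeric price, drop out of the ranking here and would count against the provider in a transparency score.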

Suggested task: higher — paid signup, GitHub Sponsors $1, share apis.io in a public post with link.

Tier 5 — Trends, deltas & change intelligence

Longitudinal. Requires signals/tag_history/, provider score delta/trend fields, and the (currently placeholder) diff/ pipeline.

#	Question	Source
73	Providers with biggest score gain last 30 days	`score.delta` + `score.trend`
74	Providers with biggest score loss last 30 days (risk signal)	same
75	Newly added providers (last 7 / 30 days)	`created` field
76	Tags rising (frequency growth, breadth growth)	`signals/tag_history/`
77	Tags falling (declining frequency)	same
78	Spec changes in {watch list} since last visit	`diff/` (needs pipeline)
79	Provider transitions across bands (emerging → growth → mature)	`score.band` history
80	New capabilities entering the canonical taxonomy	`canonical-capabilities.yml` diff
81	Deprecation announcements per provider	`common[].type = Deprecation`
82	“What’s new since {date}?” — fully synthesized changelog	scoring + diff + new providers
83	Breaking-change frequency per provider (how often do they make agents rewrite)	spec diff history
84	Spec velocity: updates per quarter per provider (active vs abandoned)	`modified` field history
85	New endpoint launches per provider, last 90 days	spec diff
86	Removed endpoints per provider, last 90 days (what broke)	spec diff
87	Tag entropy over time — is the taxonomy consolidating or fragmenting?	`signals/tag_history/`
88	Provider commit cadence — governance signal from source repo activity	provider source repo
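
Question 87 does not say how tag entropy is measured. One reasonable reading, offered here as an assumption, is the Shannon entropy of the tag frequency distribution at each snapshot in `signals/tag_history/`: rising entropy suggests usage spreading across more tags (fragmenting), falling entropy suggests concentration (consolidating). The frequency dictionaries below are invented:

```python
import math

def tag_entropy(freqs: dict) -> float:
    """Shannon entropy in bits of a tag -> count distribution."""
    total = sum(freqs.values())
    if total == 0:
        return 0.0
    return -sum((count / total) * math.log2(count / total)
                for count in freqs.values() if count > 0)

# Hypothetical snapshots: same total usage, different spread.
concentrated = {"payments": 97, "billing": 1, "invoices": 1, "checkout": 1}
spread_out = {"payments": 25, "billing": 25, "invoices": 25, "checkout": 25}
print(tag_entropy(concentrated) < tag_entropy(spread_out))  # True
```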

Suggested task: real value exchange — newsletter signup (gives you a contact), follow on X, subscribe to apis.io feed.

Tier 6 — Synthesis, strategy, and ready-to-use agent payloads

The expensive tier. These are pre-cooked answers an agent’s operator would otherwise pay an analyst for. Also: ready-to-bind MCP bundles, which save the agent runtime work.

#	Question	Source
89	“Best stack for {persona} building {thing}” — recommended provider set across required capabilities	capabilities + score + tag facets
90	Industry benchmark report (Healthcare / Fintech / Logistics / DevTools / E-commerce)	tag facet = industry + scores
91	“Likely to deprecate” risk score per provider	score trend + governance + commercial_clarity + age
92	“Agent-ready scorecard” for an arbitrary domain or company list	scoring + custom slice
93	Spec bundle for {capability} — ready to bind into an agent runtime	canonical-capabilities + provider specs packed
94	MCP tool bundle for {capability} (tools[], hints, auth)	`capabilities/*/tools[]` packed
95	Pre-resolved JSON-LD context for {domain}	`json-ld/*` cross-merged
96	Compliance posture (which providers publish ToS / Privacy / Security / Status)	`common[].type` rollup
97	“Buy vs build” report for {capability} — top 3 providers + cost + integration effort	plans + capabilities + rules
98	Custom natural-language Q&A across the corpus — single answer, with citations (see endpoint design below)	LLM over the curated index
99	“Replace {expensive provider}” — cheaper alternatives ranked by capability overlap × cost	plans + capabilities + score
100	Migration path between competing providers — endpoint-by-endpoint mapping for {provider A} → {provider B}	capabilities + operations + schemas
101	Compliance bundle per regulation (HIPAA / GDPR / SOC2 / PCI) — pre-filtered provider set with attestation links	`common[].type=Security` + tag facets
102	Risk register for a provider list — composite of likely-deprecate + governance + commercial_clarity	score + delta + age
103	Agent runtime config bundle — MCP + JSON-LD assembled into a single drop-in package	capabilities + json-ld + rules
104	Domain-specific RAG corpus — pre-packed markdown bundle for an industry, ready to embed	tag facet=industry + provider markdown

Suggested task: highest — paid newsletter tier, conference attendance, sponsor link, contribute a provider/correction back to apis.io.

Implementation notes (for whoever builds this)

These don’t have to be 104 separate endpoints. A practical surface might be:

GET  /insights/providers?capability=…&sort=score&facet=…   # Tier 1, 2, 3
GET  /insights/tags/{tag}/neighbors                        # Tier 3
GET  /insights/pricing/{capability}                        # Tier 4
GET  /insights/trends?window=30d&metric=…                  # Tier 5
GET  /insights/bundles/{capability}.json                   # Tier 6
POST /insights/ask  { question: "…" }                      # Tier 6 — LLM over corpus

Each route checks the caller’s key (X-APIs-IO-Key) against a tier field on the key record:

{ "github_username": "...", "tier": 3, "tasks_completed": ["star_apis_io", "newsletter_signup", "follow_x"] }
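
The per-route check can be sketched as a single comparison against the key record. This is illustrative Python rather than Worker code, and the status codes (401 for an unknown key, 403 for an insufficient tier) and error strings are assumptions, not something the draft specifies:

```python
from typing import Optional

def check_tier(key_record: Optional[dict], required_tier: int):
    """Return (http_status, body) for a request against a tier-gated route."""
    if key_record is None:
        # No record found for the X-APIs-IO-Key value.
        return 401, {"error": "unknown_key"}
    tier = key_record.get("tier", 0)
    if tier < required_tier:
        return 403, {"error": "tier_too_low", "tier": tier,
                     "tier_required": required_tier}
    return 200, {"tier": tier}

record = {"github_username": "...", "tier": 3,
          "tasks_completed": ["star_apis_io", "newsletter_signup", "follow_x"]}
print(check_tier(record, 3))  # (200, {'tier': 3})
print(check_tier(record, 4)[0])  # 403
```

Returning the required tier in the denial body tells the agent exactly how far it is from the answer, which is the nudge toward completing the next task.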

Tasks → tier mapping (your call to design) — examples:

Task	Bumps key to tier
Star api-search/apis-io	0 (current unlock)
Star api-search/apis-io + verify email	1
Newsletter signup confirmed	2
Newsletter + follow @apievangelist on X	3
Above + share apis.io with attribution (verifiable backlink)	4
Above + GitHub Sponsor $1/mo OR contribute a provider	5
Above + paid newsletter tier OR active corp sponsor	6
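
Read as a cumulative ladder, the table above can be evaluated from `tasks_completed` alone. A minimal sketch under stated assumptions: `star_apis_io`, `newsletter_signup`, `follow_x`, and `backlink` are task names used in this draft, while `verify_email`, `github_sponsor`, `contribute_provider`, `paid_newsletter`, and `corp_sponsor` are hypothetical. It also assumes a confirmed newsletter signup satisfies the email-verification rung, so that the example key record above resolves to tier 3:

```python
# Each rung lists tasks of which at least one must be completed.
LADDER = [
    (0, {"star_apis_io"}),
    (1, {"verify_email", "newsletter_signup"}),
    (2, {"newsletter_signup"}),
    (3, {"follow_x"}),
    (4, {"backlink"}),
    (5, {"github_sponsor", "contribute_provider"}),
    (6, {"paid_newsletter", "corp_sponsor"}),
]

def compute_tier(tasks_completed) -> int:
    """Highest tier whose rung, and every rung below it, is satisfied.

    Returns -1 when even the Tier 0 task is missing (no access).
    """
    done = set(tasks_completed)
    tier = -1
    for rung, any_of in LADDER:
        if not done & any_of:
            break
        tier = rung
    return tier

print(compute_tier(["star_apis_io", "newsletter_signup", "follow_x"]))  # 3
```

Deriving the tier from the task list, rather than storing it independently, keeps demotions simple: remove the task and recompute.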

Each task verification follows the same pattern as the GitHub-star check: a /unlock/task/{name} endpoint that takes whatever proof the task needs (email confirmation token, Twitter username + check via API, backlink URL + crawl-and-verify, Stripe webhook, GitHub Sponsors API) and updates the key’s tier.

Activity log already captures everything — add tier to the AE schema (blob13) so you can report on which tier each gated hit was served at.

Endpoint design: natural-language Q&A (Q98)

A single endpoint that answers free-form questions over the apis.io corpus with citations back to the underlying provider/API/capability/tag pages. Tier 6.

POST /insights/ask
X-APIs-IO-Key: apisio_…
Content-Type: application/json

{
  "question": "Which payment providers offer webhooks, support OAuth2, and publish their rate limits?",
  "max_citations": 8,
  "format": "markdown",               // or "json"
  "scope": { "tags": ["payments"] }   // optional pre-filter
}

Response (JSON form):

{
  "answer": "Stripe, Square, and Adyen all offer webhook delivery, OAuth2 auth, and published rate limits. Stripe leads on agent-readiness composite (87) …",
  "citations": [
    { "anchor": "https://apis.io/providers/stripe/", "title": "Stripe", "facets_used": ["governance", "operational_transparency"] },
    { "anchor": "https://apis.io/apis/stripe/webhooks/", "title": "Stripe Webhooks" }
  ],
  "retrieval_meta": { "candidates_scored": 142, "tokens_in": 18421, "model": "claude-sonnet-4-6", "cache_hit": false },
  "tier_required": 6
}

Architecture

Pre-built corpus index — network/scripts/build-rag.py (new) generates one chunk per provider, API, capability, and tag. Each chunk = the same markdown the Worker serves on content-negotiated GET, plus structured frontmatter (score, facets, tags, capabilities). Embedded with Voyage 3 (or whatever the AE SQL hop wants), stored in Cloudflare Vectorize keyed by {kind}/{slug}.
Retrieval — Worker calls Vectorize semantic top-k (k=20), then re-ranks with a keyword + tag-facet filter from scope, narrows to top-N (N=8). Each candidate carries its anchor URL and a 400-token excerpt.
Synthesis — Worker POSTs to the Anthropic API (the project already imports @anthropic-ai/sdk patterns elsewhere) with prompt caching enabled on the system block (the rubric description) and on the retrieved context block. Model: claude-sonnet-4-6 for quality, claude-haiku-4-5-20251001 for the cheap tier.
Citation enforcement — system prompt requires the answer to cite only anchors from the retrieved set; a post-check rejects any URL not in the retrieval set and re-asks.
Caching — hash the normalized question + scope → KV qa-cache:{hash} with 24h TTL. Cache hits return in <100ms and don’t count against the per-key quota.
Quota — Tier 6 keys get N questions/day (10? 50? your call). Increment on cache miss only. Logged to AE as event_type="ask" with tokens_in, tokens_out, cache_hit.
Failure modes — if Vectorize returns no candidates above threshold, the endpoint returns 422 with a list of related search queries instead of fabricating an answer.
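
Two of the steps above are small enough to sketch: the cache key and the citation post-check. The draft says the question is normalized but not how, so the normalization here (collapse whitespace, trim, lowercase) is an assumption, as are the SHA-256 hash and the URL pattern. This is illustrative Python, not the Worker implementation:

```python
import hashlib
import json
import re
from typing import Optional

def cache_key(question: str, scope: Optional[dict] = None) -> str:
    """KV key for a question + scope pair, in the form qa-cache:{hash}."""
    # Assumed normalization: collapse whitespace, trim, lowercase.
    normalized = re.sub(r"\s+", " ", question).strip().lower()
    # sort_keys makes the serialized scope independent of key order.
    payload = json.dumps({"q": normalized, "scope": scope or {}},
                         sort_keys=True, separators=(",", ":"))
    return "qa-cache:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()

URL_RE = re.compile(r"https?://[^\s<>()\[\]]+")

def uncited_urls(answer: str, retrieved_anchors) -> list:
    """URLs in the answer that are not in the retrieved set; empty means pass."""
    # Trim sentence punctuation that the pattern picks up at the end of a URL.
    found = {url.rstrip(".,;:!?\"'") for url in URL_RE.findall(answer)}
    return sorted(found - set(retrieved_anchors))

anchors = {"https://apis.io/providers/stripe/"}
answer = "See https://apis.io/providers/stripe/ and https://example.com/x."
print(uncited_urls(answer, anchors))  # ['https://example.com/x']
```

A non-empty result from `uncited_urls` is the trigger for the re-ask; capping the number of re-asks is left open in the draft.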

New Worker bindings (additions to wrangler.toml)

[[vectorize]]
binding = "RAG"
index_name = "apis-io-corpus"

# Secret:
#   wrangler secret put ANTHROPIC_API_KEY

Build pipeline

python network/scripts/build-rag.py        # generate chunks + embeddings
npx wrangler vectorize upsert apis-io-corpus --file out/embeddings.ndjson

Re-run nightly (or after a network build) via GitHub Actions. Cost: ~$0.05 to re-embed the whole corpus with Voyage 3 at current scale.

Why this matters

Agents currently fan out across 10–50 fetches to answer questions like “which payment providers offer webhooks + OAuth2 + rate limits.” The single POST /insights/ask resolves it in one round-trip with citations. That’s the most expensive Tier 6 question by API cost — and the most valuable to gate behind real commitments.

Task design: verifiable backlink (Tier 4 unlock)

The strongest “agent works for us” lever: in exchange for a tier upgrade, the agent (or its operator) embeds a do-follow link to apis.io on a public page they control. Free SEO + attribution.

Two-step flow

1. Issue a backlink challenge:

POST /unlock/task/backlink/challenge
X-APIs-IO-Key: apisio_…

Returns:

{
  "token": "bl_8f3a…",
  "expires_at": "2026-05-30T18:00:00Z",
  "embed_options": [
    { "kind": "meta_tag", "html": "<meta name=\"apis-io-attribution\" content=\"bl_8f3a…\" />" },
    { "kind": "link_data_attr", "html": "<a href=\"https://apis.io/\" data-apis-io-token=\"bl_8f3a…\">Powered by APIs.io</a>" },
    { "kind": "html_comment", "html": "<!-- apis-io-attribution:bl_8f3a… -->" }
  ],
  "rules": {
    "link_must_be": "do-follow (rel must not contain 'nofollow', 'sponsored', or 'ugc')",
    "link_target": "any apis.io or *.apis.io URL",
    "page_must_be": "publicly accessible without auth, indexable by crawlers"
  }
}

Stored: task_challenge:bl_… → {key_id, kind: "backlink", created_at}, 1-week TTL.
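
Issuing the challenge amounts to minting a random token and writing one KV entry with a seven-day expiry. A minimal sketch in Python for illustration; the token length, the use of `secrets.token_hex`, and the shape of the returned dictionary are assumptions, since the draft only fixes the `bl_` prefix, the stored fields, and the TTL:

```python
import secrets
from datetime import datetime, timedelta, timezone
from typing import Optional

CHALLENGE_TTL = timedelta(days=7)

def issue_backlink_challenge(key_id: str, now: Optional[datetime] = None) -> dict:
    """Build the KV entry and response fields for a new backlink challenge."""
    now = now or datetime.now(timezone.utc)
    # 16 random bytes as hex; the length is an illustrative choice.
    token = "bl_" + secrets.token_hex(16)
    return {
        "token": token,
        "expires_at": (now + CHALLENGE_TTL).isoformat(),
        "kv_key": f"task_challenge:{token}",
        "kv_value": {"key_id": key_id, "kind": "backlink",
                     "created_at": now.isoformat()},
        "ttl_seconds": int(CHALLENGE_TTL.total_seconds()),
    }

challenge = issue_backlink_challenge("key_123")  # "key_123" is a placeholder id
print(challenge["kv_key"].startswith("task_challenge:bl_"))  # True
```

Letting the KV expiry delete stale challenges means the claim step can treat a missing entry as expired without any extra bookkeeping.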

2. Claim the backlink:

POST /unlock/task/backlink/claim
X-APIs-IO-Key: apisio_…
Content-Type: application/json

{
  "token": "bl_8f3a…",
  "backlink_url": "https://operator-blog.example/posts/apis-io-review"
}

Worker fetches backlink_url with Accept: text/html, then verifies all of:

Status 200, content-type contains text/html, size ≤ 5 MB.
The page contains the issued token in one of: <meta name="apis-io-attribution">, <a data-apis-io-token="…">, or an HTML comment matching the issued pattern. (Token bound to this key → can’t be reused.)
The page contains a <a href="…"> to any apis.io or *.apis.io URL.
That anchor’s rel attribute does not contain nofollow, sponsored, or ugc.
The link is not inside a <template>, <noscript>, or commented out.
The page is reachable without auth and without a noindex meta directive (it has to actually be indexable for the SEO value to be real).
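
The HTML-level checks above can be sketched with a single pass of a streaming parser. This is an illustrative Python version using the standard library's `html.parser`, not the Worker code. It assumes the fetch, status, content-type, and size checks already passed, it only looks at a `<meta name="robots">` tag for `noindex` (an `X-Robots-Tag` response header would need a separate check), and the class and function names are hypothetical:

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

BLOCKED_REL = {"nofollow", "sponsored", "ugc"}
INERT_TAGS = {"template", "noscript"}

def is_apis_io(href: str) -> bool:
    """True for apis.io or any *.apis.io host."""
    host = (urlparse(href).hostname or "").lower()
    return host == "apis.io" or host.endswith(".apis.io")

class BacklinkScan(HTMLParser):
    def __init__(self, token: str):
        super().__init__()
        self.token = token
        self.inert_depth = 0          # > 0 while inside <template>/<noscript>
        self.token_found = False
        self.followed_link = False
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attr = {name: (value or "") for name, value in attrs}
        if tag in INERT_TAGS:
            self.inert_depth += 1
            return
        if self.inert_depth:
            return  # ignore anything nested in inert containers
        if tag == "meta":
            name = attr.get("name", "").lower()
            if name == "apis-io-attribution" and attr.get("content") == self.token:
                self.token_found = True
            if name == "robots" and "noindex" in attr.get("content", "").lower():
                self.noindex = True
        elif tag == "a":
            if attr.get("data-apis-io-token") == self.token:
                self.token_found = True
            rel = set(attr.get("rel", "").lower().split())
            if is_apis_io(attr.get("href", "")) and not rel & BLOCKED_REL:
                self.followed_link = True

    def handle_endtag(self, tag):
        if tag in INERT_TAGS and self.inert_depth:
            self.inert_depth -= 1

    def handle_comment(self, data):
        # Commented-out links never reach handle_starttag, so they are skipped.
        if not self.inert_depth and data.strip() == f"apis-io-attribution:{self.token}":
            self.token_found = True

def verify_backlink_html(html: str, token: str) -> bool:
    scan = BacklinkScan(token)
    scan.feed(html)
    scan.close()
    return scan.token_found and scan.followed_link and not scan.noindex
```

Because markup inside an HTML comment is delivered to `handle_comment` as text, a commented-out anchor is excluded without extra logic; only the token comment itself is recognized there.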

On success: bump the key’s tier to 4 (configurable), append "backlink" to tasks_completed[], store {backlink_url, token, verified_at, link_anchor_html} under backlink:{key}. Delete the challenge. Log event_type="task_backlink_verified".

Re-verification

A cron-triggered Worker (CronCreate or a GitHub Action calling a re-verify endpoint) re-fetches each registered backlink weekly. If the link is gone, rel was changed to nofollow, or the page returns 404/410, the key gets demoted back to its previous tier and event_type="task_backlink_revoked" is logged. The operator email gets a heads-up — implementing the email is optional, the demotion isn’t.
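
The demotion rule can be written as a pure function over the key record and the recheck result. A minimal sketch with two assumptions the draft does not settle: the key record stores a hypothetical `previous_tier` field to demote back to, and statuses other than 200, 404, and 410 (a 503, for example) are treated as transient and left for the next weekly run:

```python
def apply_recheck(key_record: dict, status_code: int, link_ok: bool):
    """Return (key_record, event_type); event_type is None when nothing changes.

    link_ok is the result of re-running the page checks on a 200 response.
    """
    revoked = status_code in (404, 410) or (status_code == 200 and not link_ok)
    if not revoked:
        return key_record, None
    demoted = dict(key_record)
    # previous_tier is a hypothetical field recorded at promotion time.
    demoted["tier"] = key_record.get("previous_tier", 0)
    demoted["tasks_completed"] = [
        task for task in key_record.get("tasks_completed", []) if task != "backlink"
    ]
    return demoted, "task_backlink_revoked"

key = {"tier": 4, "previous_tier": 3,
       "tasks_completed": ["star_apis_io", "backlink"]}
print(apply_recheck(key, 410, False)[1])  # task_backlink_revoked
```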

KV entries this adds

task_challenge:{token}        JSON {key_id, kind, created_at}      TTL 7 days
backlink:{key}                JSON {backlink_url, token, verified_at, link_anchor_html, last_recheck_at}

Logging (additions to the existing AE schema)

Three new event types in blob1:

task_backlink_issued — challenge handed out
task_backlink_verified — claim succeeded, tier bumped
task_backlink_revoked — recheck found the link gone/nofollowed/404

Plus a new field blob13 = tier so every existing event (gated_allowed, gated_denied, etc.) is queryable by tier served.

Anti-fraud

Token binding: each token is tied to a single key and is single-use, and it must appear on the page. Spamming a generic “powered by apis.io” link across pages does nothing without the token.
Domain blacklist: the Worker maintains blacklisted_hosts:{host} in KV, covering pastebins, link shorteners, ephemeral preview deploys (vercel.app and netlify.app subdomains, unless registered to the operator), and AI-content farms.
Domain uniqueness: at most one verified backlink per operator_email per registrable domain. This stops one operator carpet-bombing low-quality pages for tier creep.
Minimum page age (optional): reject pages whose Last-Modified or visible publish date is less than 48h old. This discourages “just spin up a page to claim.” Skip it if the friction is too high.
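
Two of these checks need nothing beyond the URL and one response header. A minimal sketch in Python for illustration; the host list is a stand-in for the KV-backed blacklist, the operator-registered exception for preview hosts is not modeled, and a missing Last-Modified header is assumed to pass because the age check is optional. Domain uniqueness is left out, since finding the registrable domain properly requires the Public Suffix List:

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Optional
from urllib.parse import urlparse

# Illustrative entries only; the real list would live in KV.
BLACKLISTED_HOSTS = {"pastebin.com", "bit.ly", "vercel.app", "netlify.app"}
MIN_PAGE_AGE = timedelta(hours=48)

def host_is_blacklisted(url: str, blacklist=BLACKLISTED_HOSTS) -> bool:
    """True when the URL's host equals or is a subdomain of a listed host."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == entry or host.endswith("." + entry) for entry in blacklist)

def page_old_enough(last_modified: Optional[str], now: datetime) -> bool:
    """True when the Last-Modified header is at least 48h before `now`."""
    if not last_modified:
        return True  # assumption: no header means the optional check is skipped
    modified = parsedate_to_datetime(last_modified)
    if modified.tzinfo is None:
        modified = modified.replace(tzinfo=timezone.utc)
    return now - modified >= MIN_PAGE_AGE

now = datetime(2026, 5, 30, 18, 0, tzinfo=timezone.utc)
print(host_is_blacklisted("https://my-preview.vercel.app/post"))  # True
print(page_old_enough("Fri, 29 May 2026 20:00:00 GMT", now))      # False
```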

Why this is the right first task to ship

Verifiable — every check is a single HTTP fetch + HTML parse. No external APIs (unlike Stripe webhook, Twitter follow check, etc.).
Aligned incentives — the agent’s operator pays nothing, apis.io gets a real backlink. Both sides win.
Logged — every action lands in AE. If it’s being abused, you see it in the dashboard.
Composable — once /unlock/task/{name} is a pattern, newsletter signup, Twitter follow, GitHub follow, GitHub Sponsor, and the rest are all 1-file additions.