APIs.io insights catalog — draft
A tiered catalog of questions agents can pay to get answers to. Tier 0 is what the current /unlock/claim (star naftiko/ikanos) already grants — raw data access. Tier 1+ are insights derived from that data: aggregations, rankings, cross-cuts, and trends. Each tier is meant to be paired with a different agent task of escalating effort.
Scale we’re sitting on (so the value per question is real):
- 4,444 providers (each scored on 6 facets, 0–100 composite + trend)
- 16,713 APIs, 33,507 capabilities (Naftiko + MCP-ready), 42,683 schemas
- 3,631 tags with co-occurrence graph (Jaccard similarity, neighbors)
- 1,376 Spectral rulesets, 4,098 plan/pricing scaffolds, 4,098 FinOps profiles
- 3,685 JSON-LD contexts, 100+ canonical cross-provider capabilities
- Per-tag scorecards: frequency, breadth, quality_lift, cohesion + facet assignment
Tier 0 — Raw data access (current unlock)
Already gated behind: star naftiko/ikanos.
- Bulk listings: /search-index.json, /apis/index.json, etc.
- Individual provider / API / capability records in JSON, YAML, and markdown
- All .well-known/* agent-readiness files
- Embedded source YAML (auditable, single fetch)
Value: agents get the full corpus in machine form without scraping. This is the table-stakes unlock.
Tier 1 — Pre-computed listings & directories
Cheap aggregations. Save the agent dozens of fetches.
Rows marked [MVP] ship first — these are the entry-point insights that prove the catalog has value beyond raw data.
| # | Question | What the answer looks like | Source |
|---|---|---|---|
| 1 | [MVP] Which providers offer {capability}? | Ranked list with score, API count, tags | canonical-capabilities.yml + providers/ |
| 2 | [MVP] Top 50 providers by API count | Table: provider, count, primary tags | providers/_providers/* |
| 3 | [MVP] Top 100 tags by frequency | Tag, count, breadth, band | signals/_data/tags.json |
| 4 | All providers publishing {common[]} type (Pricing / ChangeLog / Deprecation / SDK / Security policy) | List + URL | providers/*/common[].type |
| 5 | [MVP] All APIs with OpenAPI specs (vs Postman / AsyncAPI) | Provider, API, spec format, URL | providers/*/api_specs[] |
| 6 | All providers with Spectral rules | Provider, rule count, severity breakdown | rules/_rules/* |
| 7 | All FOCUS-aligned (FinOps) providers | Provider, billing model, meters | finops/_finops/* |
| 8 | All providers publishing JSON-LD contexts | Provider, class count, namespaces | json-ld/_jsonld/* |
| 9 | APIs by HTTP auth type (OAuth2 / API Key / JWT / mTLS) | Counts + provider list per type | capabilities/*/source_yaml.consumes.auth |
| 10 | All providers with public changelogs | Provider, changelog URL | common[].type = ChangeLog |
| 11 | All providers offering webhooks | Provider, webhook URL, events list | common[].type = Webhooks + capabilities |
| 12 | All providers with status pages | Provider, status URL | common[].type = Status |
| 13 | All providers with sandbox / test environments | Provider, sandbox URL | common[].type ∈ {Sandbox, Testing} |
| 14 | All AsyncAPI publishers (event-driven APIs) | Provider, channel count, broker | asyncapi/_asyncapi/* |
| 15 | All providers with SDKs (by language) | Provider, languages, repo URLs | common[].type = SDK |
| 16 | All MCP-ready capabilities (tools[] populated) | Capability, tool count, hints | capabilities/*/tools[] |
| 17 | All providers with developer portals | Provider, portal URL | common[].type = DeveloperPortal |
| 18 | Schemas by type ({Address, Money, Webhook, …}) | Schema name, provider, ref count | schemas/_schemas/* |
| 19 | All providers with deprecation notices | Provider, deprecation URL, affected APIs | common[].type = Deprecation |
| 20 | OpenAPI version distribution (2.0 / 3.0 / 3.1) | Version, provider count, % share | api_specs[].spec_type |
Suggested task to unlock Tier 1: something light — newsletter signup, follow on X, GitHub follow.
Tier 2 — Scored rankings (the rubric is the moat)
The composite score + 6 facets is a real product. Agents can’t replicate it without re-implementing the rubric.
| # | Question | Source |
|---|---|---|
| 21 | Top 25 most “agent-ready” providers (composite ≥ 80) | providers/*/score |
| 22 | Top providers in {domain} by contract_quality | score.facets.contract_quality |
| 23 | Top providers by developer_ergonomics | score.facets.developer_ergonomics |
| 24 | Top providers by operational_transparency | score.facets.operational_transparency |
| 25 | Top providers by commercial_clarity (pricing/plans clarity) | score.facets.commercial_clarity |
| 26 | Top providers by governance | score.facets.governance |
| 27 | Top providers by discoverability (machine-readable surface) | score.facets.discoverability |
| 28 | Score band distribution across the network (minimal/thin/emerging/growth/mature) | score.band |
| 29 | Where does {provider} rank against the rest of its capability cohort? | Composite + percentile in capability set |
| 30 | Worst-scored providers by facet — useful for “avoid” lists | score.facets.* low end |
| 31 | Score distribution histogram (by band, by domain, by capability) | score.composite + score.band |
| 32 | “Most improved” providers this quarter (composite delta) | score.delta |
| 33 | Median score per capability cohort (benchmarks) | score.composite × canonical-capabilities |
| 34 | Facet correlation: does high contract_quality predict high governance? | score.facets.* |
| 35 | Best-in-class canonical example per facet (single exemplar) | score.facets.* |
| 36 | Provider score percentile rank — “where does {provider} sit globally?” | score.composite |
Suggested task: medium — sign up for newsletter, share a referral, complete a profile.
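Rows 29 and 36 both reduce to a percentile rank of one composite score within a set of scores. A minimal Python sketch, assuming ties count as half (one common convention; the draft does not specify one) and using made-up scores:

```python
from bisect import bisect_left, bisect_right


def percentile_rank(score: float, cohort_scores: list[float]) -> float:
    """Percent of the cohort scoring below `score`, counting ties as half."""
    if not cohort_scores:
        raise ValueError("cohort is empty")
    ordered = sorted(cohort_scores)
    below = bisect_left(ordered, score)            # strictly lower scores
    ties = bisect_right(ordered, score) - below    # equal scores
    return 100.0 * (below + 0.5 * ties) / len(ordered)


# Made-up composite scores for one capability cohort (illustrative only).
cohort = [42, 55, 61, 61, 78, 87]
print(percentile_rank(78, cohort))
```

Passing every provider's composite gives the global rank (row 36); passing only the providers that share a canonical capability gives the cohort rank (row 29).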
Tier 3 — Cross-cutting queries (the graph is where the real money is)
Tag co-occurrence, capability overlap, competitive maps. None of this is in the per-record files — it requires the precomputed graph.
| # | Question | Source |
|---|---|---|
| 37 | Which providers offer both X and Y capabilities? | canonical-capabilities ∩ providers |
| 38 | Tags most often paired with {tag} (with Jaccard similarity) | signals/_data/tags.json → neighbors[] |
| 39 | Find APIs similar to {API} (by capability + tag overlap) | tag neighbors + capability match |
| 40 | Capability gaps: canonical capabilities with <3 providers (market opportunities) | canonical-capabilities.yml ∩ providers |
| 41 | Capability saturation: capabilities with >20 providers (commoditized) | same |
| 42 | Authentication landscape in {domain} (% OAuth2 vs API Key vs JWT) | capabilities + tag facet |
| 43 | Spec format mix in {domain} (% OpenAPI vs Postman vs AsyncAPI) | api_specs[].format + tag facet |
| 44 | Which providers compete head-to-head with {provider}? (capability + tag overlap rank) | capabilities + tags + score |
| 45 | Tag cohesion: which tags have the most consistent meaning vs which are noisy? | signals/_data/tags.json → cohesion |
| 46 | Faceted slice: all {facet=domain} APIs for {facet=persona} with {facet=schema} | tag facet model (6-dim) |
| 47 | “Who else does what {provider} does?” (capability cohort with scores) | capabilities + score |
| 48 | Industry coverage: providers per industry tag (Healthcare, Finance, Logistics…) | tag facet = industry |
| 49 | Persona coverage: APIs targeting {persona} (Developer, Admin, Partner) | tag facet = persona |
| 50 | Tag bridges — tags that connect otherwise disjoint capability clusters | tag neighbors + graph analysis |
| 51 | Capability dependencies — which capabilities tend to co-occur within a single provider | capabilities × providers |
| 52 | HTTP verb distribution per provider (GET-heavy vs CRUD vs RPC) | capabilities.operations[].method |
| 53 | Endpoint depth (avg path segments) per domain — REST maturity proxy | operations[].path |
| 54 | API operation pattern: RESTful vs RPC vs hybrid (heuristic) | operations + paths |
| 55 | MCP tool hint distribution (readOnly / destructive / idempotent counts) | capabilities.tools[].hints |
Suggested task: medium-high — sign up for an account, share an apis.io link with attribution in their output, write a 1-line review.
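For reference, the Jaccard neighbors behind row 38 can be sketched in a few lines of Python. The `tag_to_apis` mapping and its contents are hypothetical; the real neighbors[] field in signals/_data/tags.json is precomputed and may use a different cut-off.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """|A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def neighbors(tag: str, tag_to_apis: dict[str, set[str]], top_n: int = 5):
    """Tags ranked by Jaccard similarity to `tag`, dropping zero-overlap tags."""
    base = tag_to_apis[tag]
    scored = [
        (other, jaccard(base, apis))
        for other, apis in tag_to_apis.items()
        if other != tag
    ]
    scored = [(t, s) for t, s in scored if s > 0]
    # Highest similarity first; tag name breaks ties so the order is stable.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))[:top_n]


# Hypothetical tag -> API-id sets (illustrative only).
tag_to_apis = {
    "payments": {"a1", "a2", "a3"},
    "webhooks": {"a2", "a3", "a4"},
    "invoicing": {"a3"},
    "maps": {"a9"},
}
print(neighbors("payments", tag_to_apis))
```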
Tier 4 — Cost & operations intelligence
This is where agents directly save their operators money. Today, pricing and rate-limit comparisons are typically scraped ad hoc or researched by hand.
| # | Question | Source |
|---|---|---|
| 56 | Pricing tier comparison for {capability} (freemium / paid / enterprise) | plans/_plans/* |
| 57 | Free-tier-friendly APIs for {capability} | plans.type = freemium |
| 58 | Cheapest paid tier per million requests for {capability} | plans.entries[].price / metric |
| 59 | Rate-limit comparison for {domain} APIs | rate-limits/* |
| 60 | Which providers publish rate limits at all? (vs hidden) | rate-limits/_rate-limits/* |
| 61 | FOCUS-aligned providers (FinOps-ready: billing model + meters) | finops/_finops/* |
| 62 | Unit-economics summaries per {capability} | finops.unit_economics[] |
| 63 | Charge category breakdown per provider (subscription / usage / one-time) | finops.billing_model.chargeCategories |
| 64 | Geo-pricing differences (US vs EU vs APAC) | plans.entries[].geo |
| 65 | Overage pricing per provider | plans.entries[].type=overage |
| 66 | “Cost to run {workflow} on top-5 providers” — synthesized total cost of ownership | plans + finops + capability operations |
| 67 | Hidden-fee scan: which providers have overage clauses, geo surcharges, or per-user multipliers | plans.entries[].userMultiplied etc |
| 68 | Free quota leaderboard: calls/month at $0 per capability | plans.type=freemium + entries |
| 69 | Enterprise-tier required features per capability (what triggers a sales call) | plans.type=enterprise.elements[] |
| 70 | Multi-currency support per provider (which billing currencies are offered) | finops.billing_model.billingCurrency |
| 71 | Pricing transparency score — does the provider publish prices at all, or “contact us” wall? | plans.entries[].price numeric vs “TBD”/“custom” |
| 72 | “Cheapest viable stack” for {workflow} — minimum-cost provider set across required capabilities | plans + capabilities + dependency graph |
Suggested task: higher — paid signup, GitHub Sponsors $1, share apis.io in a public post with link.
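Row 58 depends on normalizing every paid plan to a common unit. A rough Python sketch, assuming each plan entry exposes a monthly price and an included request count; the field names `monthly_price_usd` and `included_requests` are hypothetical, not the actual plans schema, and the numbers are invented:

```python
from dataclasses import dataclass


@dataclass
class PlanEntry:
    provider: str
    monthly_price_usd: float   # hypothetical field name
    included_requests: int     # hypothetical field name


def usd_per_million(entry: PlanEntry) -> float:
    """Normalize a plan to dollars per one million requests."""
    return entry.monthly_price_usd * 1_000_000 / entry.included_requests


def cheapest_paid(entries: list[PlanEntry]) -> list[PlanEntry]:
    """Paid plans only, cheapest per-million price first."""
    paid = [e for e in entries if e.monthly_price_usd > 0 and e.included_requests > 0]
    return sorted(paid, key=usd_per_million)


# Invented plans for one capability (illustrative only).
plans = [
    PlanEntry("A", 50.0, 100_000),
    PlanEntry("B", 200.0, 1_000_000),
    PlanEntry("C", 0.0, 10_000),      # free tier, excluded from the paid ranking
]
for plan in cheapest_paid(plans):
    print(plan.provider, usd_per_million(plan))
```

Plans priced on other meters (seats, compute minutes, data volume) would need their own normalization before they can share a ranking.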
Tier 5 — Trends, deltas & change intelligence
Longitudinal. Requires signals/tag_history/, provider score delta/trend fields, and the (currently placeholder) diff/ pipeline.
| # | Question | Source |
|---|---|---|
| 73 | Providers with biggest score gain last 30 days | score.delta + score.trend |
| 74 | Providers with biggest score loss last 30 days (risk signal) | same |
| 75 | Newly added providers (last 7 / 30 days) | created field |
| 76 | Tags rising (frequency growth, breadth growth) | signals/tag_history/ |
| 77 | Tags falling (declining frequency) | same |
| 78 | Spec changes in {watch list} since last visit | diff/ (needs pipeline) |
| 79 | Provider transitions across bands (emerging → growth → mature) | score.band history |
| 80 | New capabilities entering the canonical taxonomy | canonical-capabilities.yml diff |
| 81 | Deprecation announcements per provider | common[].type = Deprecation |
| 82 | “What’s new since {date}?” — fully synthesized changelog | scoring + diff + new providers |
| 83 | Breaking-change frequency per provider (how often do they make agents rewrite) | spec diff history |
| 84 | Spec velocity: updates per quarter per provider (active vs abandoned) | modified field history |
| 85 | New endpoint launches per provider, last 90 days | spec diff |
| 86 | Removed endpoints per provider, last 90 days (what broke) | spec diff |
| 87 | Tag entropy over time — is the taxonomy consolidating or fragmenting? | signals/tag_history/ |
| 88 | Provider commit cadence — governance signal from source repo activity | provider source repo |
Suggested task: real value exchange — newsletter signup (gives you a contact), follow on X, subscribe to apis.io feed.
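Row 87 can be made concrete with Shannon entropy over tag frequencies. This is one possible reading of "tag entropy", not a definition taken from the pipeline; the counts below are invented.

```python
import math


def tag_entropy(tag_counts: dict[str, int]) -> float:
    """Shannon entropy (bits) of the tag frequency distribution."""
    total = sum(tag_counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in tag_counts.values():
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


# Invented snapshots: usage spread evenly over four tags vs. two tags.
print(tag_entropy({"payments": 5, "billing": 5, "invoices": 5, "checkout": 5}))
print(tag_entropy({"payments": 10, "billing": 10}))
```

Computed per snapshot in signals/tag_history/, a falling value would suggest usage concentrating in fewer tags (consolidating) and a rising value the opposite (fragmenting).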
Tier 6 — Synthesis, strategy, and ready-to-use agent payloads
The expensive tier. These are pre-cooked answers an agent’s operator would otherwise pay an analyst for. Also: ready-to-bind MCP/Naftiko bundles, which save the agent runtime work.
| # | Question | Source |
|---|---|---|
| 89 | “Best stack for {persona} building {thing}” — recommended provider set across required capabilities | capabilities + score + tag facets |
| 90 | Industry benchmark report (Healthcare / Fintech / Logistics / DevTools / E-commerce) | tag facet = industry + scores |
| 91 | “Likely to deprecate” risk score per provider | score trend + governance + commercial_clarity + age |
| 92 | “Agent-ready scorecard” for an arbitrary domain or company list | scoring + custom slice |
| 93 | Naftiko spec bundle for {capability} — ready to bind into an agent runtime | capabilities/*/source_yaml packed |
| 94 | MCP tool bundle for {capability} (tools[], hints, auth) | capabilities/*/tools[] packed |
| 95 | Pre-resolved JSON-LD context for {domain} | json-ld/* cross-merged |
| 96 | Compliance posture (which providers publish ToS / Privacy / Security / Status) | common[].type rollup |
| 97 | “Buy vs build” report for {capability} — top 3 providers + cost + integration effort | plans + capabilities + rules |
| 98 | Custom natural-language Q&A across the corpus — single answer, with citations (see endpoint design below) | LLM over the curated index |
| 99 | “Replace {expensive provider}” — cheaper alternatives ranked by capability overlap × cost | plans + capabilities + score |
| 100 | Migration path between competing providers — endpoint-by-endpoint mapping for {provider A} → {provider B} | capabilities + operations + schemas |
| 101 | Compliance bundle per regulation (HIPAA / GDPR / SOC2 / PCI) — pre-filtered provider set with attestation links | common[].type=Security + tag facets |
| 102 | Risk register for a provider list — composite of likely-deprecate + governance + commercial_clarity | score + delta + age |
| 103 | Agent runtime config bundle — Naftiko + MCP + JSON-LD assembled into a single drop-in package | capabilities + json-ld + rules |
| 104 | Domain-specific RAG corpus — pre-packed markdown bundle for an industry, ready to embed | tag facet=industry + provider markdown |
Suggested task: highest — paid newsletter tier, conference attendance, sponsor link, contribute a provider/correction back to apis.io.
Implementation notes (for whoever builds this)
These don’t have to be 104 separate endpoints. A practical surface might be:
GET /insights/providers?capability=…&sort=score&facet=… # Tier 1, 2, 3
GET /insights/tags/{tag}/neighbors # Tier 3
GET /insights/pricing/{capability} # Tier 4
GET /insights/trends?window=30d&metric=… # Tier 5
GET /insights/bundles/{capability}.naftiko.json # Tier 6
POST /insights/ask { question: "…" } # Tier 6 — LLM over corpus
Each route checks the caller’s key (X-APIs-IO-Key) against a tier field on the key record:
{ "github_username": "...", "tier": 3, "tasks_completed": ["star_ikanos", "newsletter_signup", "follow_x"] }
Tasks → tier mapping (your call to design) — examples:
| Task | Bumps key to tier |
|---|---|
| Star naftiko/ikanos | 0 (current unlock) |
| Star naftiko/ikanos + verify email | 1 |
| Newsletter signup confirmed | 2 |
| Newsletter + follow @apievangelist on X | 3 |
| Above + share apis.io with attribution (verifiable backlink) | 4 |
| Above + GitHub Sponsor $1/mo OR contribute a provider | 5 |
| Above + paid newsletter tier OR active corp sponsor | 6 |
Each task verification follows the same pattern as the GitHub-star check: a /unlock/task/{name} endpoint that takes whatever proof the task needs (email confirmation token, Twitter username + check via API, backlink URL + crawl-and-verify, Stripe webhook, GitHub Sponsors API) and updates the key’s tier.
Activity log already captures everything — add tier to the AE schema (blob13) so you can report on which tier each gated hit was served at.
Endpoint design: natural-language Q&A (Q98)
A single endpoint that answers free-form questions over the apis.io corpus with citations back to the underlying provider/API/capability/tag pages. Tier 6.
POST /insights/ask
X-APIs-IO-Key: apisio_…
Content-Type: application/json
{
"question": "Which payment providers offer webhooks, support OAuth2, and publish their rate limits?",
"max_citations": 8,
"format": "markdown", // or "json"
"scope": { "tags": ["payments"] } // optional pre-filter
}
Response (JSON form):
{
"answer": "Stripe, Square, and Adyen all offer webhook delivery, OAuth2 auth, and published rate limits. Stripe leads on agent-readiness composite (87) …",
"citations": [
{ "anchor": "https://providers.apis.io/providers/stripe/", "title": "Stripe", "facets_used": ["governance", "operational_transparency"] },
{ "anchor": "https://apis.apis.io/apis/stripe/webhooks/", "title": "Stripe Webhooks" },
{ "anchor": "https://capabilities.apis.io/capabilities/stripe/payments/", "title": "Stripe Payments capability" }
],
"retrieval_meta": { "candidates_scored": 142, "tokens_in": 18421, "model": "claude-sonnet-4-6", "cache_hit": false },
"tier_required": 6
}
Architecture
- Pre-built corpus index: network/scripts/build-rag.py (new) generates one chunk per provider, API, capability, and tag. Each chunk = the same markdown the Worker serves on content-negotiated GET, plus structured frontmatter (score, facets, tags, capabilities). Embedded with Voyage 3 (or whatever the AE SQL hop wants), stored in Cloudflare Vectorize keyed by {kind}/{slug}.
- Retrieval: Worker calls Vectorize semantic top-k (k=20), then re-ranks with a keyword + tag-facet filter from scope, narrows to top-N (N=8). Each candidate carries its anchor URL and a 400-token excerpt.
- Synthesis: Worker POSTs to the Anthropic API (the project already imports @anthropic-ai/sdk patterns elsewhere) with prompt caching enabled on the system block (the rubric description) and on the retrieved context block. Model: claude-sonnet-4-6 for quality, claude-haiku-4-5-20251001 for the cheap tier.
- Citation enforcement: system prompt requires the answer to cite only anchors from the retrieved set; a post-check rejects any URL not in the retrieval set and re-asks.
- Caching: hash the normalized question + scope → KV qa-cache:{hash} with 24h TTL. Cache hits return in <100ms and don’t count against the per-key quota.
- Quota: Tier 6 keys get N questions/day (10? 50? your call). Increment on cache miss only. Logged to AE as event_type="ask" with tokens_in, tokens_out, cache_hit.
- Failure modes: if Vectorize returns no candidates above threshold, the endpoint returns 422 with a list of related search queries instead of fabricating an answer.
New Worker bindings (additions to wrangler.toml)
[[vectorize]]
binding = "RAG"
index_name = "apis-io-corpus"
# Secret:
# wrangler secret put ANTHROPIC_API_KEY
Build pipeline
python network/scripts/build-rag.py # generate chunks + embeddings
npx wrangler vectorize upsert apis-io-corpus --file out/embeddings.ndjson
Re-run nightly (or after a network build) via GitHub Actions. Cost: ~$0.05 to re-embed the whole corpus with Voyage 3 at current scale.
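As a rough picture of what build-rag.py would emit, the sketch below writes one NDJSON line per chunk with id, values, and metadata fields, the shape Vectorize accepts for bulk loads. The chunk dictionary layout is hypothetical, and `fake_embed` is a placeholder standing in for the real embedding call, not a Voyage client.

```python
import json
from typing import Callable, Iterable


def to_ndjson(chunks: Iterable[dict], embed: Callable[[str], list[float]]) -> str:
    """Serialize chunks as NDJSON, one vector record per line."""
    lines = []
    for chunk in chunks:
        record = {
            "id": f"{chunk['kind']}/{chunk['slug']}",   # keyed by {kind}/{slug}
            "values": embed(chunk["markdown"]),
            "metadata": {"kind": chunk["kind"], "anchor": chunk["anchor"]},
        }
        lines.append(json.dumps(record))
    return "\n".join(lines) + "\n"


def fake_embed(text: str) -> list[float]:
    """Placeholder only: a real build would call the embedding provider here."""
    return [float(len(text)), 0.0, 1.0]


# Hypothetical chunk layout (illustrative only).
chunks = [
    {"kind": "provider", "slug": "stripe",
     "anchor": "https://providers.apis.io/providers/stripe/",
     "markdown": "# Stripe\n..."},
]
print(to_ndjson(chunks, fake_embed), end="")
```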
Why this matters
Agents currently fan out across 10–50 fetches to answer questions like “which payment providers offer webhooks + OAuth2 + rate limits.” The single POST /insights/ask resolves it in one round-trip with citations. That’s the most expensive Tier 6 question by API cost — and the most valuable to gate behind real commitments.
Task design: verifiable backlink (Tier 4 unlock)
The strongest “agent works for us” lever: in exchange for a tier upgrade, the agent (or its operator) embeds a do-follow link to apis.io on a public page they control. Free SEO + attribution.
Two-step flow
1. Issue a backlink challenge:
POST /unlock/task/backlink/challenge
X-APIs-IO-Key: apisio_…
Returns:
{
"token": "bl_8f3a…",
"expires_at": "2026-05-30T18:00:00Z",
"embed_options": [
{ "kind": "meta_tag", "html": "<meta name=\"apis-io-attribution\" content=\"bl_8f3a…\" />" },
{ "kind": "link_data_attr", "html": "<a href=\"https://apis.io/\" data-apis-io-token=\"bl_8f3a…\">Powered by APIs.io</a>" },
{ "kind": "html_comment", "html": "<!-- apis-io-attribution:bl_8f3a… -->" }
],
"rules": {
"link_must_be": "do-follow (rel must not contain 'nofollow', 'sponsored', or 'ugc')",
"link_target": "any apis.io or *.apis.io URL",
"page_must_be": "publicly accessible without auth, indexable by crawlers"
}
}
Stored: task_challenge:bl_… → {key_id, kind: "backlink", created_at}, 1-week TTL.
2. Claim the backlink:
POST /unlock/task/backlink/claim
X-APIs-IO-Key: apisio_…
Content-Type: application/json
{
"token": "bl_8f3a…",
"backlink_url": "https://operator-blog.example/posts/apis-io-review"
}
Worker fetches backlink_url with Accept: text/html, then verifies all of:
- Status 200, content-type contains text/html, size ≤ 5 MB.
- The page contains the issued token in one of: <meta name="apis-io-attribution">, <a data-apis-io-token="…">, or an HTML comment matching the issued pattern. (Token bound to this key → can’t be reused.)
- The page contains a <a href="…"> to any apis.io or *.apis.io URL.
- That anchor’s rel attribute does not contain nofollow, sponsored, or ugc.
- The link is not inside a <template>, <noscript>, or commented out.
- The page is reachable without auth and without a noindex meta directive (it has to actually be indexable for the SEO value to be real).
On success: bump the key’s tier to 4 (configurable), append "backlink" to tasks_completed[], store {backlink_url, token, verified_at, link_anchor_html} under backlink:{key}. Delete the challenge. Log event_type="task_backlink_verified".
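The page checks can be sketched with a streaming HTML parser. The version below uses Python's standard html.parser purely as an illustration; the Worker would need an equivalent parser, and the reason strings are invented. It covers the token, link, rel, hidden-container, and noindex checks, but not the fetch itself.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

BLOCKED_REL = {"nofollow", "sponsored", "ugc"}
HIDDEN_TAGS = {"template", "noscript"}
MAX_BYTES = 5 * 1024 * 1024


def is_apis_io(href: str) -> bool:
    host = (urlparse(href).hostname or "").lower()
    return host == "apis.io" or host.endswith(".apis.io")


class BacklinkScan(HTMLParser):
    def __init__(self, token: str):
        super().__init__()
        self.token = token
        self.hidden_depth = 0          # >0 while inside <template>/<noscript>
        self.token_found = False
        self.followed_link = False
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        a = {k: (v or "") for k, v in attrs}
        if tag in HIDDEN_TAGS:
            self.hidden_depth += 1
            return
        if self.hidden_depth:
            return
        if tag == "meta":
            name = a.get("name", "").lower()
            if name == "apis-io-attribution" and a.get("content") == self.token:
                self.token_found = True
            if name == "robots" and "noindex" in a.get("content", "").lower():
                self.noindex = True
        elif tag == "a":
            if a.get("data-apis-io-token") == self.token:
                self.token_found = True
            rel = set(a.get("rel", "").lower().split())
            if is_apis_io(a.get("href", "")) and not (rel & BLOCKED_REL):
                self.followed_link = True

    def handle_endtag(self, tag):
        if tag in HIDDEN_TAGS and self.hidden_depth:
            self.hidden_depth -= 1

    def handle_comment(self, data):
        # Commented-out links never reach handle_starttag; only the token counts.
        if not self.hidden_depth and data.strip() == f"apis-io-attribution:{self.token}":
            self.token_found = True


def verify_page(status: int, content_type: str, body: bytes, token: str):
    """Return (ok, reason) for an already-fetched page."""
    if status != 200:
        return False, "bad_status"
    if "text/html" not in content_type.lower():
        return False, "not_html"
    if len(body) > MAX_BYTES:
        return False, "too_large"
    scan = BacklinkScan(token)
    scan.feed(body.decode("utf-8", errors="replace"))
    scan.close()
    if not scan.token_found:
        return False, "token_missing"
    if not scan.followed_link:
        return False, "no_followed_link"
    if scan.noindex:
        return False, "noindex"
    return True, "ok"
```

The sketch takes the fetched status, content type, and body as arguments so the parsing rules can be tested without network access.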
Re-verification
A cron-triggered Worker (CronCreate or a GitHub Action calling a re-verify endpoint) re-fetches each registered backlink weekly. If the link is gone, rel was changed to nofollow, or the page returns 404/410, the key gets demoted back to its previous tier and event_type="task_backlink_revoked" is logged. The operator email gets a heads-up — implementing the email is optional, the demotion isn’t.
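The recheck decision itself is a small pure function. The Python sketch below makes two assumptions the draft leaves open: the backlink record carries a hypothetical `previous_tier` field so the demotion has a target, and transient failures (5xx, timeouts) are treated as inconclusive rather than as revocations.

```python
def apply_recheck(key_record: dict, backlink_record: dict,
                  status: int, link_ok: bool, now_iso: str):
    """Return (key_record, backlink_record, event_type or None) after a recheck.

    `link_ok` is the result of re-running the page checks on a 200 response.
    """
    updated_backlink = {**backlink_record, "last_recheck_at": now_iso}
    gone = status in (404, 410)
    broken = status == 200 and not link_ok
    if gone or broken:
        demoted = {
            **key_record,
            "tier": backlink_record["previous_tier"],   # hypothetical field
            "tasks_completed": [
                t for t in key_record["tasks_completed"] if t != "backlink"
            ],
        }
        return demoted, updated_backlink, "task_backlink_revoked"
    # 200 with a valid link, or an inconclusive status such as 503: keep the tier.
    return key_record, updated_backlink, None


key = {"tier": 4, "tasks_completed": ["newsletter_signup", "follow_x", "backlink"]}
link = {"backlink_url": "https://operator-blog.example/posts/apis-io-review",
        "previous_tier": 3}
print(apply_recheck(key, link, 410, False, "2026-06-06T00:00:00Z")[0]["tier"])
```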
KV entries this adds
task_challenge:{token} JSON {key_id, kind, created_at} TTL 7 days
backlink:{key} JSON {backlink_url, token, verified_at, link_anchor_html, last_recheck_at}
Logging (additions to the existing AE schema)
Three new event types in blob1:
- task_backlink_issued: challenge handed out
- task_backlink_verified: claim succeeded, tier bumped
- task_backlink_revoked: recheck found the link gone/nofollowed/404
Plus a new field blob13 = tier so every existing event (gated_allowed, gated_denied, etc.) is queryable by tier served.
Anti-fraud
- Token binding: each token is tied to a single key, single-use, and the token must appear on the page. Spamming a generic “powered by apis.io” link across pages does nothing without the token.
- Domain blacklist: Worker maintains blacklisted_hosts:{host} in KV; pastebins, link shorteners, ephemeral preview deploys (vercel.app, netlify.app subdomains unless registered to the operator), AI-content farms.
- Domain uniqueness: at most one verified backlink per operator_email per registrable domain. Stops one operator carpet-bombing low-quality pages for tier creep.
- Minimum page age (optional): reject pages whose Last-Modified or visible publish date is <48h old. Discourages “just spin up a page to claim.” Skip if the friction’s too high.
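The blacklist and uniqueness rules above could look roughly like this in Python. The suffix list is illustrative (pastebin.com and bit.ly are stand-ins for the categories named), and the registrable-domain helper is deliberately naive: it keeps the last two labels, which is wrong for suffixes like co.uk, so a real implementation would consult the Public Suffix List.

```python
from urllib.parse import urlparse

# Illustrative entries only; the real list would live in KV.
BLACKLISTED_SUFFIXES = {"pastebin.com", "bit.ly", "vercel.app", "netlify.app"}


def host_of(url: str) -> str:
    return (urlparse(url).hostname or "").lower()


def is_blacklisted(url: str, registered_hosts: frozenset[str] = frozenset()) -> bool:
    """True if the host is on a blacklisted suffix and not registered to the operator."""
    host = host_of(url)
    if host in registered_hosts:
        return False
    return any(host == s or host.endswith("." + s) for s in BLACKLISTED_SUFFIXES)


def naive_registrable_domain(host: str) -> str:
    """Last two labels only: an approximation, not Public Suffix List aware."""
    return ".".join(host.split(".")[-2:])


def violates_uniqueness(operator_email: str, url: str,
                        verified: set[tuple[str, str]]) -> bool:
    """True if this operator already has a verified backlink on the same domain."""
    domain = naive_registrable_domain(host_of(url))
    return (operator_email.lower(), domain) in verified


verified = {("ops@operator-blog.example", "operator-blog.example")}
print(is_blacklisted("https://preview-123.vercel.app/post"))
print(violates_uniqueness("ops@operator-blog.example",
                          "https://www.operator-blog.example/other", verified))
```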
Why this is the right first task to ship
- Verifiable: every check is a single HTTP fetch + HTML parse. No external APIs (unlike Stripe webhook, Twitter follow check, etc.).
- Aligned incentives: the agent’s operator pays nothing, apis.io gets a real backlink. Both sides win.
- Logged: every action lands in AE. If it’s being abused, you see it in the dashboard.
- Composable: once /unlock/task/{name} is a pattern, newsletter signup, Twitter follow, GitHub follow, GitHub Sponsor, and the rest are all 1-file additions.