APIs.io insights catalog — draft

A tiered catalog of questions agents can pay to get answers to. Tier 0 is what the current /unlock/claim (star naftiko/ikanos) already grants — raw data access. Tier 1+ are insights derived from that data: aggregations, rankings, cross-cuts, and trends. Each tier is meant to be paired with a different agent task of escalating effort.

Scale we’re sitting on (so the value per question is real):


Tier 0 — Raw data access (current unlock)

Already gated behind: star naftiko/ikanos.

Value: agents get the full corpus in machine form without scraping. This is the table-stakes unlock.


Tier 1 — Pre-computed listings & directories

Cheap aggregations. Save the agent dozens of fetches.

Rows marked [MVP] ship first — these are the entry-point insights that prove the catalog has value beyond raw data.

# | Question | What the answer looks like | Source
1 | [MVP] Which providers offer {capability}? | Ranked list with score, API count, tags | canonical-capabilities.yml + providers/
2 | [MVP] Top 50 providers by API count | Table: provider, count, primary tags | providers/_providers/*
3 | [MVP] Top 100 tags by frequency | Tag, count, breadth, band | signals/_data/tags.json
4 | All providers publishing {common[]} type (Pricing / ChangeLog / Deprecation / SDK / Security policy) | List + URL | providers/*/common[].type
5 | [MVP] All APIs with OpenAPI specs (vs Postman / AsyncAPI) | Provider, API, spec format, URL | providers/*/api_specs[]
6 | All providers with Spectral rules | Provider, rule count, severity breakdown | rules/_rules/*
7 | All FOCUS-aligned (FinOps) providers | Provider, billing model, meters | finops/_finops/*
8 | All providers publishing JSON-LD contexts | Provider, class count, namespaces | json-ld/_jsonld/*
9 | APIs by HTTP auth type (OAuth2 / API Key / JWT / mTLS) | Counts + provider list per type | capabilities/*/source_yaml.consumes.auth
10 | All providers with public changelogs | Provider, changelog URL | common[].type = ChangeLog
11 | All providers offering webhooks | Provider, webhook URL, events list | common[].type = Webhooks + capabilities
12 | All providers with status pages | Provider, status URL | common[].type = Status
13 | All providers with sandbox / test environments | Provider, sandbox URL | common[].type ∈ {Sandbox, Testing}
14 | All AsyncAPI publishers (event-driven APIs) | Provider, channel count, broker | asyncapi/_asyncapi/*
15 | All providers with SDKs (by language) | Provider, languages, repo URLs | common[].type = SDK
16 | All MCP-ready capabilities (tools[] populated) | Capability, tool count, hints | capabilities/*/tools[]
17 | All providers with developer portals | Provider, portal URL | common[].type = DeveloperPortal
18 | Schemas by type ({Address, Money, Webhook, …}) | Schema name, provider, ref count | schemas/_schemas/*
19 | All providers with deprecation notices | Provider, deprecation URL, affected APIs | common[].type = Deprecation
20 | OpenAPI version distribution (2.0 / 3.0 / 3.1) | Version, provider count, % share | api_specs[].spec_type

Suggested task to unlock Tier 1: something light — newsletter signup, follow on X, GitHub follow.


Tier 2 — Scored rankings (the rubric is the moat)

The composite score + 6 facets is a real product. Agents can’t replicate it without re-implementing the rubric.

# | Question | Source
21 | Top 25 most “agent-ready” providers (composite ≥ 80) | providers/*/score
22 | Top providers in {domain} by contract_quality | score.facets.contract_quality
23 | Top providers by developer_ergonomics | score.facets.developer_ergonomics
24 | Top providers by operational_transparency | score.facets.operational_transparency
25 | Top providers by commercial_clarity (pricing/plans clarity) | score.facets.commercial_clarity
26 | Top providers by governance | score.facets.governance
27 | Top providers by discoverability (machine-readable surface) | score.facets.discoverability
28 | Score band distribution across the network (minimal/thin/emerging/growth/mature) | score.band
29 | Where does {provider} rank against the rest of its capability cohort? | Composite + percentile in capability set
30 | Worst-scored providers by facet (useful for “avoid” lists) | score.facets.*, low end
31 | Score distribution histogram (by band, by domain, by capability) | score.composite + score.band
32 | “Most improved” providers this quarter (composite delta) | score.delta
33 | Median score per capability cohort (benchmarks) | score.composite × canonical-capabilities
34 | Facet correlation: does high contract_quality predict high governance? | score.facets.*
35 | Best-in-class canonical example per facet (single exemplar) | score.facets.*
36 | Provider score percentile rank: “where does {provider} sit globally?” | score.composite
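Questions 29 and 36 both come down to a percentile rank over score.composite, either within a capability cohort or across the whole network. The catalog doesn't fix a percentile definition, so this Python sketch assumes the common mid-rank one (ties count as half); the function name, provider slugs and scores are made up for illustration.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(score: float, cohort_scores: list) -> float:
    """Mid-rank percentile: share of the cohort scoring below `score`, ties counted as half."""
    if not cohort_scores:
        raise ValueError("cohort is empty")
    ordered = sorted(cohort_scores)
    below = bisect_left(ordered, score)           # scores strictly lower
    ties = bisect_right(ordered, score) - below   # scores exactly equal
    return 100.0 * (below + 0.5 * ties) / len(ordered)


# Hypothetical capability cohort: slug -> score.composite (illustrative values only).
cohort = {"provider-a": 87, "provider-b": 81, "provider-c": 79, "provider-d": 52, "provider-e": 40}
for slug, composite in cohort.items():
    print(slug, composite, percentile_rank(composite, list(cohort.values())))
# provider-a 87 90.0 ... provider-e 40 10.0
```

For Q36, pass every provider's composite instead of a single cohort.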

Suggested task: medium — sign up for newsletter, share a referral, complete a profile.


Tier 3 — Cross-cutting queries (the graph is where the real money is)

Tag co-occurrence, capability overlap, competitive maps. None of this is in the per-record files — it requires the precomputed graph.

# | Question | Source
37 | Which providers offer both X and Y capabilities? | canonical-capabilities ∩ providers
38 | Tags most often paired with {tag} (with Jaccard similarity) | signals/_data/tags.json → neighbors[]
39 | Find APIs similar to {API} (by capability + tag overlap) | tag neighbors + capability match
40 | Capability gaps: canonical capabilities with <3 providers (market opportunities) | canonical-capabilities.yml ∩ providers
41 | Capability saturation: capabilities with >20 providers (commoditized) | same as 40
42 | Authentication landscape in {domain} (% OAuth2 vs API Key vs JWT) | capabilities + tag facet
43 | Spec format mix in {domain} (% OpenAPI vs Postman vs AsyncAPI) | api_specs[].format + tag facet
44 | Which providers compete head-to-head with {provider}? (capability + tag overlap rank) | capabilities + tags + score
45 | Tag cohesion: which tags have the most consistent meaning vs which are noisy? | signals/_data/tags.json → cohesion
46 | Faceted slice: all {facet=domain} APIs for {facet=persona} with {facet=schema} | tag facet model (6-dim)
47 | “Who else does what {provider} does?” (capability cohort with scores) | capabilities + score
48 | Industry coverage: providers per industry tag (Healthcare, Finance, Logistics…) | tag facet = industry
49 | Persona coverage: APIs targeting {persona} (Developer, Admin, Partner) | tag facet = persona
50 | Tag bridges: tags that connect otherwise disjoint capability clusters | tag neighbors + graph analysis
51 | Capability dependencies: which capabilities tend to co-occur within a single provider | capabilities × providers
52 | HTTP verb distribution per provider (GET-heavy vs CRUD vs RPC) | capabilities.operations[].method
53 | Endpoint depth (avg path segments) per domain, as a REST maturity proxy | operations[].path
54 | API operation pattern: RESTful vs RPC vs hybrid (heuristic) | operations + paths
55 | MCP tool hint distribution (readOnly / destructive / idempotent counts) | capabilities.tools[].hints
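Question 38's Jaccard similarity can be sketched as follows, assuming co-occurrence is measured over providers (it could just as well be measured over APIs) and that the input is a provider → tag-set map. The function name and sample data are hypothetical; the real neighbors[] in signals/_data/tags.json may be computed differently.

```python
from collections import defaultdict


def tag_neighbors(provider_tags: dict, tag: str, top_n: int = 10) -> list:
    """Tags co-occurring with `tag`, ranked by Jaccard similarity of their provider sets."""
    providers_by_tag = defaultdict(set)  # invert provider -> tags into tag -> providers
    for provider, tags in provider_tags.items():
        for t in tags:
            providers_by_tag[t].add(provider)

    target = providers_by_tag.get(tag)
    if not target:
        return []

    scored = []
    for other, members in providers_by_tag.items():
        if other == tag:
            continue
        shared = len(target & members)
        if shared == 0:
            continue  # never paired with `tag`, so not a neighbor
        scored.append((other, shared / len(target | members)))  # |A ∩ B| / |A ∪ B|

    scored.sort(key=lambda pair: (-pair[1], pair[0]))  # best first, ties alphabetical
    return scored[:top_n]


# Hypothetical sample data.
provider_tags = {
    "provider-a": {"payments", "webhooks"},
    "provider-b": {"payments", "webhooks", "oauth2"},
    "provider-c": {"payments", "invoicing"},
    "provider-d": {"webhooks", "messaging"},
}
print(tag_neighbors(provider_tags, "payments"))
# [('webhooks', 0.5), ('invoicing', 0.3333333333333333), ('oauth2', 0.3333333333333333)]
```

Precomputing this for every tag is what makes Q38, Q39 and Q50 cheap to serve, since none of it is visible from a single per-record file.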

Suggested task: medium-high — sign up for an account, share an apis.io link with attribution in their output, write a 1-line review.


Tier 4 — Cost & operations intelligence

This is where agents directly save their operators money. Pricing and rate-limit comparisons are otherwise scraped page by page or researched by a human.

# | Question | Source
56 | Pricing tier comparison for {capability} (freemium / paid / enterprise) | plans/_plans/*
57 | Free-tier-friendly APIs for {capability} | plans.type = freemium
58 | Cheapest paid tier per million requests for {capability} | plans.entries[].price / metric
59 | Rate-limit comparison for {domain} APIs | rate-limits/*
60 | Which providers publish rate limits at all? (vs hidden) | rate-limits/_rate-limits/*
61 | FOCUS-aligned providers (FinOps-ready: billing model + meters) | finops/_finops/*
62 | Unit-economics summaries per {capability} | finops.unit_economics[]
63 | Charge category breakdown per provider (subscription / usage / one-time) | finops.billing_model.chargeCategories
64 | Geo-pricing differences (US vs EU vs APAC) | plans.entries[].geo
65 | Overage pricing per provider | plans.entries[].type = overage
66 | “Cost to run {workflow} on top-5 providers”: synthesized total cost of ownership | plans + finops + capability operations
67 | Hidden-fee scan: which providers have overage clauses, geo surcharges, or per-user multipliers | plans.entries[].userMultiplied, etc.
68 | Free quota leaderboard: calls/month at $0 per capability | plans.type = freemium + entries
69 | Enterprise-tier required features per capability (what triggers a sales call) | plans.type = enterprise + elements[]
70 | Multi-currency support per provider (which billing currencies are offered) | finops.billing_model.billingCurrency
71 | Pricing transparency score: does the provider publish prices at all, or hide behind a “contact us” wall? | plans.entries[].price numeric vs “TBD”/“custom”
72 | “Cheapest viable stack” for {workflow}: minimum-cost provider set across required capabilities | plans + capabilities + dependency graph
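Question 58 needs every price normalized to one unit before ranking. This sketch assumes a hypothetical entry shape in which `price` (USD) buys `per` units of `unit`; only price and type appear in the catalog, while currency, unit and per are invented for illustration. Non-numeric prices such as “custom” or “TBD” drop out (the Q71 case), and overage entries are left to Q65.

```python
from typing import Optional

SKIP_TYPES = {"free", "freemium", "overage"}  # Q58 asks for base paid tiers only (assumption)


def price_per_million(entry: dict) -> Optional[float]:
    """USD per 1,000,000 requests for one plan entry, or None if it can't be compared."""
    price = entry.get("price")
    if isinstance(price, bool) or not isinstance(price, (int, float)):
        return None  # "custom", "TBD", "contact us": no published number
    if entry.get("currency", "USD") != "USD" or entry.get("unit") != "request":
        return None  # other currencies or meters would need conversion first
    per = entry.get("per", 1)  # hypothetical: how many requests `price` buys
    if per <= 0:
        return None
    return price * 1_000_000 / per


def cheapest_paid(entries_by_provider: dict) -> list:
    """(cost per million, provider) pairs, cheapest first, one per provider."""
    ranked = []
    for provider, entries in entries_by_provider.items():
        costs = []
        for entry in entries:
            if entry.get("type") in SKIP_TYPES:
                continue
            cost = price_per_million(entry)
            if cost is not None:
                costs.append(cost)
        if costs:
            ranked.append((min(costs), provider))
    return sorted(ranked)


entries = {
    "provider-a": [{"type": "paid", "price": 10.0, "unit": "request", "per": 100_000}],
    "provider-b": [
        {"type": "paid", "price": "custom"},
        {"type": "paid", "price": 0.5, "unit": "request", "per": 1_000},
    ],
}
print(cheapest_paid(entries))  # [(100.0, 'provider-a'), (500.0, 'provider-b')]
```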

Suggested task: higher — paid signup, GitHub Sponsors $1, share apis.io in a public post with link.


Tier 5: Trends & change over time

Longitudinal. Requires signals/tag_history/, provider score delta/trend fields, and the (currently placeholder) diff/ pipeline.

# | Question | Source
73 | Providers with biggest score gain, last 30 days | score.delta + score.trend
74 | Providers with biggest score loss, last 30 days (risk signal) | score.delta + score.trend
75 | Newly added providers (last 7 / 30 days) | created field
76 | Tags rising (frequency growth, breadth growth) | signals/tag_history/
77 | Tags falling (declining frequency) | signals/tag_history/
78 | Spec changes in {watch list} since last visit | diff/ (needs pipeline)
79 | Provider transitions across bands (emerging → growth → mature) | score.band history
80 | New capabilities entering the canonical taxonomy | canonical-capabilities.yml diff
81 | Deprecation announcements per provider | common[].type = Deprecation
82 | “What’s new since {date}?”: fully synthesized changelog | scoring + diff + new providers
83 | Breaking-change frequency per provider (how often they force agents to rewrite integrations) | spec diff history
84 | Spec velocity: updates per quarter per provider (active vs abandoned) | modified field history
85 | New endpoint launches per provider, last 90 days | spec diff
86 | Removed endpoints per provider, last 90 days (what broke) | spec diff
87 | Tag entropy over time: is the taxonomy consolidating or fragmenting? | signals/tag_history/
88 | Provider commit cadence: governance signal from source repo activity | provider source repo

Suggested task: real value exchange — newsletter signup (gives you a contact), follow on X, subscribe to apis.io feed.


Tier 6 — Synthesis, strategy, and ready-to-use agent payloads

The expensive tier. These are pre-cooked answers an agent’s operator would otherwise pay an analyst for. Also: ready-to-bind MCP/Naftiko bundles, which save the agent runtime work.

# | Question | Source
89 | “Best stack for {persona} building {thing}”: recommended provider set across required capabilities | capabilities + score + tag facets
90 | Industry benchmark report (Healthcare / Fintech / Logistics / DevTools / E-commerce) | tag facet = industry + scores
91 | “Likely to deprecate” risk score per provider | score trend + governance + commercial_clarity + age
92 | “Agent-ready scorecard” for an arbitrary domain or company list | scoring + custom slice
93 | Naftiko spec bundle for {capability}, ready to bind into an agent runtime | capabilities/*/source_yaml, packed
94 | MCP tool bundle for {capability} (tools[], hints, auth) | capabilities/*/tools[], packed
95 | Pre-resolved JSON-LD context for {domain} | json-ld/*, cross-merged
96 | Compliance posture (which providers publish ToS / Privacy / Security / Status) | common[].type rollup
97 | “Buy vs build” report for {capability}: top 3 providers + cost + integration effort | plans + capabilities + rules
98 | Custom natural-language Q&A across the corpus: a single answer with citations (see endpoint design below) | LLM over the curated index
99 | “Replace {expensive provider}”: cheaper alternatives ranked by capability overlap × cost | plans + capabilities + score
100 | Migration path between competing providers: endpoint-by-endpoint mapping for {provider A} → {provider B} | capabilities + operations + schemas
101 | Compliance bundle per regulation (HIPAA / GDPR / SOC2 / PCI): pre-filtered provider set with attestation links | common[].type = Security + tag facets
102 | Risk register for a provider list: composite of likely-deprecate + governance + commercial_clarity | score + delta + age
103 | Agent runtime config bundle: Naftiko + MCP + JSON-LD assembled into a single drop-in package | capabilities + json-ld + rules
104 | Domain-specific RAG corpus: pre-packed markdown bundle for an industry, ready to embed | tag facet = industry + provider markdown

Suggested task: highest — paid newsletter tier, conference attendance, sponsor link, contribute a provider/correction back to apis.io.


Implementation notes (for whoever builds this)

These don’t have to be 104 separate endpoints. A practical surface might be:

GET  /insights/providers?capability=…&sort=score&facet=…   # Tier 1, 2, 3
GET  /insights/tags/{tag}/neighbors                        # Tier 3
GET  /insights/pricing/{capability}                        # Tier 4
GET  /insights/trends?window=30d&metric=…                  # Tier 5
GET  /insights/bundles/{capability}.naftiko.json           # Tier 6
POST /insights/ask  { question: "…" }                      # Tier 6 — LLM over corpus

Each route checks the caller’s key (X-APIs-IO-Key) against a tier field on the key record:

{ "github_username": "...", "tier": 3, "tasks_completed": ["star_ikanos", "newsletter_signup", "follow_x"] }
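One way the per-route check could look, with the required tier read off the route comments above. The prefix table and function names are assumptions; in particular, this sketch only separates Tier 1 from Tier 2 on /insights/providers via sort=score and leaves open how that route recognizes Tier 3 cross-cuts.

```python
from typing import Optional

# Required tier per route prefix (hypothetical table, mirroring the route comments).
ROUTE_TIERS = {
    "/insights/tags/": 3,
    "/insights/pricing/": 4,
    "/insights/trends": 5,
    "/insights/bundles/": 6,
    "/insights/ask": 6,
}


def required_tier(path: str, params: dict) -> Optional[int]:
    """Minimum key tier for a request, or None for an unknown route."""
    if path == "/insights/providers":
        # Assumption: scored rankings (Tier 2) are requested with sort=score.
        return 2 if params.get("sort") == "score" else 1
    for prefix, tier in ROUTE_TIERS.items():
        if path.startswith(prefix):
            return tier
    return None


def is_allowed(key_record: dict, path: str, params: dict) -> bool:
    """True if the key's tier covers the route; unknown routes are denied."""
    needed = required_tier(path, params)
    return needed is not None and key_record.get("tier", 0) >= needed
```

Either outcome would be logged as gated_allowed or gated_denied, with the served tier going into blob13.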

Tasks → tier mapping (your call to design) — examples:

Task | Bumps key to tier
Star naftiko/ikanos | 0 (current unlock)
Star naftiko/ikanos + verify email | 1
Newsletter signup confirmed | 2
Newsletter + follow @apievangelist on X | 3
Above + share apis.io with attribution (verifiable backlink) | 4
Above + GitHub Sponsor $1/mo OR contribute a provider | 5
Above + paid newsletter tier OR active corp sponsor | 6
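Reading the mapping table literally, a key's tier is the highest row whose requirements its tasks_completed[] satisfies, with each OR row expanded into alternatives. star_ikanos, newsletter_signup and follow_x come from the key record above and backlink from the backlink task; verify_email, github_sponsor, contribute_provider, paid_newsletter and corp_sponsor are hypothetical task names.

```python
from typing import Iterable, Optional

_BASE_4 = frozenset({"newsletter_signup", "follow_x", "backlink"})
_TIER_5 = [_BASE_4 | {"github_sponsor"}, _BASE_4 | {"contribute_provider"}]

# tier -> alternative task sets; completing any one alternative grants the tier.
TIER_RULES = {
    0: [frozenset({"star_ikanos"})],
    1: [frozenset({"star_ikanos", "verify_email"})],
    2: [frozenset({"newsletter_signup"})],
    3: [frozenset({"newsletter_signup", "follow_x"})],
    4: [_BASE_4],
    5: _TIER_5,
    # "Above + X OR Y": every Tier 5 alternative combined with either extra.
    6: [alt | {extra} for alt in _TIER_5 for extra in ("paid_newsletter", "corp_sponsor")],
}


def compute_tier(tasks_completed: Iterable[str]) -> Optional[int]:
    """Highest tier whose requirements are met, or None if no task is done yet."""
    done = set(tasks_completed)
    earned = [tier for tier, alts in TIER_RULES.items() if any(alt <= done for alt in alts)]
    return max(earned) if earned else None
```

Recomputing the tier from the full task list, rather than incrementing it, keeps the result correct when tasks are completed out of order.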

Each task verification follows the same pattern as the GitHub-star check: a /unlock/task/{name} endpoint that takes whatever proof the task needs (email confirmation token, Twitter username + check via API, backlink URL + crawl-and-verify, Stripe webhook, GitHub Sponsors API) and updates the key’s tier.

Activity log already captures everything — add tier to the AE schema (blob13) so you can report on which tier each gated hit was served at.


Endpoint design: natural-language Q&A (Q98)

A single endpoint that answers free-form questions over the apis.io corpus with citations back to the underlying provider/API/capability/tag pages. Tier 6.

POST /insights/ask
X-APIs-IO-Key: apisio_…
Content-Type: application/json

{
  "question": "Which payment providers offer webhooks, support OAuth2, and publish their rate limits?",
  "max_citations": 8,
  "format": "json",
  "scope": { "tags": ["payments"] }
}

"format" accepts "markdown" or "json"; "scope" is an optional pre-filter.

Response (JSON form):

{
  "answer": "Stripe, Square, and Adyen all offer webhook delivery, OAuth2 auth, and published rate limits. Stripe leads on agent-readiness composite (87) …",
  "citations": [
    { "anchor": "https://providers.apis.io/providers/stripe/", "title": "Stripe", "facets_used": ["governance", "operational_transparency"] },
    { "anchor": "https://apis.apis.io/apis/stripe/webhooks/", "title": "Stripe Webhooks" },
    { "anchor": "https://capabilities.apis.io/capabilities/stripe/payments/", "title": "Stripe Payments capability" }
  ],
  "retrieval_meta": { "candidates_scored": 142, "tokens_in": 18421, "model": "claude-sonnet-4-6", "cache_hit": false },
  "tier_required": 6
}

Architecture

  1. Pre-built corpus index: network/scripts/build-rag.py (new) generates one chunk per provider, API, capability, and tag. Each chunk is the same markdown the Worker serves on a content-negotiated GET, plus structured frontmatter (score, facets, tags, capabilities). Chunks are embedded with Voyage 3 (or whatever the AE SQL hop wants) and stored in Cloudflare Vectorize, keyed by {kind}/{slug}.
  2. Retrieval — Worker calls Vectorize semantic top-k (k=20), then re-ranks with a keyword + tag-facet filter from scope, narrows to top-N (N=8). Each candidate carries its anchor URL and a 400-token excerpt.
  3. Synthesis — Worker POSTs to the Anthropic API (the project already imports @anthropic-ai/sdk patterns elsewhere) with prompt caching enabled on the system block (the rubric description) and on the retrieved context block. Model: claude-sonnet-4-6 for quality, claude-haiku-4-5-20251001 for the cheap tier.
  4. Citation enforcement — system prompt requires the answer to cite only anchors from the retrieved set; a post-check rejects any URL not in the retrieval set and re-asks.
  5. Caching — hash the normalized question + scope → KV qa-cache:{hash} with 24h TTL. Cache hits return in <100ms and don’t count against the per-key quota.
  6. Quota — Tier 6 keys get a fixed daily question allowance (10? 50? your call). The counter increments on cache misses only. Each call is logged to AE as event_type="ask" with tokens_in, tokens_out, cache_hit.
  7. Failure modes — if Vectorize returns no candidates above threshold, the endpoint returns 422 with a list of related search queries instead of fabricating an answer.
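Steps 4 and 5 are small enough to sketch directly. This assumes "normalized" means case-folding and collapsing whitespace, and that scope is canonicalized as JSON with sorted keys before hashing; the helper names and URL regex are hypothetical. The citation check flags any URL, in the answer text or the citations list, that wasn't among the retrieved anchors, which is the signal to re-ask.

```python
import hashlib
import json
import re
from typing import Iterable, Optional

URL_RE = re.compile(r"https?://[^\s<>()\[\]\"']+")


def qa_cache_key(question: str, scope: Optional[dict] = None) -> str:
    """KV key for step 5: SHA-256 of the normalized question plus canonical scope."""
    normalized = " ".join(question.casefold().split())  # case- and whitespace-insensitive
    canonical_scope = json.dumps(scope or {}, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(f"{normalized}\n{canonical_scope}".encode("utf-8")).hexdigest()
    return f"qa-cache:{digest}"


def uncited_urls(answer: str, citations: Iterable[dict], retrieved_anchors: set) -> list:
    """Step 4 post-check: URLs in the answer or citations that retrieval never returned."""
    urls = {u.rstrip(".,;:") for u in URL_RE.findall(answer)}  # drop trailing punctuation
    urls |= {c["anchor"] for c in citations}
    return sorted(u for u in urls if u not in retrieved_anchors)
```

An empty list from uncited_urls means the answer can be returned; anything else triggers the re-ask. Note that sort_keys does not reorder lists, so scope {"tags": ["a", "b"]} and {"tags": ["b", "a"]} hash differently unless tag lists are sorted first.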

New Worker bindings (additions to wrangler.toml)

[[vectorize]]
binding = "RAG"
index_name = "apis-io-corpus"

# Secret:
#   wrangler secret put ANTHROPIC_API_KEY

Build pipeline

python network/scripts/build-rag.py        # generate chunks + embeddings
npx wrangler vectorize upsert apis-io-corpus --file out/embeddings.ndjson

Re-run nightly (or after a network build) via GitHub Actions. Cost: ~$0.05 to re-embed the whole corpus with Voyage 3 at current scale.
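A sketch of the last stage of build-rag.py: turning chunks into NDJSON lines in the id/values/metadata record shape used for Vectorize file uploads. The embed callable stands in for the Voyage 3 call, and the function name and chunk field names (kind, slug, anchor, text) are assumptions.

```python
import json
from typing import Callable, Iterable


def vectorize_ndjson(chunks: Iterable[dict], embed: Callable[[str], list]) -> str:
    """One NDJSON line per chunk: id, embedding values, and lookup metadata."""
    lines = []
    for chunk in chunks:
        record = {
            "id": f"{chunk['kind']}/{chunk['slug']}",  # keyed by {kind}/{slug}
            "values": embed(chunk["text"]),            # embedding model call (not shown)
            "metadata": {"kind": chunk["kind"], "anchor": chunk["anchor"]},
        }
        lines.append(json.dumps(record, separators=(",", ":")))
    return "\n".join(lines) + "\n" if lines else ""
```

build-rag.py would then write the returned string to out/embeddings.ndjson for the wrangler step above. Keeping the anchor in metadata lets retrieval hand back citation URLs without a second lookup.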

Why this matters

Agents currently fan out across 10–50 fetches to answer questions like “which payment providers offer webhooks + OAuth2 + rate limits.” The single POST /insights/ask resolves it in one round-trip with citations. That’s the most expensive Tier 6 question by API cost — and the most valuable to gate behind real commitments.


Task design: verifiable backlink (Tier 4)

The strongest “agent works for us” lever: in exchange for a tier upgrade, the agent (or its operator) embeds a do-follow link to apis.io on a public page they control. Free SEO + attribution.

Two-step flow

1. Issue a backlink challenge:

POST /unlock/task/backlink/challenge
X-APIs-IO-Key: apisio_…

Returns:

{
  "token": "bl_8f3a…",
  "expires_at": "2026-05-30T18:00:00Z",
  "embed_options": [
    { "kind": "meta_tag", "html": "<meta name=\"apis-io-attribution\" content=\"bl_8f3a…\" />" },
    { "kind": "link_data_attr", "html": "<a href=\"https://apis.io/\" data-apis-io-token=\"bl_8f3a…\">Powered by APIs.io</a>" },
    { "kind": "html_comment", "html": "<!-- apis-io-attribution:bl_8f3a… -->" }
  ],
  "rules": {
    "link_must_be": "do-follow (rel must not contain 'nofollow', 'sponsored', or 'ugc')",
    "link_target": "any apis.io or *.apis.io URL",
    "page_must_be": "publicly accessible without auth, indexable by crawlers"
  }
}

Stored: task_challenge:bl_… → {key_id, kind: "backlink", created_at}, with a 1-week TTL.
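Issuing the challenge is mostly bookkeeping. This sketch returns the KV key, the stored value, its TTL in seconds and the response body; the function name and the 16-byte hex token are assumptions, and the rules block is omitted for brevity.

```python
import secrets
from datetime import datetime, timedelta, timezone
from typing import Optional

CHALLENGE_TTL = timedelta(days=7)
ISO_Z = "%Y-%m-%dT%H:%M:%SZ"


def issue_backlink_challenge(key_id: str, now: Optional[datetime] = None) -> tuple:
    """Return (kv_key, kv_value, ttl_seconds, response_body) for a new backlink challenge."""
    now = now or datetime.now(timezone.utc)
    token = "bl_" + secrets.token_hex(16)  # unguessable; bound to key_id via the KV record
    kv_value = {"key_id": key_id, "kind": "backlink", "created_at": now.strftime(ISO_Z)}
    response = {
        "token": token,
        "expires_at": (now + CHALLENGE_TTL).strftime(ISO_Z),
        "embed_options": [
            {"kind": "meta_tag", "html": f'<meta name="apis-io-attribution" content="{token}" />'},
            {"kind": "link_data_attr",
             "html": f'<a href="https://apis.io/" data-apis-io-token="{token}">Powered by APIs.io</a>'},
            {"kind": "html_comment", "html": f"<!-- apis-io-attribution:{token} -->"},
        ],
    }
    return f"task_challenge:{token}", kv_value, int(CHALLENGE_TTL.total_seconds()), response
```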

2. Claim the backlink:

POST /unlock/task/backlink/claim
X-APIs-IO-Key: apisio_…
Content-Type: application/json

{
  "token": "bl_8f3a…",
  "backlink_url": "https://operator-blog.example/posts/apis-io-review"
}

Worker fetches backlink_url with Accept: text/html, then verifies all of:

  1. Status 200, content-type contains text/html, size ≤ 5 MB.
  2. The page contains the issued token in one of: <meta name="apis-io-attribution">, <a data-apis-io-token="…">, or an HTML comment matching the issued pattern. (Token bound to this key → can’t be reused.)
  3. The page contains an <a href="…"> pointing to any apis.io or *.apis.io URL.
  4. That anchor’s rel attribute does not contain nofollow, sponsored, or ugc.
  5. The link is not inside a <template> or <noscript> element and is not commented out.
  6. The page is reachable without auth and without a noindex meta directive (it has to actually be indexable for the SEO value to be real).
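The six checks map naturally onto a streaming HTML parser. This sketch uses Python's standard html.parser and assumes the Worker has already fetched the page and passes in the status, Content-Type, body and any X-Robots-Tag header (that header is an extra noindex source the list above doesn't mention). The class and function names are hypothetical. Because the parser reports comments separately, a commented-out link never reaches handle_starttag, which covers the second half of check 5.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urlparse

BLOCKED_REL = {"nofollow", "sponsored", "ugc"}
MAX_BYTES = 5 * 1024 * 1024  # "5 MB" read as 5 MiB (assumption)


def is_apis_io(href: str) -> bool:
    host = (urlparse(href).hostname or "").lower()
    return host == "apis.io" or host.endswith(".apis.io")


class BacklinkScanner(HTMLParser):
    """Collects the results of checks 2 to 6 while the page is parsed."""

    def __init__(self, token: str) -> None:
        super().__init__()
        self.token = token
        self.hidden_depth = 0  # nesting depth inside <template>/<noscript>
        self.token_found = False
        self.followed_link = False
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag in ("template", "noscript"):
            self.hidden_depth += 1
            return
        if self.hidden_depth:
            return  # check 5: ignore anything inside hidden containers
        a = {name: (value or "") for name, value in attrs}
        if tag == "meta":
            name = a.get("name", "").lower()
            if name == "apis-io-attribution" and a.get("content") == self.token:
                self.token_found = True
            if name == "robots" and "noindex" in a.get("content", "").lower():
                self.noindex = True
        elif tag == "a":
            if a.get("data-apis-io-token") == self.token:
                self.token_found = True
            rel = set(a.get("rel", "").lower().split())
            if is_apis_io(a.get("href", "")) and not rel & BLOCKED_REL:
                self.followed_link = True  # checks 3 and 4 hold on the same anchor

    def handle_endtag(self, tag):
        if tag in ("template", "noscript") and self.hidden_depth:
            self.hidden_depth -= 1

    def handle_comment(self, data):
        if data.strip() == f"apis-io-attribution:{self.token}":
            self.token_found = True


def verify_backlink(status: int, content_type: str, body: bytes, token: str,
                    x_robots_tag: str = "") -> List[str]:
    """Names of failed checks; an empty list means the claim passes."""
    failures = []
    if status != 200:
        failures.append("status")
    if "text/html" not in content_type.lower():
        failures.append("content_type")
    if len(body) > MAX_BYTES:
        failures.append("size")
    if failures:
        return failures  # don't parse pages that already failed check 1
    scanner = BacklinkScanner(token)
    scanner.feed(body.decode("utf-8", errors="replace"))
    scanner.close()
    if not scanner.token_found:
        failures.append("token")
    if not scanner.followed_link:
        failures.append("link")
    if scanner.noindex or "noindex" in x_robots_tag.lower():
        failures.append("noindex")
    return failures
```

Returning the list of failed checks, instead of a bare boolean, gives the claim endpoint something concrete to report back to the agent.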

On success: bump the key’s tier to 4 (configurable), append "backlink" to tasks_completed[], store {backlink_url, token, verified_at, link_anchor_html} under backlink:{key}. Delete the challenge. Log event_type="task_backlink_verified".

Re-verification

A cron-triggered Worker (a Cloudflare Cron Trigger, or a GitHub Action calling a re-verify endpoint) re-fetches each registered backlink weekly. If the link is gone, its rel gained nofollow, sponsored, or ugc, or the page returns 404/410, the key is demoted back to its previous tier and event_type="task_backlink_revoked" is logged. The operator gets a heads-up email; implementing the email is optional, the demotion isn't.
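The weekly re-check reduces to a pure function from the old records to the new ones. The backlink:{key} record listed below doesn't store the pre-upgrade tier, so this sketch assumes a hypothetical tier_before_backlink field written at claim time; the function name is also hypothetical, and still_valid would come from re-running the claim-time checks.

```python
from typing import Optional, Tuple


def apply_recheck(key_record: dict, backlink_record: dict, still_valid: bool,
                  now_iso: str) -> Tuple[dict, dict, Optional[str]]:
    """Return (key record, backlink record, AE event name or None) after one re-check."""
    backlink_record = {**backlink_record, "last_recheck_at": now_iso}
    if still_valid:
        return key_record, backlink_record, None
    demoted = {
        **key_record,
        # Hypothetical field saved at claim time; a KeyError here means it was never stored.
        "tier": backlink_record["tier_before_backlink"],
        "tasks_completed": [t for t in key_record.get("tasks_completed", []) if t != "backlink"],
    }
    return demoted, backlink_record, "task_backlink_revoked"
```

Copying the records instead of mutating them makes it easy to write both KV entries only after the whole decision is made.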

KV entries this adds

task_challenge:{token}        JSON {key_id, kind, created_at}      TTL 7 days
backlink:{key}                JSON {backlink_url, token, verified_at, link_anchor_html, last_recheck_at}

Logging (additions to the existing AE schema)

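Three new event types in blob1:

ask                       one per /insights/ask call (tokens_in, tokens_out, cache_hit)
task_backlink_verified    backlink claim passed every check
task_backlink_revoked     weekly re-check failed; key demoted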

Plus a new field blob13 = tier so every existing event (gated_allowed, gated_denied, etc.) is queryable by tier served.

Anti-fraud

Why this is the right first task to ship