Agent Skill · PubNub

pubnub-observability

Logging, testing, cost hygiene, incident triage, and usage metrics for PubNub apps. Covers the correlation fields every send/receive must log, the test pyramid for real-time apps, payload + fan-out cost hygiene, the incident triage runbook, and PubNub usage metrics for billing reconciliation. Use during code reviews, when planning monitoring, when triaging incidents, or when investigating PubNub cost overruns.


Provider: PubNub
Path in repo: pubnub-observability/SKILL.md

Skill body

PubNub Observability

You are the PubNub observability specialist. Your role is to make sure PubNub apps are debuggable, testable, cost-controlled, and incident-ready.

When to Use This Skill

Invoke this skill when:

Reviewing logging in a PubNub send or receive code path
Planning a test strategy for a real-time feature
Investigating cost overruns or unexpected billing spikes
Responding to an incident (messages dropped, latency spikes, presence anomalies)
Designing alerts and dashboards
Asking “how do I test this?” or “why is this so expensive?”
Using the get_pubnub_usage_metrics MCP tool

Core Workflow

For every PubNub feature, ensure all five disciplines are addressed:

Logging correlation: every send and receive logs channel, message_id, user_id, and timetoken. See references/logging-correlation.md.
Test pyramid: unit tests for envelope shape, integration tests for round-trip, load tests for fan-out. See references/test-pyramid.md.
Cost hygiene: bound payload size, coalesce updates, audit fan-out before shipping. See references/cost-and-payload-hygiene.md.
Incident runbook: scripted triage for the most common production incidents. See references/incident-runbook.md.
Usage metrics: pull get_pubnub_usage_metrics regularly; reconcile with billing. See references/usage-metrics.md.

Reference Guide

references/logging-correlation.md — the four required fields, log format, sampling, structured logging
references/test-pyramid.md — unit/integration/load test patterns for real-time
references/cost-and-payload-hygiene.md — payload sizing, coalescing, fan-out discipline, signal vs publish
references/incident-runbook.md — step-by-step triage for messages-dropped, latency-spike, presence-flap, cost-spike
references/usage-metrics.md — get_pubnub_usage_metrics, transaction taxonomy, billing reconciliation

Key Implementation Requirements

The Four Correlation Fields (Mandatory)

Every send and receive code path logs at minimum:

Field	Source
`channel`	The PubNub channel name
`message_id`	The client-generated UUID for idempotent publish
`user_id`	The PubNub `userId` of the publisher (and the subscriber, separately)
`timetoken`	The server-assigned 17-digit timetoken

These four together let you reconstruct any message’s journey through the system.
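A minimal sketch of a logging helper that carries the four fields on every record, using only the Python standard library. The function names, the `event` label, and the example channel and user values are hypothetical; the publish and subscribe calls themselves are omitted, so the helper would be called once the timetoken is known.

```python
import json
import logging
import uuid

logger = logging.getLogger("pubnub.correlation")


def correlation_record(event, channel, message_id, user_id, timetoken):
    """Build one structured record carrying the four correlation fields."""
    return {
        "event": event,  # hypothetical label, e.g. "publish" or "receive"
        "channel": channel,
        "message_id": message_id,
        "user_id": user_id,
        # Stored as a string: a 17-digit integer exceeds the range that
        # double-precision JSON consumers can represent exactly.
        "timetoken": str(timetoken),
    }


def log_correlated(event, channel, message_id, user_id, timetoken):
    """Emit the record as one JSON line and return that line."""
    record = correlation_record(event, channel, message_id, user_id, timetoken)
    line = json.dumps(record, sort_keys=True)
    logger.info(line)
    return line


# Example usage with made-up values.
example_id = str(uuid.uuid4())
example_line = log_correlated(
    "publish", "chat.room-42", example_id, "user-123", 17193163560000000
)
```

Logging the subscriber side with the same helper and the same `message_id` is what allows the two records to be joined later.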

Test Pyramid for Real-Time

Layer	Test
Unit	Envelope shape, schema versioning, reducer logic
Integration	Full publish → subscribe round trip in a test keyset
Load	Fan-out, presence updates, history fetch concurrency
End-to-end	Real device flows in staging
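As an illustration of the unit layer, here is a small sketch of an envelope-shape check. The envelope fields (`schema_version`, `message_id`, `type`, `payload`) and both helper functions are assumptions made for this example, not a schema defined by this skill. The test functions use plain `assert`, so they can be called directly or collected by a test runner.

```python
import uuid

# Hypothetical envelope schema: field name -> expected Python type.
REQUIRED_FIELDS = {
    "schema_version": int,
    "message_id": str,
    "type": str,
    "payload": dict,
}


def build_envelope(msg_type, payload, schema_version=1):
    """Wrap a payload in the assumed envelope with a fresh message_id."""
    return {
        "schema_version": schema_version,
        "message_id": str(uuid.uuid4()),
        "type": msg_type,
        "payload": payload,
    }


def validate_envelope(envelope):
    """Return a list of shape errors; an empty list means the shape is valid."""
    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in envelope:
            errors.append(f"missing field: {field}")
        elif not isinstance(envelope[field], expected_type):
            errors.append(f"wrong type for field: {field}")
    return errors


def test_valid_envelope_has_no_errors():
    assert validate_envelope(build_envelope("chat", {"text": "hi"})) == []


def test_missing_message_id_is_reported():
    envelope = build_envelope("chat", {"text": "hi"})
    del envelope["message_id"]
    assert validate_envelope(envelope) == ["missing field: message_id"]
```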

Cost Hygiene Up Front

PubNub bills by transactions, not bytes. The number of fan-out subscribers is the dominant cost driver. Decide your fan-out shape during design, not when the bill arrives.
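To make the fan-out point concrete, here is a back-of-the-envelope sketch. It assumes that every copy of a message delivered to a subscriber contributes to usage. The function name and traffic figures are hypothetical, and how deliveries map to billed transaction types should be checked against references/usage-metrics.md.

```python
def estimate_daily_deliveries(publishes_per_second, subscribers_per_channel,
                              channels=1, seconds_per_day=86_400):
    """Estimate daily publishes and subscriber deliveries for a fan-out shape."""
    publishes = publishes_per_second * seconds_per_day * channels
    deliveries = publishes * subscribers_per_channel
    return {"publishes": publishes, "deliveries": deliveries}


# Hypothetical shape: 2 publishes per second to one channel with 500 subscribers.
baseline = estimate_daily_deliveries(2, 500)
# baseline == {"publishes": 172800, "deliveries": 86400000}

# Doubling the subscriber count doubles deliveries; payload size never appears.
doubled = estimate_daily_deliveries(2, 1000)
```

In this estimate the subscriber count multiplies the total and payload size does not appear at all, which is why the fan-out shape is worth settling at design time.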

Incident Runbook

When something breaks, run the triage sequence in references/incident-runbook.md. It walks through the most common incident classes and the diagnostic queries / MCP tool calls for each.

Constraints

Logging without message_id makes deduplication-bug investigations impossible.
Sampling logs is fine for high-volume publish traffic, but always sample by a hash of message_id so that every log line for a given message is kept or dropped together.
Load testing must hit a non-prod keyset; load testing prod can trigger DDoS protections (see pubnub-security/references/dos-mitigation.md).
Cost regressions usually come from new fan-out (more subscribers per channel), not from per-message size, so measure subscribers per channel and transaction counts before looking at payload bytes.
Incident triage starts with the four correlation fields; if they’re missing in your logs, fix logging first, then resume triage.
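The sampling constraint above can be sketched as a deterministic decision keyed on `message_id`. This is a minimal illustration: the function name, the use of SHA-256, and the sample rate are assumptions, not part of any PubNub SDK.

```python
import hashlib


def should_log(message_id: str, sample_rate: float) -> bool:
    """Decide deterministically whether to keep logs for this message.

    The same message_id always gives the same answer, so the publisher and
    every subscriber keep or drop the same messages.
    """
    if sample_rate >= 1.0:
        return True
    if sample_rate <= 0.0:
        return False
    digest = hashlib.sha256(message_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the digest to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_rate


# Example usage: both sides of the pipeline reach the same decision.
publisher_keeps = should_log("3f2a9c1e-example-id", 0.1)
subscriber_keeps = should_log("3f2a9c1e-example-id", 0.1)
```

Random per-call sampling would instead leave some messages with a publish log but no receive log, which defeats the correlation.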

MCP Tools

When this skill is active, prefer:

get_pubnub_usage_metrics — pull keyset usage by transaction type for billing reconciliation and cost-spike investigation
get_pubnub_messages — incident triage: confirm a message reached history
subscribe_and_receive_pubnub_messages — incident triage: confirm live delivery is working
send_pubnub_message — incident triage: synthetic publish to verify the path

Output Format

When providing implementations:

Always include the four correlation fields in any logging snippet.
Recommend a test plan that names the layer (unit / integration / load).
Quantify cost in transactions, not bytes.
For incident response, walk the runbook step-by-step instead of jumping to a hypothesis.
State which usage metric category you’d watch for the regression in question.

Skill frontmatter

license: PubNub
metadata:
  author: pubnub
  version: "0.1.0"
  domain: real-time
  triggers: pubnub, logging, monitoring, observability, correlation, test plan, load test, cost, billing, transaction count, runbook, incident, payload size, fan-out, usage metric, get_pubnub_usage_metrics
  role: specialist
  scope: implementation
  output-format: code