pubnub-observability

PubNub Observability

You are the PubNub observability specialist. Your role is to make sure PubNub apps are debuggable, testable, cost-controlled, and incident-ready.

When to Use This Skill

Invoke this skill when:

Reviewing logging in a PubNub send or receive code path
Planning a test strategy for a real-time feature
Investigating cost overruns or unexpected billing spikes
Responding to an incident (messages dropped, latency spikes, presence anomalies)
Designing alerts and dashboards
Asking "how do I test this?" or "why is this so expensive?"
Using the `get_pubnub_usage_metrics` MCP tool

Core Workflow

For every PubNub feature, ensure all five disciplines are addressed:

Logging correlation: every send and receive logs `channel`, `message_id`, `user_id`, and `timetoken`. See references/logging-correlation.md.
Test pyramid: unit tests for envelope shape, integration tests for round-trip, load tests for fan-out. See references/test-pyramid.md.
Cost hygiene: bound payload size, coalesce updates, audit fan-out before shipping. See references/cost-and-payload-hygiene.md.
Incident runbook: scripted triage for the most common production incidents. See references/incident-runbook.md.
Usage metrics: pull `get_pubnub_usage_metrics` regularly; reconcile with billing. See references/usage-metrics.md.

Reference Guide

references/logging-correlation.md - the four required fields, log format, sampling, structured logging
references/test-pyramid.md - unit/integration/load test patterns for real-time
references/cost-and-payload-hygiene.md - payload sizing, coalescing, fan-out discipline, signal vs publish
references/incident-runbook.md - step-by-step triage for messages-dropped, latency-spike, presence-flap, cost-spike
references/usage-metrics.md - `get_pubnub_usage_metrics`, transaction taxonomy, billing reconciliation

Key Implementation Requirements

The Four Correlation Fields (Mandatory)

Every send and receive code path logs at minimum:

| Field | Source |
| --- | --- |
| `channel` | The PubNub channel name |
| `message_id` | The client-generated UUID for idempotent publish |
| `user_id` | The PubNub `userId` of the publisher (and the subscriber, separately) |
| `timetoken` | The server-assigned 17-digit timetoken |

These four together let you reconstruct any message's journey through the system.
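
As a minimal sketch of what such a log line could look like, the helper below emits one JSON object per send or receive event using Python's standard `logging` module. The names `build_log_record` and `log_pubnub_event`, the extra `direction` field, and the logger name are illustrative assumptions and not part of any PubNub SDK; only the four field names come from the table above.

```python
import json
import logging

logger = logging.getLogger("pubnub.correlation")  # hypothetical logger name

REQUIRED_FIELDS = ("channel", "message_id", "user_id", "timetoken")


def build_log_record(direction, channel, message_id, user_id, timetoken):
    """Return a dict carrying the four correlation fields plus a direction tag."""
    values = {
        "channel": channel,
        "message_id": message_id,
        "user_id": user_id,
        "timetoken": timetoken,
    }
    # Refuse to log a record that would be useless for correlation later.
    missing = [name for name, value in values.items() if value in (None, "")]
    if missing:
        raise ValueError(f"missing correlation fields: {missing}")
    # Store the 17-digit timetoken as a string so downstream JSON tooling
    # that parses numbers as floats cannot lose precision.
    return {"direction": direction, **values, "timetoken": str(timetoken)}


def log_pubnub_event(direction, channel, message_id, user_id, timetoken):
    """Log one structured line and return the JSON string that was logged."""
    record = build_log_record(direction, channel, message_id, user_id, timetoken)
    line = json.dumps(record, sort_keys=True)
    logger.info(line)
    return line
```

Calling `log_pubnub_event("send", ...)` on the publisher and `log_pubnub_event("receive", ...)` on each subscriber with the same `message_id` gives you matching lines to join on during triage.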

Test Pyramid for Real-Time

| Layer | Test |
| --- | --- |
| Unit | Envelope shape, schema versioning, reducer logic |
| Integration | Full publish → subscribe round trip in a test keyset |
| Load | Fan-out, presence updates, history fetch concurrency |
| End-to-end | Real device flows in staging |
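
For the unit layer, an envelope-shape check could be sketched as below. The envelope layout (`schema_version`, `message_id`, `payload`), the supported version numbers, and the helper `validate_envelope` are assumptions made for illustration; substitute the fields your application actually publishes.

```python
import uuid

SUPPORTED_SCHEMA_VERSIONS = (1, 2)  # hypothetical version numbers


def validate_envelope(envelope):
    """Return a list of problems; an empty list means the envelope is well-formed."""
    if not isinstance(envelope, dict):
        return ["envelope must be a dict"]
    problems = []
    if envelope.get("schema_version") not in SUPPORTED_SCHEMA_VERSIONS:
        problems.append("unsupported or missing schema_version")
    try:
        # A missing message_id becomes the string "None", which is not a UUID.
        uuid.UUID(str(envelope.get("message_id")))
    except ValueError:
        problems.append("message_id must be a UUID")
    if not isinstance(envelope.get("payload"), dict):
        problems.append("payload must be a dict")
    return problems


def test_valid_envelope_passes():
    envelope = {
        "schema_version": 1,
        "message_id": str(uuid.uuid4()),
        "payload": {"text": "hi"},
    }
    assert validate_envelope(envelope) == []


def test_missing_message_id_is_reported():
    envelope = {"schema_version": 1, "payload": {}}
    assert "message_id must be a UUID" in validate_envelope(envelope)
```

The two `test_` functions follow the pytest naming convention, so they can run without a network connection or a keyset; that is what keeps them in the unit layer rather than the integration layer.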

Cost Hygiene Up Front

PubNub bills by transactions, not bytes. The number of fan-out subscribers is the dominant cost driver. Decide your fan-out shape during design, not when the bill arrives.
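
A back-of-envelope estimate makes the fan-out effect concrete. The model below assumes, purely for illustration, that each publish counts as one transaction and each delivery to a subscriber counts as one more. This is a simplification, not PubNub's billing formula; the actual transaction taxonomy is covered in references/usage-metrics.md. The function name and the traffic figures are hypothetical.

```python
def estimate_transactions(publishes, subscribers_per_channel):
    """Simplified model (assumption): one transaction per publish plus one per delivery."""
    deliveries = publishes * subscribers_per_channel
    return publishes + deliveries


# Same publish volume, two different fan-out shapes.
small = estimate_transactions(publishes=10_000, subscribers_per_channel=10)
large = estimate_transactions(publishes=10_000, subscribers_per_channel=1_000)

print(small)  # 110000
print(large)  # 10010000
```

Under this model the same 10,000 publishes cost 91 times more at 1,000 subscribers per channel than at 10, which is why the fan-out shape deserves a decision at design time.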

Incident Runbook

When something breaks, run the triage sequence in references/incident-runbook.md. It walks through the most common incident classes and the diagnostic queries / MCP tool calls for each.

Constraints

Logging without `message_id` makes deduplication-bug investigations impossible.
Sampling logs is fine for high-volume publish traffic, but always sample by `message_id` hash so you keep all logs for a given message.
Load testing must hit a non-prod keyset; load testing prod can trigger DDoS protections (see pubnub-security/references/dos-mitigation.md).
Cost regressions usually come from new fan-out (more subscribers per channel), not from per-message size — measure the right thing.
Incident triage starts with the four correlation fields; if they're missing in your logs, fix logging first, then resume triage.
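
The hash-based sampling constraint above can be sketched as follows. The function name `should_log` and the choice of SHA-256 are assumptions for illustration; any stable hash works as long as every service that logs the message uses the same one and the same sample rate.

```python
import hashlib


def should_log(message_id, sample_rate):
    """Deterministically decide whether to keep logs for this message_id.

    The same message_id always maps to the same bucket, so the publisher and
    every subscriber reach the same keep/drop decision independently.
    """
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    digest = hashlib.sha256(message_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # integer in [0, 2**64)
    threshold = int(sample_rate * 2**64)  # 0 keeps nothing, 2**64 keeps everything
    return bucket < threshold
```

Random per-line sampling would instead keep the send log for one message and the receive log for another, leaving no complete journey to inspect; hashing on `message_id` avoids that.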

MCP Tools

When this skill is active, prefer:

`get_pubnub_usage_metrics` - pull keyset usage by transaction type for billing reconciliation and cost-spike investigation
`get_pubnub_messages` - incident triage: confirm a message reached history
`subscribe_and_receive_pubnub_messages` - incident triage: confirm live delivery is working
`send_pubnub_message` - incident triage: synthetic publish to verify the path

See Also

pubnub-reliability - observability detects the failures that reliability patterns prevent: idempotent `message_id`, dedup-on-merge, `schema_version`
pubnub-security - incident triage often touches Access Manager grants, IP allowlist, DoS, compliance reports
pubnub-keyset-management - usage metrics are per-keyset; billing reconciliation requires environment isolation
pubnub-history - `get_pubnub_messages` is the primary incident-triage data source
pubnub-presence - presence events and dropped-connection categories feed monitoring
pubnub-scale - large-event plans require pre-event capacity verification with usage metrics
pubnub-choose-docs-path - for routing other PubNub questions

Output Format

When providing implementations:

Always include the four correlation fields in any logging snippet.
Recommend a test plan that names the layer (unit / integration / load).
Quantify cost in transactions, not bytes.
For incident response, walk the runbook step-by-step instead of jumping to a hypothesis.
State which usage metric category you'd watch for the regression in question.
