gcp-logs-monitoring

Goal

Inspect Cloud Logging and Cloud Monitoring data quickly and repeatably from the terminal.

Inputs to collect (ask only if missing)

  • project_id
  • time_window for investigation (for example: last 30m, 2h, or explicit UTC start/end)
  • service/resource context (cloud_run_revision, k8s_container, gce_instance, load balancer, etc.)
  • signals of interest (errors, latency, CPU, memory, request count, restarts)
  • output_format (table for quick scans, json for deeper analysis)
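
A relative time_window such as 30m or 2h has to become explicit UTC timestamps before it can be passed as start/end. The sketch below shows one way to do that conversion. The function names window_to_seconds and window_to_range are illustrative and not part of the skill's scripts, and the timestamp formatting assumes GNU date.

```shell
# Illustrative helpers (assumed names, not shipped with the skill).

# Convert "<number><s|m|h|d>" (for example 30m or 2h) to seconds.
window_to_seconds() {
  local window="$1"
  local value="${window%?}"        # everything except the last character
  local unit="${window#"$value"}"  # the last character
  case "$value" in
    ''|*[!0-9]*|0?*)
      # Reject empty, non-numeric, or zero-padded values.
      echo "unsupported window: $window" >&2
      return 1
      ;;
  esac
  case "$unit" in
    s) echo "$value" ;;
    m) echo $(( value * 60 )) ;;
    h) echo $(( value * 3600 )) ;;
    d) echo $(( value * 86400 )) ;;
    *) echo "unsupported window: $window" >&2; return 1 ;;
  esac
}

# Print "<start> <end>" as UTC ISO 8601 timestamps for a relative window.
# Assumes GNU date; BSD/macOS date needs `date -u -r <epoch>` instead of -d.
window_to_range() {
  local seconds now start
  seconds="$(window_to_seconds "$1")" || return 1
  now="$(date -u +%s)"
  start=$(( now - seconds ))
  printf '%s %s\n' \
    "$(date -u -d "@$start" +%Y-%m-%dT%H:%M:%SZ)" \
    "$(date -u -d "@$now" +%Y-%m-%dT%H:%M:%SZ)"
}
```

For example, `read -r START END <<EOF` fed from `window_to_range 2h` would yield two values suitable for the --start and --end arguments of a metrics query.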

Execution workflow

  1. Validate prerequisites:
    bash .agents/skills/gcp-logs-monitoring/scripts/check_prereqs.sh --project <project_id>
  2. Query Cloud Logging:
    bash .agents/skills/gcp-logs-monitoring/scripts/read_logs.sh --project <project_id> --filter '<LOG_FILTER>' --freshness 1h --limit 100 --format json
  3. Query Cloud Monitoring time series:
    bash .agents/skills/gcp-logs-monitoring/scripts/read_metrics.sh --project <project_id> --filter '<METRIC_FILTER>' --start <UTC_ISO8601> --end <UTC_ISO8601> --format json
  4. Correlate timestamps between logs and metrics, then summarize likely root cause and next checks.
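
Steps 1 to 3 can be chained so that a failed prerequisite check stops the run before any query is issued. The following is a minimal sketch under stated assumptions: the investigate and run_step functions and the DRY_RUN variable are hypothetical conveniences, while the script paths and flags are the ones listed in the steps above.

```shell
# Hypothetical wrapper around the three scripts from the workflow.
SKILL_DIR=".agents/skills/gcp-logs-monitoring/scripts"

# Run a command, or only print it when DRY_RUN=1 (assumed convention).
run_step() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# investigate <project_id> <log_filter> <metric_filter> <start_utc> <end_utc>
investigate() {
  local project="$1" log_filter="$2" metric_filter="$3"
  local start="$4" end="$5"

  # Step 1: stop early if auth or project checks fail.
  run_step bash "$SKILL_DIR/check_prereqs.sh" --project "$project" || return 1

  # Step 2: recent log entries as JSON.
  run_step bash "$SKILL_DIR/read_logs.sh" --project "$project" \
    --filter "$log_filter" --freshness 1h --limit 100 --format json || return 1

  # Step 3: metric time series for the same period as JSON.
  run_step bash "$SKILL_DIR/read_metrics.sh" --project "$project" \
    --filter "$metric_filter" --start "$start" --end "$end" --format json
}
```

Running it with DRY_RUN=1 prints the three commands without executing them, which is a cheap way to review the filters and time range before touching the APIs. The printed form drops the original quoting, so it is meant for reading rather than copy-pasting.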

Common filter templates

Cloud Logging

  • Cloud Run errors:
    resource.type="cloud_run_revision" severity>=ERROR
  • GKE container errors:
    resource.type="k8s_container" severity>=ERROR
  • HTTP 5xx in load balancer logs:
    resource.type="http_load_balancer" httpRequest.status>=500
  • Timeout text search:
    textPayload:"timeout" OR jsonPayload.message:"timeout"
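
These templates can be combined, since adjacent expressions in a Logging filter are ANDed together. The sketch below builds a filter from a resource type, a minimum severity, and an optional search term. The build_log_filter function is a hypothetical helper, not part of the skill. The text clause is parenthesized so the OR grouping stays explicit.

```shell
# Hypothetical helper: build_log_filter <resource_type> <min_severity> [text]
build_log_filter() {
  local resource_type="$1" severity="$2" text="${3:-}"
  local filter
  # Adjacent expressions are implicitly ANDed in the Logging query language.
  filter="resource.type=\"$resource_type\" severity>=$severity"
  if [ -n "$text" ]; then
    # Match the term in either plain-text or structured payloads.
    filter="$filter (textPayload:\"$text\" OR jsonPayload.message:\"$text\")"
  fi
  printf '%s\n' "$filter"
}
```

The result can be passed straight to the log query, for example `--filter "$(build_log_filter cloud_run_revision ERROR timeout)"`.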

Cloud Monitoring

  • Cloud Run request count:
    metric.type="run.googleapis.com/request_count" AND resource.type="cloud_run_revision"
  • Cloud Run request latencies:
    metric.type="run.googleapis.com/request_latencies" AND resource.type="cloud_run_revision"
  • VM CPU utilization:
    metric.type="compute.googleapis.com/instance/cpu/utilization" AND resource.type="gce_instance"
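
The templates above match every resource of the given type in the project. To narrow a query to one service or instance, a resource label clause can be appended. The sketch below assumes a hypothetical build_metric_filter helper, and the service name checkout in the usage note is a placeholder.

```shell
# Hypothetical helper:
#   build_metric_filter <metric_type> <resource_type> [label_key label_value]
build_metric_filter() {
  local metric_type="$1" resource_type="$2"
  local label_key="${3:-}" label_value="${4:-}"
  local filter
  filter="metric.type=\"$metric_type\" AND resource.type=\"$resource_type\""
  if [ -n "$label_key" ]; then
    # Restrict to a single resource, e.g. one Cloud Run service.
    filter="$filter AND resource.labels.$label_key=\"$label_value\""
  fi
  printf '%s\n' "$filter"
}
```

For example, `build_metric_filter run.googleapis.com/request_count cloud_run_revision service_name checkout` produces a filter limited to the Cloud Run service named checkout.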

Guardrails

  • Prefer passing --project on every command instead of changing global gcloud config.
  • Start with short windows (15m to 2h) and widen only when needed.
  • Use --format json when output will be parsed or compared across sources.
  • If auth or project checks fail, fix the environment first and then re-run queries.