agentcore-investigation


AgentCore Runtime Session Investigation


Investigate AgentCore runtime sessions by querying CloudWatch Logs Insights, filtering OpenTelemetry noise, and producing structured investigation output.
Key capabilities:
  • Session-to-trace resolution via OTEL span correlation
  • Structured and glob-style parse queries for both dedicated and combined log groups
  • OpenTelemetry noise filtering with AgentCore-specific heuristics
  • Timeline construction with T+offset format
  • Error, tool invocation, token usage, and latency analysis


Reference Files


Load these files as needed for detailed guidance:

MCP:


mcp-setup.md


When: ALWAYS load before starting an investigation — ensures CloudWatch and Application Signals MCP servers are configured
Contains: MCP server configuration for CloudWatch Logs and Application Signals, with setup instructions for Claude Code, Gemini, Codex, and Kiro CLI

.mcp.json


When: Load when setting up MCP servers for the first time
Contains: Sample MCP configuration with both CloudWatch and Application Signals servers

otel-span-schema.md


When: ALWAYS load before querying or filtering OTEL spans
Contains: Field extraction priorities, known instrumentation scopes, noise filtering heuristics (DROP/KEEP patterns)


Phase 0: SessionId-to-TraceId Resolution


When the user provides a sessionId, resolve it to traceId(s) first. If the user provides a traceId directly, skip this phase.

Discovery Query (structured fields)


fields traceId, @timestamp
| filter attributes.session.id = "SESSION_ID"
| stats count(*) as spanCount, min(@timestamp) as firstSeen, max(@timestamp) as lastSeen by traceId
| sort firstSeen asc

Discovery Query (combined log group — glob-style parse)


fields @timestamp, @message
| parse @message '"traceId":"*"' as traceId
| parse @message '"session.id":"*"' as sessionId
| filter sessionId = "SESSION_ID" or @message like "SESSION_ID"
| stats earliest(@timestamp) as firstSeen, latest(@timestamp) as lastSeen, count(*) as spanCount by traceId
| sort firstSeen asc
| limit 50

Latest Interaction Only


fields traceId
| filter attributes.session.id = "SESSION_ID"
| sort @timestamp desc
| limit 1
Store discovered traceId(s) and use them in ALL subsequent queries.
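If the discovery results are post-processed in code, the traceId bookkeeping can be sketched as follows (a minimal illustration; the row shape mirrors the discovery query's output columns, and the helper itself is hypothetical):

```python
def collect_trace_ids(rows):
    """Order discovered traceIds by firstSeen (earliest interaction first),
    dropping duplicates. Rows mirror the discovery query's output columns."""
    out, seen = [], set()
    for row in sorted(rows, key=lambda r: r["firstSeen"]):
        tid = row.get("traceId")
        if tid and tid not in seen:
            seen.add(tid)
            out.append(tid)
    return out
```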

Phase 1: Discover Log Groups


Use describe_log_groups with logGroupNamePrefix /aws/bedrock-agentcore/runtimes to find all runtime log groups.
Log group naming patterns (in priority order):
- /aws/bedrock-agentcore/runtimes/<agent_id>-<endpoint_name>/otel-rt-logs (structured OTEL spans)
- /aws/bedrock-agentcore/runtimes/<agent_id>-<endpoint_name>/[runtime-logs] (stdout/stderr)
- /aws/bedrock-agentcore/runtimes/<agent_id>-<endpoint_name>-DEFAULT (single combined group)
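The naming patterns above can be classified mechanically. A sketch (the prefix and suffixes come from the patterns listed; the function and bucket names are illustrative):

```python
RUNTIME_PREFIX = "/aws/bedrock-agentcore/runtimes/"

def classify_log_groups(names):
    """Bucket log group names by the layouts listed above."""
    layout = {"otel": [], "runtime": [], "combined": []}
    for name in names:
        if not name.startswith(RUNTIME_PREFIX):
            continue  # not an AgentCore runtime log group
        if name.endswith("/otel-rt-logs"):
            layout["otel"].append(name)      # structured OTEL spans
        elif name.endswith("/[runtime-logs]"):
            layout["runtime"].append(name)   # stdout/stderr
        elif name.endswith("-DEFAULT"):
            layout["combined"].append(name)  # single combined group
    return layout
```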

Log Group Layouts


AgentCore runtimes always emit OTEL spans. Some deployments split logs into a dedicated otel-rt-logs sub-group; others write everything into a single combined log group. Both are normal.
Log group layout → query strategy:
- Dedicated otel-rt-logs exists: use structured field queries (traceId, attributes.session.id, etc.)
- Single combined log group: try structured fields first — if they return 0 results, use glob-style parse @message
If a dedicated otel-rt-logs group exists, prefer it for structured queries.

Parse Syntax Guidance


When using parse @message on combined log groups, prefer glob-style parse — it is simpler and avoids escaping issues:
| parse @message '"name":"*"' as spanName
| parse @message '"traceId":"*"' as traceId
| parse @message '"startTimeUnixNano":"*"' as startNano
Regex parse (/pattern/) is valid CloudWatch Logs Insights syntax but requires careful escaping of quotes and special characters inside JSON. If glob-style parse extracts the field you need, use it.
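To sanity-check a glob pattern against a sample @message before spending a query on it, the extraction can be approximated locally. A sketch (glob_parse is a hypothetical helper, not a CloudWatch API; it treats the single * as a non-greedy capture, which mirrors how the patterns above behave on JSON):

```python
import re

def glob_parse(message, pattern):
    """Approximate Logs Insights glob parse locally: the * becomes a
    non-greedy capture group, everything else matches literally."""
    regex = re.escape(pattern).replace(r"\*", "(.*?)")
    match = re.search(regex, message)
    return match.group(1) if match else None
```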

Phase 2: Query CloudWatch Logs Insights


Run all 6 query types for a complete investigation. Each query has a structured version (for dedicated otel-rt-logs) and a glob-style parse version (for combined log groups).

Query Size Limits


Every query MUST include | limit to prevent context window overflow:
  • Session overview: | limit 50
  • Span details: | limit 100
  • Errors: | limit 50
  • Tool invocations: | limit 100
  • Token usage: | limit 50
  • Latency outliers: | limit 20

Query 1: Session Overview


Structured:
fields @timestamp, traceId, spanId, parentSpanId, name, scope.name,
       attributes.session.id, attributes.gen_ai.operation.name, attributes.gen_ai.agent.name,
       startTimeUnixNano, endTimeUnixNano
| filter traceId = "TRACE_ID"
| sort startTimeUnixNano asc
| limit 50
Combined log group:
fields @timestamp, @message
| filter @message like "TRACE_ID"
| parse @message '"name":"*"' as spanName
| parse @message '"traceId":"*"' as traceId
| parse @message '"spanId":"*"' as spanId
| parse @message '"startTimeUnixNano":"*"' as startNano
| parse @message '"endTimeUnixNano":"*"' as endNano
| sort @timestamp asc
| limit 50

Query 2: Span Details with Duration


Structured:
fields @timestamp, traceId, spanId, parentSpanId, name, scope.name,
       startTimeUnixNano, endTimeUnixNano,
       (endTimeUnixNano - startTimeUnixNano) / 1000000 as durationMs,
       status.code, attributes.gen_ai.operation.name
| filter traceId = "TRACE_ID"
| filter ispresent(startTimeUnixNano)
| sort startTimeUnixNano asc
| limit 100
Combined log group:
fields @timestamp, @message
| filter @message like "TRACE_ID"
| parse @message '"name":"*"' as spanName
| parse @message '"spanId":"*"' as spanId
| parse @message '"parentSpanId":"*"' as parentSpanId
| parse @message '"startTimeUnixNano":"*"' as startNano
| parse @message '"endTimeUnixNano":"*"' as endNano
| parse @message '"statusCode":"*"' as statusCode
| sort @timestamp asc
| limit 100
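The combined-log-group version can only extract startNano and endNano as strings, so span duration has to be computed client-side after the results come back. A minimal sketch:

```python
NANOS_PER_MS = 1_000_000

def duration_ms(start_nano, end_nano):
    """Glob parse yields strings; convert before subtracting."""
    return (int(end_nano) - int(start_nano)) / NANOS_PER_MS
```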

Query 3: Errors


Structured:
fields @timestamp, traceId, spanId, name, status.code, status.message,
       attributes.error.message, attributes.exception.message, attributes.exception.type
| filter traceId = "TRACE_ID"
| filter status.code = 2 OR ispresent(attributes.error.message) OR ispresent(attributes.exception.message)
| sort @timestamp asc
| limit 50
Combined log group:
fields @timestamp, @message
| filter @message like "TRACE_ID"
| filter @message like /ERROR|exception|Exception|fault|STATUS_CODE_ERROR/
| parse @message '"name":"*"' as spanName
| parse @message '"statusCode":"*"' as statusCode
| parse @message '"startTimeUnixNano":"*"' as startNano
| sort @timestamp asc
| limit 50

Query 4: Tool Invocations


Structured:
fields @timestamp, traceId, spanId, name, scope.name,
       attributes.gen_ai.operation.name, attributes.tool.name,
       startTimeUnixNano, endTimeUnixNano,
       (endTimeUnixNano - startTimeUnixNano) / 1000000 as durationMs
| filter traceId = "TRACE_ID"
| filter attributes.gen_ai.operation.name = "execute_tool" OR ispresent(attributes.tool.name) OR name like /tool/
| sort startTimeUnixNano asc
| limit 100
Combined log group:
fields @timestamp, @message
| filter @message like "TRACE_ID"
| filter @message like /tool|execute_tool|function_call/
| parse @message '"name":"*"' as spanName
| parse @message '"startTimeUnixNano":"*"' as startNano
| parse @message '"endTimeUnixNano":"*"' as endNano
| parse @message '"statusCode":"*"' as statusCode
| sort @timestamp asc
| limit 100

Query 5: Token Usage


Structured:
fields @timestamp, traceId, spanId, name,
       attributes.gen_ai.usage.input_tokens, attributes.gen_ai.usage.output_tokens,
       attributes.gen_ai.usage.total_tokens, attributes.gen_ai.agent.name
| filter traceId = "TRACE_ID"
| filter ispresent(attributes.gen_ai.usage.total_tokens)
| sort @timestamp asc
| limit 50
Combined log group:
fields @timestamp, @message
| filter @message like "TRACE_ID"
| filter @message like /input_tokens|output_tokens|usage/
| parse @message '"name":"*"' as spanName
| parse @message '"gen_ai.usage.input_tokens":*,' as inputTokens
| parse @message '"gen_ai.usage.output_tokens":*,' as outputTokens
| sort @timestamp asc
| limit 50

Query 6: Latency Outliers


Structured:
fields @timestamp, traceId, spanId, name,
       (endTimeUnixNano - startTimeUnixNano) / 1000000 as durationMs
| filter traceId = "TRACE_ID"
| filter ispresent(endTimeUnixNano)
| sort durationMs desc
| limit 20
Combined log group:
fields @timestamp, @message
| filter @message like "TRACE_ID"
| parse @message '"name":"*"' as spanName
| parse @message '"startTimeUnixNano":"*"' as startNano
| parse @message '"endTimeUnixNano":"*"' as endNano
| sort @timestamp asc
| limit 50
Queries are async — use get_logs_insight_query_results to poll until status is Complete.
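The polling loop maps directly onto the underlying CloudWatch Logs API: boto3's get_query_results takes a queryId and returns a status field. A sketch, with the client injected so it can be a real boto3 CloudWatch Logs client or any object exposing the same method:

```python
import time

TERMINAL_STATUSES = {"Complete", "Failed", "Cancelled", "Timeout"}

def poll_query(logs_client, query_id, interval_s=1.0, max_polls=60):
    """Poll get_query_results until the query reaches a terminal status."""
    for _ in range(max_polls):
        resp = logs_client.get_query_results(queryId=query_id)
        if resp["status"] in TERMINAL_STATUSES:
            return resp
        time.sleep(interval_s)
    raise TimeoutError(f"query {query_id} still running after {max_polls} polls")
```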

Phase 3: Filter OTEL Noise


See otel-span-schema.md for extraction rules, known scopes, and DROP/KEEP heuristics.
After retrieving query results:
  1. Count total results received
  2. Remove entries matching DROP patterns (count removed)
  3. Keep entries matching KEEP patterns
  4. Log: "Filtered: {total} → {kept} spans ({removed} noise entries dropped)"
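The filtering steps above can be sketched as follows. The DROP/KEEP patterns shown here are placeholders; the real heuristics are defined in otel-span-schema.md, with KEEP taking precedence over DROP in this sketch:

```python
import re

# Placeholder patterns for illustration only; the real DROP/KEEP
# heuristics live in otel-span-schema.md.
DROP_PATTERNS = [r"health[-_ ]?check", r"ping"]
KEEP_PATTERNS = [r"gen_ai\.", r"execute_tool"]

def filter_spans(spans):
    """Drop noise entries, keep everything else, and log the counts."""
    total, dropped, kept = len(spans), 0, []
    for span in spans:
        text = str(span)
        is_noise = any(re.search(p, text, re.IGNORECASE) for p in DROP_PATTERNS)
        is_keep = any(re.search(p, text, re.IGNORECASE) for p in KEEP_PATTERNS)
        if is_noise and not is_keep:
            dropped += 1
            continue
        kept.append(span)
    print(f"Filtered: {total} -> {len(kept)} spans ({dropped} noise entries dropped)")
    return kept
```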

Phase 4: Build Timeline


Compute relative offsets from the earliest span's startTimeUnixNano:
[T+0ms]     Session started — traceId: abc123
[T+45ms]    LLM inference — model: anthropic.claude-v3 — 1,200ms
[T+1,250ms] Tool call: search_documents — 340ms
[T+1,600ms] Tool result: 3 documents found
[T+1,650ms] LLM inference — model: anthropic.claude-v3 — 890ms
[T+2,550ms] Response generated — 200 OK
[T+2,600ms] Session ended — total: 2,600ms
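The offset computation is a single subtraction against the earliest span. A sketch, assuming each span carries a numeric startTimeUnixNano and a name (event descriptions and durations would be appended per span type):

```python
def build_timeline(spans):
    """Render [T+Nms] lines relative to the earliest span's start time."""
    ordered = sorted(spans, key=lambda s: s["startTimeUnixNano"])
    t0 = ordered[0]["startTimeUnixNano"]
    return [
        f"[T+{(s['startTimeUnixNano'] - t0) // 1_000_000:,}ms] {s['name']}"
        for s in ordered
    ]
```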

Error Handling


Situation → Action:
- No log groups found: ask user for log group name or AWS region
- Query returns 0 results: widen time range to ±24h, retry. If still empty, try alternate ID fields
- Session ID not found: try filtering by requestId, invocationId, traceId variants
- Query timeout: use cancel_logs_insight_query, reduce time range, retry
- Partial results: note in output, suggest narrower time window
- Structured field queries return 0 results: switch to glob-style parse @message queries (see Parse Syntax Guidance)