# AgentCore Runtime Session Investigation
Investigate AgentCore runtime sessions by querying CloudWatch Logs Insights, filtering OpenTelemetry noise, and producing structured investigation output.
Key capabilities:
- Session-to-trace resolution via OTEL span correlation
- Structured and glob-style parse queries for both dedicated and combined log groups
- OpenTelemetry noise filtering with AgentCore-specific heuristics
- Timeline construction with T+offset format
- Error, tool invocation, token usage, and latency analysis
## Reference Files
Load these files as needed for detailed guidance:
MCP:

`mcp-setup.md`
- When: ALWAYS load before starting an investigation — ensures CloudWatch and Application Signals MCP servers are configured
- Contains: MCP server configuration for CloudWatch Logs and Application Signals, with setup instructions for Claude Code, Gemini, Codex, and Kiro CLI
`.mcp.json`
- When: Load when setting up MCP servers for the first time
- Contains: Sample MCP configuration with both CloudWatch and Application Signals servers
`otel-span-schema.md`
- When: ALWAYS load before querying or filtering OTEL spans
- Contains: Field extraction priorities, known instrumentation scopes, noise filtering heuristics (DROP/KEEP patterns)
## Phase 0: SessionId-to-TraceId Resolution
When the user provides a sessionId, resolve it to traceId(s) first. If the user provides a traceId directly, skip this phase.
### Discovery Query (structured fields)

```
fields traceId, @timestamp
| filter attributes.session.id = "SESSION_ID"
| stats count(*) as spanCount, min(@timestamp) as firstSeen, max(@timestamp) as lastSeen by traceId
| sort firstSeen asc
```

### Discovery Query (combined log group — glob-style parse)

```
fields @timestamp, @message
| parse @message '"traceId":"*"' as traceId
| parse @message '"session.id":"*"' as sessionId
| filter sessionId = "SESSION_ID" or @message like "SESSION_ID"
| stats earliest(@timestamp) as firstSeen, latest(@timestamp) as lastSeen, count(*) as spanCount by traceId
| sort firstSeen asc
| limit 50
```

### Latest Interaction Only

```
fields traceId
| filter attributes.session.id = "SESSION_ID"
| sort @timestamp desc
| limit 1
```

Store discovered traceId(s) and use them in ALL subsequent queries.
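Phase 0 can also be scripted. A minimal sketch of building these query strings in Python; `discovery_query` is a hypothetical helper, and the queries it emits are exactly the ones above with the real session id substituted:

```python
def discovery_query(session_id: str, latest_only: bool = False) -> str:
    """Build the Phase 0 Logs Insights query that resolves a sessionId
    to its traceId(s); latest_only returns the latest-interaction form."""
    if latest_only:
        return (
            "fields traceId\n"
            f'| filter attributes.session.id = "{session_id}"\n'
            "| sort @timestamp desc\n"
            "| limit 1"
        )
    return (
        "fields traceId, @timestamp\n"
        f'| filter attributes.session.id = "{session_id}"\n'
        "| stats count(*) as spanCount, min(@timestamp) as firstSeen, "
        "max(@timestamp) as lastSeen by traceId\n"
        "| sort firstSeen asc"
    )
```

Building the string in code keeps the session id consistent across all subsequent queries.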
## Phase 1: Discover Log Groups
Use `describe_log_groups` with logGroupNamePrefix `/aws/bedrock-agentcore/runtimes` to find all runtime log groups.

Log group naming patterns (in priority order):
- `/aws/bedrock-agentcore/runtimes/<agent_id>-<endpoint_name>/otel-rt-logs` (structured OTEL spans)
- `/aws/bedrock-agentcore/runtimes/<agent_id>-<endpoint_name>/[runtime-logs]` (stdout/stderr)
- `/aws/bedrock-agentcore/runtimes/<agent_id>-<endpoint_name>-DEFAULT` (single combined group)

### Log Group Layouts
AgentCore runtimes always emit OTEL spans. Some deployments split logs into a dedicated `otel-rt-logs` sub-group; others write everything into a single combined log group. Both are normal.

| Log Group Layout | Query Strategy |
|---|---|
| Dedicated `otel-rt-logs` | Use structured field queries |
| Single combined log group | Try structured fields first — if they return 0 results, use glob-style `parse` |

If a dedicated `otel-rt-logs` group exists, prefer it for structured queries.
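The layout decision can be automated once the group names are known. A sketch assuming the names were already fetched (for example via `describe_log_groups` with the Phase 1 prefix); the function name and return convention are illustrative, not part of any API:

```python
def pick_query_strategy(log_group_names):
    """Choose which log group to query and how, per the layout table:
    a dedicated otel-rt-logs group gets structured queries; a combined
    -DEFAULT group gets structured fields first, then glob-style parse."""
    dedicated = [n for n in log_group_names if n.endswith("/otel-rt-logs")]
    if dedicated:
        return dedicated[0], "structured"
    combined = [n for n in log_group_names if n.endswith("-DEFAULT")]
    if combined:
        return combined[0], "structured-then-glob"
    return None, "ask-user"  # no runtime log groups found (see Error Handling)
```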
### Parse Syntax Guidance
When using `parse @message` on combined log groups, prefer glob-style parse — it is simpler and avoids escaping issues:

```
| parse @message '"name":"*"' as spanName
| parse @message '"traceId":"*"' as traceId
| parse @message '"startTimeUnixNano":"*"' as startNano
```

Regex parse (`/pattern/`) is valid CloudWatch Logs Insights syntax but requires careful escaping of quotes and special characters inside JSON. If glob-style parse extracts the field you need, use it.
## Phase 2: Query CloudWatch Logs Insights
Run all 6 query types for a complete investigation. Each query has a structured version (for dedicated `otel-rt-logs`) and a glob-style parse version (for combined log groups).

### Query Size Limits

Every query MUST include `| limit` to prevent context window overflow:

- Session overview: `| limit 50`
- Span details: `| limit 100`
- Errors: `| limit 50`
- Tool invocations: `| limit 100`
- Token usage: `| limit 50`
- Latency outliers: `| limit 20`
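The queries below run asynchronously. A polling sketch assuming direct boto3 access to CloudWatch Logs (`start_query` / `get_query_results`) rather than the MCP tool wrappers; the client is injected so the loop can be exercised with a stub:

```python
import time

def run_insights_query(logs_client, log_group, query, start, end,
                       poll_interval=1.0, timeout=60.0):
    """Start a Logs Insights query and poll until it reaches a terminal
    status. logs_client is a boto3 CloudWatch Logs client (or a stub)."""
    qid = logs_client.start_query(
        logGroupName=log_group,
        queryString=query,
        startTime=start,
        endTime=end,
    )["queryId"]
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        resp = logs_client.get_query_results(queryId=qid)
        if resp["status"] in ("Complete", "Failed", "Cancelled", "Timeout"):
            return resp
        time.sleep(poll_interval)
    raise TimeoutError(f"query {qid} did not finish within {timeout}s")
```

Injecting the client also lets the same loop drive either a dedicated or a combined log group.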
### Query 1: Session Overview
Structured:

```
fields @timestamp, traceId, spanId, parentSpanId, name, scope.name,
  attributes.session.id, attributes.gen_ai.operation.name, attributes.gen_ai.agent.name,
  startTimeUnixNano, endTimeUnixNano
| filter traceId = "TRACE_ID"
| sort startTimeUnixNano asc
| limit 50
```

Combined log group:

```
fields @timestamp, @message
| filter @message like "TRACE_ID"
| parse @message '"name":"*"' as spanName
| parse @message '"traceId":"*"' as traceId
| parse @message '"spanId":"*"' as spanId
| parse @message '"startTimeUnixNano":"*"' as startNano
| parse @message '"endTimeUnixNano":"*"' as endNano
| sort @timestamp asc
| limit 50
```

### Query 2: Span Details with Duration
Structured:

```
fields @timestamp, traceId, spanId, parentSpanId, name, scope.name,
  startTimeUnixNano, endTimeUnixNano,
  (endTimeUnixNano - startTimeUnixNano) / 1000000 as durationMs,
  status.code, attributes.gen_ai.operation.name
| filter traceId = "TRACE_ID"
| filter ispresent(startTimeUnixNano)
| sort startTimeUnixNano asc
| limit 100
```

Combined log group:

```
fields @timestamp, @message
| filter @message like "TRACE_ID"
| parse @message '"name":"*"' as spanName
| parse @message '"spanId":"*"' as spanId
| parse @message '"parentSpanId":"*"' as parentSpanId
| parse @message '"startTimeUnixNano":"*"' as startNano
| parse @message '"endTimeUnixNano":"*"' as endNano
| parse @message '"statusCode":"*"' as statusCode
| sort @timestamp asc
| limit 100
```

### Query 3: Errors
Structured:

```
fields @timestamp, traceId, spanId, name, status.code, status.message,
  attributes.error.message, attributes.exception.message, attributes.exception.type
| filter traceId = "TRACE_ID"
| filter status.code = 2 OR ispresent(attributes.error.message) OR ispresent(attributes.exception.message)
| sort @timestamp asc
| limit 50
```

Combined log group:

```
fields @timestamp, @message
| filter @message like "TRACE_ID"
| filter @message like /ERROR|exception|Exception|fault|STATUS_CODE_ERROR/
| parse @message '"name":"*"' as spanName
| parse @message '"statusCode":"*"' as statusCode
| parse @message '"startTimeUnixNano":"*"' as startNano
| sort @timestamp asc
| limit 50
```

### Query 4: Tool Invocations
Structured:

```
fields @timestamp, traceId, spanId, name, scope.name,
  attributes.gen_ai.operation.name, attributes.tool.name,
  startTimeUnixNano, endTimeUnixNano,
  (endTimeUnixNano - startTimeUnixNano) / 1000000 as durationMs
| filter traceId = "TRACE_ID"
| filter attributes.gen_ai.operation.name = "execute_tool" OR ispresent(attributes.tool.name) OR name like /tool/
| sort startTimeUnixNano asc
| limit 100
```

Combined log group:

```
fields @timestamp, @message
| filter @message like "TRACE_ID"
| filter @message like /tool|execute_tool|function_call/
| parse @message '"name":"*"' as spanName
| parse @message '"startTimeUnixNano":"*"' as startNano
| parse @message '"endTimeUnixNano":"*"' as endNano
| parse @message '"statusCode":"*"' as statusCode
| sort @timestamp asc
| limit 100
```

### Query 5: Token Usage
Structured:

```
fields @timestamp, traceId, spanId, name,
  attributes.gen_ai.usage.input_tokens, attributes.gen_ai.usage.output_tokens,
  attributes.gen_ai.usage.total_tokens, attributes.gen_ai.agent.name
| filter traceId = "TRACE_ID"
| filter ispresent(attributes.gen_ai.usage.total_tokens)
| sort @timestamp asc
| limit 50
```

Combined log group:

```
fields @timestamp, @message
| filter @message like "TRACE_ID"
| filter @message like /input_tokens|output_tokens|usage/
| parse @message '"name":"*"' as spanName
| parse @message '"gen_ai.usage.input_tokens":"*"' as inputTokens
| sort @timestamp asc
| limit 50
```

### Query 6: Latency Outliers
Structured:

```
fields @timestamp, traceId, spanId, name,
  (endTimeUnixNano - startTimeUnixNano) / 1000000 as durationMs
| filter traceId = "TRACE_ID"
| filter ispresent(endTimeUnixNano)
| sort durationMs desc
| limit 20
```

Combined log group:

```
fields @timestamp, @message
| filter @message like "TRACE_ID"
| parse @message '"name":"*"' as spanName
| parse @message '"startTimeUnixNano":"*"' as startNano
| parse @message '"endTimeUnixNano":"*"' as endNano
| sort @timestamp asc
| limit 50
```

Queries are async — use `get_logs_insight_query_results` to poll until status is `Complete`.

## Phase 3: Filter OTEL Noise
See `otel-span-schema.md` for extraction rules, known scopes, and DROP/KEEP heuristics.
After retrieving query results:
- Count total results received
- Remove entries matching DROP patterns (count removed)
- Keep entries matching KEEP patterns
- Log: "Filtered: {total} → {kept} spans ({removed} noise entries dropped)"
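The counting steps above can be sketched as a small filter. The two patterns below are placeholders for illustration only; the authoritative DROP/KEEP heuristics are in otel-span-schema.md:

```python
import re

# Placeholder patterns; load the real DROP/KEEP lists from otel-span-schema.md.
DROP_PATTERNS = [re.compile(r"opentelemetry\.instrumentation\.(urllib3|botocore)")]
KEEP_PATTERNS = [re.compile(r"gen_ai\.|execute_tool|invoke_agent")]

def filter_noise(spans):
    """Drop noise spans and report counts in the Phase 3 log format."""
    kept = []
    for span in spans:
        blob = str(span)
        is_noise = any(p.search(blob) for p in DROP_PATTERNS)
        is_keeper = any(p.search(blob) for p in KEEP_PATTERNS)
        if is_noise and not is_keeper:
            continue
        kept.append(span)
    removed = len(spans) - len(kept)
    print(f"Filtered: {len(spans)} → {len(kept)} spans "
          f"({removed} noise entries dropped)")
    return kept
```

KEEP patterns win over DROP patterns, so agent-relevant spans emitted by an otherwise noisy scope survive the filter.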
## Phase 4: Build Timeline
Compute relative offsets from the earliest span's `startTimeUnixNano`:

```
[T+0ms] Session started — traceId: abc123
[T+45ms] LLM inference — model: anthropic.claude-v3 — 1,200ms
[T+1,250ms] Tool call: search_documents — 340ms
[T+1,600ms] Tool result: 3 documents found
[T+1,650ms] LLM inference — model: anthropic.claude-v3 — 890ms
[T+2,550ms] Response generated — 200 OK
[T+2,600ms] Session ended — total: 2,600ms
```
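The offsets can be computed mechanically from the filtered spans. A sketch assuming each span is a dict carrying `name` and `startTimeUnixNano` (OTEL exports nano timestamps as strings in JSON):

```python
def build_timeline(spans):
    """Render [T+Nms] lines relative to the earliest startTimeUnixNano."""
    spans = sorted(spans, key=lambda s: int(s["startTimeUnixNano"]))
    t0 = int(spans[0]["startTimeUnixNano"])
    lines = []
    for s in spans:
        # Nanoseconds to whole milliseconds, comma-grouped like the example above.
        offset_ms = (int(s["startTimeUnixNano"]) - t0) // 1_000_000
        lines.append(f"[T+{offset_ms:,}ms] {s['name']}")
    return lines
```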
## Error Handling
| Situation | Action |
|---|---|
| No log groups found | Ask user for log group name or AWS region |
| Query returns 0 results | Widen time range to ±24h, retry. If still empty, try alternate ID fields |
| Session ID not found | Try filtering by requestId, invocationId, traceId variants |
| Query timeout | Use |
| Partial results | Note in output, suggest narrower time window |
| Structured field queries return 0 results | Switch to glob-style `parse` queries |