full-stack-debugger
Full Stack Debugger
Overview
The Full Stack Debugger enables systematic debugging of issues across the entire application stack (UI/Frontend, Backend/API, Database/State). It combines browser testing, log analysis, code examination, and automated server restart/verification to iteratively identify and fix issues one at a time until the system is fully operational.
This skill uses a proven workflow: Detection → Analysis → Fix → Restart → Verification → Iteration to systematically resolve issues that developers encounter during development and testing.
When to Use This Skill
Trigger this skill when observing:
- Error states in the UI (dashboard, buttons failing, status showing errors)
- Repeated failures in backend logs (task execution failures, import errors, database errors)
- Unexpected database state (rows showing failed status when they should succeed)
- API endpoints returning errors or unexpected responses
- Services failing to initialize or process tasks
- Cascading failures across multiple components
Debugging Workflow
Phase 1: Detection
Detect errors from multiple sources:
Browser UI Detection:
- Navigate to the affected page/feature in the browser
- Check for error messages, red warning states, or disabled functionality
- Read console error messages using DevTools
- Note the specific UI state and what action triggered the error
Backend Log Detection:
- Query recent error logs using `tail -200 /path/to/logs/errors.log`
- Search for error patterns related to the issue using `grep`
- Note error timestamps, error messages, and stack traces
- Look for repeated errors (these indicate a systemic issue)
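As a rough illustration of spotting repeated errors, the sketch below counts identical error messages in a list of log lines. The log line shape (`<date> <time> <LEVEL> <message>`) and the function name `repeated_errors` are assumptions made for this example; adjust the regular expression to whatever format the real `errors.log` uses.

```python
import re
from collections import Counter

# Assumed line shape: "<date> <time> <LEVEL> <message>". This is a guess, not the real log format.
LINE_RE = re.compile(r"^(\S+ \S+)\s+(ERROR|CRITICAL)\s+(.*)$")


def repeated_errors(lines, min_count=2):
    """Return (message, count) pairs for error messages seen at least min_count times."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            counts[match.group(3)] += 1
    return [(msg, n) for msg, n in counts.most_common() if n >= min_count]


# Hypothetical sample lines standing in for the tail of errors.log.
sample = [
    "2024-01-01 10:00:00 ERROR name 'Optional' is not defined",
    "2024-01-01 10:00:05 ERROR name 'Optional' is not defined",
    "2024-01-01 10:00:07 INFO scheduler tick",
    "2024-01-01 10:00:09 ERROR database is locked",
]
print(repeated_errors(sample))
```

A message that shows up many times in a short window is usually the better first target than a one-off error.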
Database State Detection:
- Query the database directly using sqlite3
- Check status of recent tasks, transactions, or records
- Look for failed, incomplete, or error states
- Note which records are affected and what their states are
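The database check can also be scripted with Python's standard `sqlite3` module. The sketch below is only an illustration: the `queue_tasks` column names (`id`, `task_type`, `status`, `error`) and the status values are assumptions, and the demo runs against an in-memory database rather than the real one.

```python
import sqlite3


def failed_tasks(conn, limit=20):
    """Return (id, task_type, status, error) rows for tasks in a failed or error state."""
    query = (
        "SELECT id, task_type, status, error FROM queue_tasks "
        "WHERE status IN ('failed', 'error') ORDER BY id DESC LIMIT ?"
    )
    return conn.execute(query, (limit,)).fetchall()


# Demo on an in-memory database; the schema below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE queue_tasks ("
    "id INTEGER PRIMARY KEY, task_type TEXT, status TEXT, error TEXT)"
)
conn.executemany(
    "INSERT INTO queue_tasks (task_type, status, error) VALUES (?, ?, ?)",
    [
        ("AI_ANALYSIS", "failed", "name 'Optional' is not defined"),
        ("DATA_FETCHER", "completed", None),
    ],
)
rows = failed_tasks(conn)
for row in rows:
    print(row)
```

Against a real database, replace `":memory:"` with the database file path and drop the table setup.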
Example: When debugging a scheduler failure:
- Navigate to System Health dashboard
- Observe scheduler showing "0 done" or "X failed"
- Check `/logs/errors.log` for error messages
- Query the `queue_tasks` table to see failed task records
Phase 2: Analysis
Analyze root causes by reading code and logs:
Code Analysis:
- Read the error file/module indicated in error stack traces
- Check imports - look for missing `from X import Y` statements
- Check class names - verify instantiation matches actual class names
- Look for syntax errors - unmatched quotes, unclosed parentheses
- Check function signatures - ensure payloads match expected parameters
- Read reference documentation (`references/common_errors.md`) for error patterns
Log Analysis:
- Extract error messages from logs
- Look for patterns like `'optional'` (missing import), `unterminated string` (syntax error), `'attribute'` (wrong class name)
- Trace error propagation backward to find the originating issue
- Check timestamps - multiple errors at same time indicate batch failure
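One way to make the pattern lookup repeatable is a small substring table that maps an error message to a likely cause. This is a minimal sketch: the `CAUSES` table and the `classify` helper are illustrative names, and the mapping only covers the patterns mentioned in this document.

```python
# Substring -> likely cause, based on the patterns listed in this document.
CAUSES = [
    ("is not defined", "missing import"),
    ("cannot import name", "import error"),
    ("unterminated string", "syntax error"),
    ("has no attribute", "wrong class name or attribute"),
]


def classify(message):
    """Return a likely cause for an error message, or 'unclassified' if nothing matches."""
    lowered = message.lower()
    for needle, cause in CAUSES:
        if needle in lowered:
            return cause
    return "unclassified"


print(classify("NameError: name 'Optional' is not defined"))
```

A match is only a hint about where to look; the code still needs to be read to confirm the root cause.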
API/Payload Analysis:
- Check what payload the API is sending to task handlers
- Read the task handler code to see what fields it expects
- Compare actual payload vs expected payload
- Look for missing required fields
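The payload comparison reduces to a set difference. In the sketch below, the field names and the `missing_fields` helper are hypothetical; the real required fields come from reading the task handler code.

```python
def missing_fields(payload, required):
    """Return the required field names that are absent from the payload, sorted."""
    return sorted(set(required) - set(payload))


# Hypothetical payload and handler requirements, for illustration only.
actual_payload = {"symbol": "AAPL"}
required_by_handler = ["symbol", "agent_name", "timeframe"]

print(missing_fields(actual_payload, required_by_handler))
```

An empty result means the payload carries every required key; it says nothing about whether the values have the right type.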
Example: When debugging "name 'Optional' is not defined":
- Find the file mentioned in the error (`analysis_executor.py`)
- Read the imports section
- Notice `Optional` is used but not imported
- Check line 14: `from typing import Dict, List, Any` - missing `Optional`
- Fix: Add `Optional` to the import statement
Phase 3: Fix (One Issue at a Time)
Apply fixes one issue per iteration:
Before Fixing:
- Verify this is the first/next issue to fix
- Read the relevant code section carefully
- Use the fix patterns from `references/fix_templates.md`
Common Fix Patterns:
- Missing imports: Add to import statement (e.g., `from typing import Optional`)
- Wrong class name: Update import and instantiation to match actual class
- Missing docstring quotes: Add opening `"""` to docstring
- Wrong payload fields: Add missing required fields to payload dictionary
- Syntax errors: Fix unmatched quotes, parentheses, brackets
After Fixing:
- Read back the changed code to verify syntax
- Check the edit was correct (line numbers, indentation)
- Only fix ONE issue, even if multiple exist - don't cascade fixes
- Document what was changed in a clear comment
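Reading back the edit can be backed by a quick parse check. The sketch below uses the standard `ast` module; the helper name `syntax_error_in` is made up for this example. Note the limit of the technique: parsing catches syntax errors only, so a missing import such as an undefined `Optional` would still pass.

```python
import ast


def syntax_error_in(source, filename="<edited>"):
    """Return None if the source parses, else a short 'line N: message' string."""
    try:
        ast.parse(source, filename=filename)
    except SyntaxError as exc:
        return f"line {exc.lineno}: {exc.msg}"
    return None


good = "from typing import Dict, List, Any, Optional\nvalue: Optional[int] = None\n"
bad = "def handler(:\n    pass\n"

print(syntax_error_in(good))
print(syntax_error_in(bad))
```

To check a file on disk, read it with `pathlib.Path(path).read_text()` and pass the text in.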
Example Fix:
```python
# BEFORE
from typing import Dict, List, Any

# AFTER
from typing import Dict, List, Any, Optional
```
Phase 4: Restart (Automated)
Restart the backend server after each fix:
```bash
# Kill existing processes listening on port 8000
lsof -ti:8000 | xargs kill -9 2>/dev/null

# Clear Python bytecode cache
find . -type d -name "__pycache__" -exec rm -rf {} + 2>/dev/null
find . -type f -name "*.pyc" -delete 2>/dev/null

# Restart backend in the background, logging to /tmp/backend_restart.log
sleep 3 && python -m src.main --command web > /tmp/backend_restart.log 2>&1 &
sleep 10  # Wait for startup

# Verify health
curl -m 5 http://localhost:8000/api/health
```
Phase 5: Verification
Verify the fix worked through multiple checks:
Health Check:
- Call the `/api/health` endpoint
- Verify `"status": "healthy"`
- If still failing, check logs for new errors
Browser Verification:
- Navigate to the affected UI page
- Trigger the action that previously failed
- Verify the error is gone
- Check for new errors in console
Database Verification:
- Query the affected records/tasks
- Verify status changed from failed/error to success/completed
- Check that metrics updated (e.g., scheduler shows "1 done" instead of "0 done")
Log Verification:
- Check recent logs for the same error
- Verify no new errors appeared
- Look for success messages or "completed" status
Example:
- Scheduler should show "1 done" instead of "0 done"
- Task record should show status="completed" instead of "failed"
- No error messages in logs
- WebSocket shows healthy status in UI
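The health check can be polled instead of relying on a fixed sleep. The sketch below assumes the endpoint returns a JSON object with a `status` field, as the `"status": "healthy"` check suggests; the function names and the retry parameters are illustrative. The fetch function is injectable so the polling logic can be exercised without a running server.

```python
import json
import time
import urllib.request


def fetch_health(url="http://localhost:8000/api/health", timeout=5):
    """Fetch the health endpoint body as text; raises OSError subclasses on connection errors."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def wait_until_healthy(fetch=fetch_health, attempts=5, delay=2.0):
    """Poll until the body reports status == 'healthy'; return True on success, else False."""
    for attempt in range(attempts):
        try:
            if json.loads(fetch()).get("status") == "healthy":
                return True
        except (OSError, ValueError, AttributeError):
            pass  # server not up yet, body is not JSON, or JSON is not an object
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling `wait_until_healthy()` with the defaults polls the local backend up to five times. A `False` result is the cue to read `/tmp/backend_restart.log` for startup errors.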
Phase 6: Iteration
If issues remain, repeat the cycle:
- Continue if more issues exist:
  - Check logs for remaining errors
  - If yes, return to Phase 2 (Analysis)
  - Fix the next issue (Phase 3)
  - Restart (Phase 4)
  - Verify (Phase 5)
- Stop when all issues are fixed:
  - All schedulers show completed execution counts
  - UI shows no error states
  - Logs show no error patterns
  - Tasks/records show success status
  - Full verification complete
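The cycle above can be pictured as a small driver loop. This is a sketch of the control flow only: `detect`, `fix`, `restart`, and `verify` are hypothetical callables standing in for the phases, and the iteration cap is an added safety assumption rather than part of the workflow.

```python
def debug_loop(detect, fix, restart, verify, max_iterations=10):
    """Fix one issue per iteration until detect() reports none; return the issues fixed."""
    fixed = []
    for _ in range(max_iterations):
        issues = detect()
        if not issues:
            return fixed
        issue = issues[0]  # one issue at a time, even if several exist
        fix(issue)
        restart()
        if not verify(issue):
            raise RuntimeError(f"fix for {issue!r} did not verify")
        fixed.append(issue)
    if detect():
        raise RuntimeError("iteration limit reached with issues remaining")
    return fixed


# Toy run with in-memory stand-ins for the real phases.
pending = ["missing import", "wrong class name"]
restarts = []
result = debug_loop(
    detect=lambda: list(pending),
    fix=lambda issue: pending.remove(issue),
    restart=lambda: restarts.append("restart"),
    verify=lambda issue: issue not in pending,
)
print(result, len(restarts))
```

Stopping on a failed verification keeps a bad fix from being buried under later changes.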
Common Error Patterns
See `references/common_errors.md` for patterns to recognize:
- Python syntax errors (unterminated strings, missing quotes)
- Import errors (`name 'X' is not defined`, `cannot import name 'Y'`)
- Class/attribute errors (`'dict' object has no attribute 'symbol'`)
- Type errors (passing wrong data type)
- Payload/configuration errors (missing required fields)
Fix Templates
See `references/fix_templates.md` for ready-to-use fix patterns:
- How to add missing imports
- How to fix class name mismatches
- How to fix docstring syntax
- How to add missing payload fields
- How to fix type errors
Tools Used
- Playwright Browser Tools: Navigate UI, verify changes
- Read/Grep Tools: Examine code and logs
- Bash: Server restart, cache clearing, health checks
- Edit Tool: Apply code fixes
- Database Queries: Verify task/record state
MCP Tools Integration
Use the robo-trader-dev MCP tools to reduce token usage during debugging; the stated savings per tool range from 85% to 98%:
| Task | MCP Tool | Token Savings | Usage |
|---|---|---|---|
| Analyze error logs | | 98% | Pattern detection with time windows |
| System health check | | 97% | Database, queues, API, disk status |
| Diagnose DB locks | | 95% | Correlate logs with code patterns |
| Queue monitoring | | 96% | Real-time queue backlog analysis |
| Coordinator status | | 94% | Init status, error details |
| Error pattern fix | | 90% | Known pattern matching with examples |
| Read code files | | 85% | Progressive context (summary/targeted/full) |
| Find related files | | 88% | Import/git/similarity analysis |
Example debugging workflow:
```python
# 1. Detect errors (MCP instead of tail/grep)
mcp__robo-trader-dev__analyze_logs(patterns=["ERROR", "TIMEOUT"], time_window="1h")

# 2. Check system health (MCP instead of curl loops)
mcp__robo-trader-dev__check_system_health(components=["database", "queues", "api_endpoints"])

# 3. Diagnose specific issue (MCP instead of sqlite3 + code reading)
mcp__robo-trader-dev__diagnose_database_locks(time_window="24h", include_code_references=True)

# 4. Get fix suggestions (MCP instead of manual pattern matching)
mcp__robo-trader-dev__suggest_fix(error_message="name 'Optional' is not defined", context_file="src/services/analyzer.py")
```

**Integration with robo-trader architecture**:
- Queue operations: Use `queue_status` to monitor PORTFOLIO_SYNC, DATA_FETCHER, AI_ANALYSIS
- Coordinator debugging: Use `coordinator_status` for BroadcastCoordinator, AIChatCoordinator init issues
- Database access: Use `query_portfolio` or `diagnose_database_locks` instead of direct sqlite3 connections
Key Principles
- One issue at a time - Fix one problem per iteration to prevent cascading failures
- Verify immediately - Always restart and verify after each fix
- Multi-layer detection - Check UI, logs, and database for clues
- Iterative refinement - Continue until all issues resolved
- Automated restart - Always use clean restart (kill + cache clear + restart)
- Browser verification - Always test in actual UI, not just logs