| Dimension | Weight | What to Evaluate |
|---|---|---|
| Code Quality | 10% | Structure, patterns, SOLID, duplication, complexity, error handling |
| Architecture | 10% | Design, modularity, scalability, coupling/cohesion, API design |
| Documentation | 10% | Completeness, clarity, accuracy, examples, troubleshooting |
| Usability | 10% | Learning curve, installation ease, error messages, ergonomics |
| Performance | 8% | Speed, resource usage, caching, bundle size, Core Web Vitals |
| Security | 10% | OWASP Top 10, input validation, auth, secrets, dependencies |
| Testing | 8% | Coverage (unit/integration/e2e), quality, automation, organization |
| Maintainability | 8% | Technical debt, readability, refactorability, versioning |
| Developer Experience | 10% | Setup ease, debugging, tooling, hot reload, IDE integration |
| Accessibility | 8% | WCAG compliance, keyboard nav, screen readers, cognitive load |
| CI/CD | 5% | Automation, pipelines, deployment, rollback, monitoring |
| Innovation | 3% | Novel approaches, forward-thinking design, unique value |
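The weights in the table sum to 100%, so one plausible way to combine per-dimension scores is a weighted mean. The source does not state the aggregation formula; the sketch below assumes a weighted arithmetic mean on the 1-10 scale, and `overall_score` is a hypothetical helper name.

```python
# Weights copied from the dimension table (percent of the overall score).
WEIGHTS = {
    "Code Quality": 10,
    "Architecture": 10,
    "Documentation": 10,
    "Usability": 10,
    "Performance": 8,
    "Security": 10,
    "Testing": 8,
    "Maintainability": 8,
    "Developer Experience": 10,
    "Accessibility": 8,
    "CI/CD": 5,
    "Innovation": 3,
}


def overall_score(scores: dict[str, float]) -> float:
    """Weighted mean of per-dimension scores (assumed aggregation rule)."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        # Every dimension must be scored before an overall score is computed.
        raise ValueError(f"unscored dimensions: {sorted(missing)}")
    total_weight = sum(WEIGHTS.values())
    return sum(scores[name] * weight for name, weight in WEIGHTS.items()) / total_weight
```

Refusing to compute a result when a dimension is unscored keeps a partial audit from looking like a complete one.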
| Phase | Name | Purpose |
|---|---|---|
| 0 | Resource Completeness | Verify registry/filesystem parity; the audit fails if this check fails |
| 1 | Discovery | Read docs, examine code, test system, review supporting materials |
| 2 | Evaluation | Score each dimension with evidence, strengths, and weaknesses |
| 3 | Synthesis | Executive summary, detailed scores, recommendations, risk matrix |
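Phase 0 can be pictured as a set comparison between what the registry lists and what exists on disk. The source does not describe the registry format, so this minimal sketch assumes it is a set of file paths relative to a root directory; `check_resource_parity` and `phase_zero_passes` are hypothetical names.

```python
from pathlib import Path


def check_resource_parity(registry: set[str], root: Path) -> tuple[set[str], set[str]]:
    """Return (missing_on_disk, unregistered) as sets of relative POSIX paths."""
    on_disk = {
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file()
    }
    # Entries listed in the registry but absent on disk, and the reverse.
    return registry - on_disk, on_disk - registry


def phase_zero_passes(registry: set[str], root: Path) -> bool:
    """Gate for the later phases: parity must hold in both directions."""
    missing, unregistered = check_resource_parity(registry, root)
    return not missing and not unregistered
```

Returning both difference sets, rather than a bare boolean, gives the audit report concrete file names to cite.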
| Score | Rating | Meaning |
|---|---|---|
| 10 | Exceptional | Industry-leading, sets new standards |
| 8-9 | Excellent | Exceeds expectations significantly |
| 6-7 | Good | Meets expectations, with some improvements needed |
| 5 | Acceptable | Below average; significant improvements needed |
| 3-4 | Poor | Major gaps and fundamental problems |
| 1-2 | Critical | Barely functional or non-functional |
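The scale maps directly to a small lookup. The table only lists whole-number scores, so this sketch assumes integer input; `rating_for` is a hypothetical helper name.

```python
def rating_for(score: int) -> str:
    """Map an integer score from 1 to 10 to its rating label."""
    if not 1 <= score <= 10:
        raise ValueError(f"score must be between 1 and 10, got {score}")
    if score == 10:
        return "Exceptional"
    if score >= 8:
        return "Excellent"
    if score >= 6:
        return "Good"
    if score == 5:
        return "Acceptable"
    if score >= 3:
        return "Poor"
    return "Critical"
```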
| Mistake | Correct Pattern |
|---|---|
| Giving inflated scores without evidence | Every score must cite specific files, metrics, or code examples as evidence |
| Skipping Phase 0 resource completeness check | Always verify registry completeness first; missing resources cap the overall score at 6/10 |
| Evaluating only code quality, ignoring the other dimensions | Score all 12 dimensions with appropriate weights; architecture, security, and DX matter equally |
| Accepting superficial "LGTM" reviews | Perform deep semantic audits checking contract integrity, security sanitization, and performance hygiene |
| Trusting AI-generated code without verification | Apply the verification gap protocol: critic agents, verifiable goals, human oversight for critical paths |
| Proceeding after audit failure without re-audit | Stop, analyze the deviation, remediate, then restart the checklist from step 1 |
| Using 10/10 scores without exceptional evidence | Reserve 10/10 for truly industry-leading work; most quality tools score 6-7 |
| Surface-level static analysis only | Combine linting with architectural fit checks, risk-based PR categorization, and context-aware validation |
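Two of the rules in the table are mechanical enough to check in code: every score needs evidence, and missing resources cap the overall score at 6/10. The sketch below is one possible shape for those checks; `DimensionScore`, `unsupported_scores`, and `apply_resource_cap` are hypothetical names, and treating evidence as a list of free-form strings is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class DimensionScore:
    dimension: str
    score: int
    # File paths, metrics, or code excerpts that justify the score (assumed format).
    evidence: list[str] = field(default_factory=list)


def unsupported_scores(scores: list[DimensionScore]) -> list[str]:
    """Names of dimensions whose score cites no evidence at all."""
    return [entry.dimension for entry in scores if not entry.evidence]


def apply_resource_cap(overall: float, resources_complete: bool, cap: float = 6.0) -> float:
    """Cap the overall score when the Phase 0 resource check found gaps."""
    return overall if resources_complete else min(overall, cap)
```

A reviewer could run `unsupported_scores` before publishing a report and send back any dimension it lists.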
`Explore`, `Task`, `Plan`

For stylistic cleanup of AI-generated prose and code (em-dash overuse, slop vocabulary, over-commenting, verbose naming), use the `de-slopify` skill.

If the `usability-tester` skill is available, delegate usability dimension evaluation and user flow validation to it. Otherwise, recommend: `pnpm dlx skills add oakoss/agent-skills -s usability-tester -a claude-code -y`