Scholar Evaluation

Overview

Apply the ScholarEval framework to systematically evaluate scholarly and research work. This skill provides a structured evaluation methodology based on peer-reviewed research assessment criteria, enabling comprehensive analysis of academic papers, research proposals, literature reviews, and other scholarly writing across multiple quality dimensions.

When to Use This Skill

Use this skill when:

Evaluating research papers for quality and rigor
Assessing literature review comprehensiveness and quality
Reviewing research methodology design
Scoring data analysis approaches
Evaluating scholarly writing and presentation
Providing structured feedback on academic work
Benchmarking research quality against established criteria

Evaluation Workflow

Step 1: Initial Assessment and Scope Definition

Begin by identifying the type of scholarly work being evaluated and the evaluation scope:

Work Types:

Full research paper (empirical, theoretical, or review)
Research proposal or protocol
Literature review (systematic, narrative, or scoping)
Thesis or dissertation chapter
Conference abstract or short paper

Evaluation Scope:

Comprehensive (all dimensions)
Targeted (specific aspects like methodology or writing)
Comparative (benchmarking against other work)

Ask the user to clarify if the scope is ambiguous.
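
The scoping step can be captured in a small data structure before any dimension is scored. The sketch below is illustrative only: the names WorkType, Scope, EvaluationRequest, and needs_clarification are hypothetical and are not part of the ScholarEval framework or of this skill's scripts.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class WorkType(Enum):
    """Work types listed in Step 1 (hypothetical enum)."""
    RESEARCH_PAPER = "full research paper"
    PROPOSAL = "research proposal or protocol"
    LITERATURE_REVIEW = "literature review"
    THESIS_CHAPTER = "thesis or dissertation chapter"
    SHORT_PAPER = "conference abstract or short paper"


class Scope(Enum):
    """Evaluation scopes listed in Step 1 (hypothetical enum)."""
    COMPREHENSIVE = "comprehensive"
    TARGETED = "targeted"
    COMPARATIVE = "comparative"


@dataclass
class EvaluationRequest:
    work_type: Optional[WorkType] = None
    scope: Optional[Scope] = None
    # Aspects to focus on; only meaningful for a targeted evaluation.
    focus: Tuple[str, ...] = ()

    def needs_clarification(self) -> bool:
        """True when the user should be asked to clarify the scope."""
        if self.work_type is None or self.scope is None:
            return True
        # A targeted evaluation without named aspects is still ambiguous.
        return self.scope is Scope.TARGETED and not self.focus


request = EvaluationRequest(WorkType.PROPOSAL, Scope.TARGETED)
print(request.needs_clarification())
```

Treating a targeted evaluation with no named aspects as ambiguous is a design choice of this sketch, not a rule stated by the framework.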

Step 2: Dimension-Based Evaluation

Systematically evaluate the work across the ScholarEval dimensions. For each applicable dimension, assess quality, identify strengths and weaknesses, and provide scores where appropriate.

Refer to references/evaluation_framework.md for detailed criteria and rubrics for each dimension.

Core Evaluation Dimensions:

Problem Formulation & Research Questions
- Clarity and specificity of research questions
- Theoretical or practical significance
- Feasibility and scope appropriateness
- Novelty and contribution potential
Literature Review
- Comprehensiveness of coverage
- Critical synthesis vs. mere summarization
- Identification of research gaps
- Currency and relevance of sources
- Proper contextualization
Methodology & Research Design
- Appropriateness for research questions
- Rigor and validity
- Reproducibility and transparency
- Ethical considerations
- Limitations acknowledgment
Data Collection & Sources
- Quality and appropriateness of data
- Sample size and representativeness
- Data collection procedures
- Source credibility and reliability
Analysis & Interpretation
- Appropriateness of analytical methods
- Rigor of analysis
- Logical coherence
- Alternative explanations considered
- Results-claims alignment
Results & Findings
- Clarity of presentation
- Statistical or qualitative rigor
- Visualization quality
- Interpretation accuracy
- Implications discussion
Scholarly Writing & Presentation
- Clarity and organization
- Academic tone and style
- Grammar and mechanics
- Logical flow
- Accessibility to target audience
Citations & References
- Citation completeness
- Source quality and appropriateness
- Citation accuracy
- Balance of perspectives
- Adherence to citation standards
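
The dimensions and criteria above can be held as a plain checklist so that each criterion gets an explicit entry during review. This is a minimal sketch: DIMENSIONS and blank_form are hypothetical names, and the structure is an assumption rather than the format used by references/evaluation_framework.md.

```python
# Dimension -> criteria, copied from the list above.
DIMENSIONS = {
    "Problem Formulation & Research Questions": [
        "Clarity and specificity of research questions",
        "Theoretical or practical significance",
        "Feasibility and scope appropriateness",
        "Novelty and contribution potential",
    ],
    "Literature Review": [
        "Comprehensiveness of coverage",
        "Critical synthesis vs. mere summarization",
        "Identification of research gaps",
        "Currency and relevance of sources",
        "Proper contextualization",
    ],
    "Methodology & Research Design": [
        "Appropriateness for research questions",
        "Rigor and validity",
        "Reproducibility and transparency",
        "Ethical considerations",
        "Limitations acknowledgment",
    ],
    "Data Collection & Sources": [
        "Quality and appropriateness of data",
        "Sample size and representativeness",
        "Data collection procedures",
        "Source credibility and reliability",
    ],
    "Analysis & Interpretation": [
        "Appropriateness of analytical methods",
        "Rigor of analysis",
        "Logical coherence",
        "Alternative explanations considered",
        "Results-claims alignment",
    ],
    "Results & Findings": [
        "Clarity of presentation",
        "Statistical or qualitative rigor",
        "Visualization quality",
        "Interpretation accuracy",
        "Implications discussion",
    ],
    "Scholarly Writing & Presentation": [
        "Clarity and organization",
        "Academic tone and style",
        "Grammar and mechanics",
        "Logical flow",
        "Accessibility to target audience",
    ],
    "Citations & References": [
        "Citation completeness",
        "Source quality and appropriateness",
        "Citation accuracy",
        "Balance of perspectives",
        "Adherence to citation standards",
    ],
}


def blank_form(skip=()):
    """Return an empty form: one None note per criterion, minus skipped dimensions."""
    return {
        dimension: {criterion: None for criterion in criteria}
        for dimension, criteria in DIMENSIONS.items()
        if dimension not in skip
    }


# Example: a purely theoretical paper with no data collection to assess.
form = blank_form(skip=("Data Collection & Sources",))
print(len(form), "dimensions to review")
```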

Step 3: Scoring and Rating

For each evaluated dimension, provide:

Qualitative Assessment:

Key strengths (2-3 specific points)
Areas for improvement (2-3 specific points)
Critical issues (if any)

Quantitative Scoring (Optional): Use a 5-point scale where applicable:

5: Excellent - Exemplary quality, publishable in top venues
4: Good - Strong quality with minor improvements needed
3: Adequate - Acceptable quality with notable areas for improvement
2: Needs Improvement - Significant revisions required
1: Poor - Fundamental issues requiring major revision

To calculate aggregate scores programmatically, use scripts/calculate_scores.py.
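
As a rough illustration of how dimension scores on the 5-point scale might be combined, the sketch below computes a weighted mean and maps it back to a rating label. It is not the implementation of scripts/calculate_scores.py, whose internals are not shown here. The function names, the default weight of 1.0, the use of None for non-applicable dimensions, and the round-half-up mapping to a label are all assumptions.

```python
RATING_LABELS = {
    5: "Excellent",
    4: "Good",
    3: "Adequate",
    2: "Needs Improvement",
    1: "Poor",
}


def aggregate_score(scores, weights=None):
    """Weighted mean of 1-5 dimension scores.

    scores maps dimension name -> score, or None when the dimension
    does not apply. weights maps dimension name -> weight; dimensions
    without an entry get weight 1.0 (an assumption of this sketch).
    """
    rated = {dim: s for dim, s in scores.items() if s is not None}
    if not rated:
        raise ValueError("no scored dimensions")
    for dim, s in rated.items():
        if not 1 <= s <= 5:
            raise ValueError(f"score for {dim!r} is outside the 1-5 scale")
    weights = weights or {}
    total_weight = sum(weights.get(dim, 1.0) for dim in rated)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(s * weights.get(dim, 1.0) for dim, s in rated.items())
    return weighted_sum / total_weight


def label_for(score):
    """Map an aggregate score to the nearest rating label (halves round up)."""
    level = min(5, max(1, int(score + 0.5)))
    return RATING_LABELS[level]


scores = {"Methodology": 4, "Literature Review": 3, "Data Collection": None}
overall = aggregate_score(scores)
print(overall, label_for(overall))
```

Skipping None entries keeps non-applicable dimensions from dragging the average down.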

Step 4: Synthesize Overall Assessment

Provide an integrated evaluation summary:

Overall Quality Assessment - Holistic judgment of the work's scholarly merit
Major Strengths - 3-5 key strengths across dimensions
Critical Weaknesses - 3-5 primary areas requiring attention
Priority Recommendations - Ranked list of improvements by impact
Publication Readiness (if applicable) - Assessment of suitability for target venues
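
One possible way to assemble these five parts into a plain-text summary is sketched below. The Synthesis class, its field names, and the 1-3 impact rating are hypothetical and serve only to show the ordering by impact; the framework does not prescribe a report layout.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Synthesis:
    overall: str
    strengths: List[str]
    weaknesses: List[str]
    # (recommendation, impact) pairs; impact is a hypothetical 1-3 rating.
    recommendations: List[Tuple[str, int]]
    publication_readiness: Optional[str] = None

    def render(self) -> str:
        """Render the summary, listing recommendations by descending impact."""
        lines = ["Overall Quality Assessment", self.overall, "", "Major Strengths"]
        lines += [f"- {s}" for s in self.strengths[:5]]  # cap at 5, per Step 4
        lines += ["", "Critical Weaknesses"]
        lines += [f"- {w}" for w in self.weaknesses[:5]]
        lines += ["", "Priority Recommendations"]
        ranked = sorted(self.recommendations, key=lambda r: r[1], reverse=True)
        lines += [f"{i}. {text}" for i, (text, _) in enumerate(ranked, start=1)]
        if self.publication_readiness is not None:
            lines += ["", "Publication Readiness", self.publication_readiness]
        return "\n".join(lines)


summary = Synthesis(
    overall="Placeholder holistic judgment.",
    strengths=["Placeholder strength"],
    weaknesses=["Placeholder weakness"],
    recommendations=[("Lower-impact fix", 1), ("Higher-impact fix", 3)],
)
print(summary.render())
```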

Step 5: Provide Actionable Feedback

Transform evaluation findings into constructive, actionable feedback:

Feedback Structure:

Specific - Reference exact sections, paragraphs, or page numbers
Actionable - Provide concrete suggestions for improvement
Prioritized - Rank recommendations by importance and feasibility
Balanced - Acknowledge strengths while addressing weaknesses
Evidence-based - Ground feedback in evaluation criteria

Feedback Format Options:

Structured report with dimension-by-dimension analysis
Annotated comments mapped to specific document sections
Executive summary with key findings and recommendations
Comparative analysis against benchmark standards
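
The feedback properties above map naturally onto a record per comment. The following sketch assumes a hypothetical FeedbackItem type with 1-3 ratings for importance and feasibility; neither the type nor the rating scale comes from the framework.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FeedbackItem:
    location: str     # Specific: section, paragraph, or page reference
    suggestion: str   # Actionable: a concrete change to make
    criterion: str    # Evidence-based: the evaluation criterion it rests on
    importance: int   # hypothetical rating, 1 (low) to 3 (high)
    feasibility: int  # hypothetical rating, 1 (hard) to 3 (easy)


def prioritize(items: List[FeedbackItem]) -> List[FeedbackItem]:
    """Order feedback by importance first, then by feasibility."""
    return sorted(items, key=lambda item: (-item.importance, -item.feasibility))


items = [
    FeedbackItem("Section 5", "Tighten the results narrative",
                 "Clarity of presentation", importance=2, feasibility=3),
    FeedbackItem("Section 3.1", "Report the validation split procedure",
                 "Reproducibility and transparency", importance=3, feasibility=2),
]
for item in prioritize(items):
    print(f"[{item.location}] {item.suggestion} ({item.criterion})")
```

Sorting on importance before feasibility means an easy but minor fix never outranks a hard but critical one. Reverse the key order if quick wins should come first.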

Step 6: Contextual Considerations

Adjust evaluation approach based on:

Stage of Development:

Early draft: Focus on conceptual and structural issues
Advanced draft: Focus on refinement and polish
Final submission: Comprehensive quality check

Purpose and Venue:

Journal article: High standards for rigor and contribution
Conference paper: Balance novelty with presentation clarity
Student work: Educational feedback with developmental focus
Grant proposal: Emphasis on feasibility and impact

Discipline-Specific Norms:

STEM fields: Emphasis on reproducibility and statistical rigor
Social sciences: Balance quantitative and qualitative standards
Humanities: Focus on argumentation and scholarly interpretation
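
If quantitative scores are used, one way to reflect context is to adjust dimension weights before aggregating. The sketch below is purely illustrative: the EMPHASIS table, the mapping from contexts to dimensions, and the 1.5 multiplier are assumptions made for the example, not values defined by ScholarEval.

```python
# Hypothetical emphasis multipliers per context; values are illustrative only.
EMPHASIS = {
    "stem": {
        "Methodology & Research Design": 1.5,
        "Analysis & Interpretation": 1.5,
    },
    "humanities": {
        "Analysis & Interpretation": 1.5,
        "Scholarly Writing & Presentation": 1.5,
    },
    "grant_proposal": {
        "Problem Formulation & Research Questions": 1.5,
        "Methodology & Research Design": 1.5,
    },
}


def contextual_weights(dimensions, contexts):
    """Return normalized weights (summing to 1) after applying context emphasis."""
    if not dimensions:
        raise ValueError("at least one dimension is required")
    weights = {dim: 1.0 for dim in dimensions}
    for context in contexts:
        # Unknown contexts leave the weights unchanged.
        for dim, multiplier in EMPHASIS.get(context, {}).items():
            if dim in weights:
                weights[dim] *= multiplier
    total = sum(weights.values())
    return {dim: w / total for dim, w in weights.items()}


dims = ["Methodology & Research Design", "Scholarly Writing & Presentation"]
print(contextual_weights(dims, ["stem"]))
```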

Resources

references/evaluation_framework.md

Detailed evaluation criteria, rubrics, and quality indicators for each ScholarEval dimension. Load this reference when conducting evaluations to access specific assessment guidelines and scoring rubrics.

Search patterns for quick access:

"Problem Formulation criteria"
"Literature Review rubric"
"Methodology assessment"
"Data quality indicators"
"Analysis rigor standards"
"Writing quality checklist"

scripts/calculate_scores.py

Python script for calculating aggregate evaluation scores from dimension-level ratings. Supports weighted averaging, threshold analysis, and score visualization.

Usage:

python scripts/calculate_scores.py --scores <dimension_scores.json> --output <report.txt>
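
The script's source and input schema are not reproduced here, so the following is only a guess at what a threshold analysis could look like: it flags dimensions whose score falls below a cutoff, lowest first. The function name, the default cutoff of 3.0, and the dict-of-scores input are assumptions and may not match scripts/calculate_scores.py.

```python
def below_threshold(scores, threshold=3.0):
    """Return (dimension, score) pairs below the threshold, lowest score first.

    Dimensions scored None (not applicable) are ignored.
    """
    flagged = [
        (dim, score)
        for dim, score in scores.items()
        if score is not None and score < threshold
    ]
    return sorted(flagged, key=lambda pair: pair[1])


scores = {
    "Methodology & Research Design": 4,
    "Citations & References": 2,
    "Results & Findings": 2.5,
    "Data Collection & Sources": None,
}
for dim, score in below_threshold(scores):
    print(f"{dim}: {score}")
```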

Best Practices

Maintain Objectivity - Base evaluations on established criteria, not personal preferences
Be Comprehensive - Evaluate all applicable dimensions systematically
Provide Evidence - Support assessments with specific examples from the work
Stay Constructive - Frame weaknesses as opportunities for improvement
Consider Context - Adjust expectations based on work stage and purpose
Document Rationale - Explain the reasoning behind assessments and scores
Encourage Strengths - Explicitly acknowledge what the work does well
Prioritize Feedback - Focus on high-impact improvements first

Example Evaluation Workflow

User Request: "Evaluate this research paper on machine learning for drug discovery"

Response Process:

Identify work type (empirical research paper) and scope (comprehensive evaluation)
Load references/evaluation_framework.md for detailed criteria
Systematically assess each dimension:
- Problem formulation: Clear research question about ML model performance
- Literature review: Comprehensive coverage of recent ML and drug discovery work
- Methodology: Appropriate deep learning architecture with validation procedures
- [Continue through all dimensions...]
Calculate dimension scores and overall assessment
Synthesize findings into structured report highlighting:
- Strong methodology and reproducible code
- Needs more diverse dataset evaluation
- Writing could improve clarity in results section
Provide prioritized recommendations with specific suggestions

Notes

Evaluation rigor should match the work's purpose and stage
Some dimensions may not apply to all work types (e.g., data collection for purely theoretical papers)
Cultural and disciplinary differences in scholarly norms should be considered
This framework complements, not replaces, domain-specific expertise

Citation

This skill is based on the ScholarEval framework introduced in:

Moussa, H. N., Da Silva, P. Q., Adu-Ampratwum, D., East, A., Lu, Z., Puccetti, N., Xue, M., Sun, H., Majumder, B. P., & Kumar, S. (2025). ScholarEval: Research Idea Evaluation Grounded in Literature. arXiv preprint arXiv:2510.16234. https://arxiv.org/abs/2510.16234

Abstract: ScholarEval is a retrieval augmented evaluation framework that assesses research ideas based on two fundamental criteria: soundness (the empirical validity of proposed methods based on existing literature) and contribution (the degree of advancement made by the idea across different dimensions relative to prior research). The framework achieves significantly higher coverage of expert-annotated evaluation points and is consistently preferred over baseline systems in terms of evaluation actionability, depth, and evidence support.
