mteb-leaderboard

MTEB Leaderboard Query Skill

This skill provides guidance for accurately querying machine learning model leaderboards and benchmarks, particularly the Massive Text Embedding Benchmark (MTEB) and related embedding leaderboards.

When to Use This Skill

Finding top-performing models on specific benchmarks (MTEB, Scandinavian Embedding Benchmark, etc.)
Answering questions about current leaderboard standings
Comparing model performance across different benchmarks
Tasks with specific temporal requirements (e.g., "as of August 2025")

Core Approach

Step 1: Identify Authoritative Data Sources

Before searching for results, establish which sources contain authoritative, current data:

Primary Sources (prefer these):
- Official leaderboard websites (e.g., `mteb-leaderboard` on HuggingFace Spaces)
- GitHub repositories with raw benchmark data
- API endpoints or JSON data files from leaderboard maintainers
Secondary Sources (use with caution):
- Academic papers (often outdated by publication time)
- Blog posts and articles (may reference outdated results)
- News articles about benchmark results

Step 2: Verify Temporal Alignment

When a task specifies a time constraint (e.g., "as of August 2025"):

Check source publication/update dates - Academic papers are typically 6-18 months behind current leaderboard state
Look for "last updated" timestamps on leaderboard pages
Never assume paper results reflect current standings without verification
Be explicit about temporal gaps - Using data from June 2024 to answer a question about August 2025 leaves a 14-month gap, which likely invalidates the data
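
The gap check above can be made mechanical. The following is a minimal sketch, not a prescribed method: the function names and the 3-month default threshold are assumptions chosen for illustration, and the right threshold depends on how quickly the leaderboard in question changes.

```python
from datetime import date


def months_between(source: date, target: date) -> int:
    """Whole calendar months from source to target (negative if source is later)."""
    return (target.year - source.year) * 12 + (target.month - source.month)


def is_stale(source: date, target: date, max_gap_months: int = 3) -> bool:
    """True when the source predates the target by more than the allowed gap.

    max_gap_months is an illustrative assumption, not a value from any leaderboard.
    """
    return months_between(source, target) > max_gap_months


# The example from the text: June 2024 data for an August 2025 question.
print(months_between(date(2024, 6, 1), date(2025, 8, 1)))  # 14
print(is_stale(date(2024, 6, 1), date(2025, 8, 1)))        # True
```

Reporting the computed gap alongside the answer makes the temporal mismatch visible instead of leaving it implicit.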

Step 3: Access Live Leaderboard Data

When web pages don't render properly (interactive charts, JavaScript-heavy pages):

Look for raw data endpoints:
- Check for `/api/` or `/data/` endpoints
- Search for JSON files in the page source
- Look for GitHub repositories backing the leaderboard
Try alternative access methods:
- HuggingFace Spaces often have Gradio APIs
- Many leaderboards publish CSV/JSON exports
- Check GitHub issues/discussions for data access tips

Search for data repositories:

site:github.com [leaderboard name] results json

site:huggingface.co [benchmark name] leaderboard
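
When a page does not render, its raw source can still be scanned for links to data files. The sketch below assumes the page source has already been fetched as a string. The regular expression, the function name, and every path in the sample are hypothetical illustrations, not endpoints of any real leaderboard.

```python
import re

# Quoted links ending in .json, .jsonl, or .csv, with an optional query string.
DATA_LINK = re.compile(
    r"""["']([^"'\s]+\.(?:jsonl?|csv))(?:\?[^"'\s]*)?["']""",
    re.IGNORECASE,
)


def find_data_links(page_source: str) -> list[str]:
    """Return unique data-file links in order of first appearance."""
    seen: dict[str, None] = {}
    for match in DATA_LINK.finditer(page_source):
        seen.setdefault(match.group(1), None)
    return list(seen)


# Hypothetical page source; none of these paths are real.
sample = (
    '<script src="/static/app.js"></script>'
    '<a href="/data/results.json">raw</a>'
    "<script>fetch('https://example.org/api/board.csv?v=2')</script>"
)
print(find_data_links(sample))
# ['/data/results.json', 'https://example.org/api/board.csv']
```

Links found this way are only candidates. Each one still needs to be opened and checked for a timestamp before its contents are trusted.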

Step 4: Validate Model Eligibility

Do not make assumptions about which models "count" on a leaderboard:

Check official leaderboard criteria - Some include API models, some don't
Verify the answer format requirements against actual leaderboard entries
Do not exclude models based on assumptions about which names fit the required answer format; check how the leaderboard itself lists them
Consider all model types: open-source, API-based, fine-tuned variants

Verification Strategies

Cross-Reference Multiple Sources

Compare results from at least 2-3 independent sources
If sources disagree, prioritize the most recent authoritative source
Document discrepancies and their potential causes
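
One way to apply this rule consistently is to record each source's claim and let a small function choose between them. This is a sketch under assumptions: the `Claim` fields, the `reconcile` function, and the placeholder model names are all invented for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Claim:
    source: str          # where the claim came from
    top_model: str       # model the source reports as ranked first
    as_of: date          # when the source was last updated
    authoritative: bool  # True for official leaderboard data


def reconcile(claims: list[Claim]) -> tuple[str, list[str]]:
    """Pick the top model and list the sources that disagree with that pick."""
    if not claims:
        raise ValueError("need at least one claim")
    # Authoritative sources outrank the rest; recency breaks ties within a group.
    best = max(claims, key=lambda c: (c.authoritative, c.as_of))
    dissent = [c.source for c in claims if c.top_model != best.top_model]
    return best.top_model, dissent


# Placeholder data; the model names are not real leaderboard entries.
claims = [
    Claim("paper", "model-a", date(2024, 6, 1), False),
    Claim("official leaderboard", "model-b", date(2025, 8, 1), True),
    Claim("blog post", "model-b", date(2025, 7, 15), False),
]
print(reconcile(claims))  # ('model-b', ['paper'])
```

The returned list of dissenting sources is the starting point for documenting discrepancies and their likely causes, such as an outdated paper.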

Sanity Check Results

Verify the model actually appears on the leaderboard
Confirm the model name/organization format matches the source
Check if the model was released before the specified date
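
The three checks above could be combined into one helper. The sketch below assumes the leaderboard entries have already been collected into a mapping from model name to release date. That mapping, the function name, and the `example-org/...` names are hypothetical.

```python
from datetime import date


def sanity_check(answer: str, entries: dict[str, date], cutoff: date) -> list[str]:
    """Return a list of problems; an empty list means the answer passed."""
    if answer not in entries:
        # A case-insensitive match hints at a name-format mismatch.
        close = [name for name in entries if name.lower() == answer.lower()]
        if close:
            return [f"name format differs from leaderboard entry {close[0]!r}"]
        return ["model not found on leaderboard"]
    if entries[answer] > cutoff:
        return ["model released after the cutoff date"]
    return []


# Hypothetical entries keyed by "organization/model-name".
entries = {
    "example-org/embed-v1": date(2025, 3, 1),
    "example-org/embed-v2": date(2025, 9, 15),
}
cutoff = date(2025, 8, 31)
print(sanity_check("example-org/embed-v1", entries, cutoff))  # []
print(sanity_check("example-org/embed-v2", entries, cutoff))
# ['model released after the cutoff date']
```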

Test Alternative Access Methods

When primary access fails:

Try the Wayback Machine for historical snapshots
Search for leaderboard maintainer announcements
Look for community discussions about recent changes
Check if there's a programmatic API
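
For the Wayback Machine option, snapshot lookups can be expressed as plain URLs. The sketch below only builds the URLs and makes no network request. It assumes the commonly documented `web.archive.org/web/<timestamp>/<url>` pattern and the `archive.org/wayback/available` availability endpoint. The helper names and the example page are hypothetical, and the service's current behavior should be confirmed before relying on it.

```python
from datetime import date
from urllib.parse import urlencode


def wayback_snapshot_url(page_url: str, when: date) -> str:
    """URL that should resolve to the capture closest to the given date."""
    return f"https://web.archive.org/web/{when:%Y%m%d}/{page_url}"


def wayback_availability_query(page_url: str, when: date) -> str:
    """Availability query; the JSON response describes the closest capture, if any."""
    params = urlencode({"url": page_url, "timestamp": f"{when:%Y%m%d}"})
    return f"https://archive.org/wayback/available?{params}"


# Hypothetical leaderboard page, queried for the end of August 2025.
page = "https://example.org/leaderboard"
print(wayback_snapshot_url(page, date(2025, 8, 31)))
# https://web.archive.org/web/20250831/https://example.org/leaderboard
print(wayback_availability_query(page, date(2025, 8, 31)))
```

A snapshot of a JavaScript-heavy page may still fail to render, so the capture date should be checked and the archived page searched for the same raw data links as the live one.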

Common Pitfalls to Avoid

1. Relying on Outdated Academic Papers

Academic papers have publication delays of 3-12 months, so a paper published in June 2024 contains data from early 2024 at best. Never use paper results to answer questions about current standings without checking them against the live leaderboard.

2. Giving Up When Web Scraping Fails

Interactive leaderboards often don't render in simple web fetches. Always try:

Looking for underlying data files
Checking GitHub repositories
Finding API endpoints
Searching for data exports

3. Making Assumptions About Model Format

Do not assume API models (OpenAI, Cohere, etc.) cannot be valid answers. Check the actual task requirements and leaderboard contents.

4. Premature Conclusion Without Verification

Before writing a final answer:

Verify the model appears on the actual leaderboard
Confirm the ranking is current
Check that the model meets all task requirements

5. Ignoring Temporal Requirements

If a task asks about a specific date, ensure the data sources reflect that timeframe. A gap as large as the 14 months in the Step 2 example (June 2024 data for an August 2025 question) is unacceptable.

Systematic Search Strategy

When searching for leaderboard information:

Start broad, then narrow:

```
[benchmark name] leaderboard 2025
[benchmark name] top models current
site:huggingface.co [benchmark name]
```

Search for raw data:

```
[benchmark name] results github
[benchmark name] json data
[benchmark name] api
```

Search for recent updates:

```
[benchmark name] new top model [current year]
[benchmark name] leaderboard update
```

Avoid repetitive similar queries - If a query pattern isn't working after 2-3 attempts, change the approach rather than making minor variations
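
The query patterns above can be generated from a benchmark name, which makes it easier to move to a different stage instead of retyping near-duplicates. This is an illustrative sketch: the function name and the stage labels are assumptions, and the templates are copied from the lists above.

```python
def build_queries(benchmark: str, year: int) -> dict[str, list[str]]:
    """Group the search templates by stage, filled in for one benchmark."""
    return {
        "broad": [
            f"{benchmark} leaderboard {year}",
            f"{benchmark} top models current",
            f"site:huggingface.co {benchmark}",
        ],
        "raw_data": [
            f"{benchmark} results github",
            f"{benchmark} json data",
            f"{benchmark} api",
        ],
        "recent": [
            f"{benchmark} new top model {year}",
            f"{benchmark} leaderboard update",
        ],
    }


# Each stage holds at most three queries, so exhausting a stage is the cue to move on.
for stage, queries in build_queries("MTEB", 2025).items():
    print(stage, queries)
```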
