cnki-paper-detail
CNKI Paper Detail Extraction
Extract complete metadata from a CNKI paper detail page.
Arguments
`$ARGUMENTS`: optionally a CNKI paper detail URL (one containing `kcms2/article/abstract`).
Steps
1. Navigate to the paper page (if URL provided)
If `$ARGUMENTS` contains a URL:
- Use `mcp__chrome-devtools__navigate_page` with the URL.
- Use `mcp__chrome-devtools__wait_for` with text `["摘要"]` and timeout 15000.
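The URL check in this step can be sketched as a small helper. This is a minimal sketch rather than part of the skill itself: the name `findPaperUrl` is hypothetical, and it assumes the URL appears in the argument text as a whitespace-delimited `http(s)` token containing `kcms2/article/abstract`.

```javascript
// Hypothetical helper: pull a CNKI paper detail URL out of the raw argument text.
// Assumption: a detail URL is any http(s) token containing "kcms2/article/abstract".
function findPaperUrl(args) {
  if (typeof args !== 'string') return null;
  // Collect every whitespace-delimited http(s) token.
  const candidates = args.match(/https?:\/\/\S+/g) || [];
  // Keep the first one that looks like a paper detail page.
  const hit = candidates.find(u => u.includes('kcms2/article/abstract'));
  return hit || null;
}
```

When the helper returns `null`, there is nothing to navigate to, so the remaining steps would run against whatever page is already open.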
2. Check for captcha
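Use `mcp__chrome-devtools__take_snapshot`. If the text "拖动下方拼图完成验证" ("Drag the puzzle piece below to complete verification") is found, notify the user:
CNKI 正在显示滑块验证码。请在 Chrome 浏览器中手动完成拼图验证,完成后告诉我继续。
(In English: "CNKI is showing a slider captcha. Please complete the puzzle verification manually in Chrome, then tell me to continue.")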
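The captcha check amounts to a substring test on the snapshot. The sketch below assumes the snapshot is available as a plain string; `captchaNotice` and the two constant names are hypothetical, while the marker text and the user message are copied from this step.

```javascript
// Marker text CNKI shows on its slider captcha (taken from this step).
const CAPTCHA_MARKER = '拖动下方拼图完成验证';
// Message to relay to the user (taken from this step).
const CAPTCHA_NOTICE =
  'CNKI 正在显示滑块验证码。请在 Chrome 浏览器中手动完成拼图验证,完成后告诉我继续。';

// Hypothetical helper: returns the notice when the snapshot text contains the
// captcha marker, otherwise null. Assumes the snapshot is a string.
function captchaNotice(snapshotText) {
  return String(snapshotText || '').includes(CAPTCHA_MARKER) ? CAPTCHA_NOTICE : null;
}
```

A `null` result means no captcha was detected and extraction can proceed.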
3. Extract paper metadata via JavaScript
Use `mcp__chrome-devtools__evaluate_script` with this function:
```javascript
() => {
  const brief = document.querySelector('.brief');
  if (!brief) return { error: 'Paper detail section (.brief) not found' };

  // Title: strip the trailing "附视频" (with video) and "网络首发" (online first) badges
  const title = brief.querySelector('h1')?.innerText?.trim()
    ?.replace(/\s*附视频\s*$/, '')
    ?.replace(/\s*网络首发\s*$/, '') || '';

  // Authors: the first h3.author contains author links with <sup> affiliation numbers
  const authorH3s = brief.querySelectorAll('h3.author');
  const authorSection = authorH3s[0];
  const authors = [];
  if (authorSection) {
    authorSection.querySelectorAll('a').forEach(a => {
      const raw = a.innerText || '';
      const name = raw.replace(/\d+$/, '').trim();
      const supMatch = raw.match(/(\d+)$/);
      const affiliationNum = supMatch ? supMatch[1] : '';
      authors.push({ name, affiliationNum });
    });
  }

  // Affiliations: the second h3.author contains organization links
  const affiliations = [];
  if (authorH3s.length > 1) {
    authorH3s[1].querySelectorAll('a').forEach(a => {
      affiliations.push(a.innerText?.trim() || '');
    });
  }

  // Abstract
  const abstractEl = document.querySelector('.abstract-text');
  const abstract = abstractEl?.innerText?.trim() || '';

  // Keywords: each link text ends with a semicolon separator
  const keywordsP = document.querySelector('p.keywords');
  const keywords = keywordsP
    ? Array.from(keywordsP.querySelectorAll('a')).map(a => (a.innerText || '').replace(/;$/, '').trim())
    : [];

  // Fund
  const fundsP = document.querySelector('p.funds');
  const fund = fundsP?.innerText?.trim() || '';

  // Classification code
  const clcCode = document.querySelector('.clc-code');
  const classification = clcCode?.innerText?.trim() || '';

  // Journal/source
  const docTop = document.querySelector('.doc-top');
  const journal = docTop?.querySelector('a')?.innerText?.trim() || '';

  // Online first / publication info
  const headTime = document.querySelector('.head-time');
  const pubInfo = headTime?.innerText?.trim() || '';

  // Is online first?
  const isOnlineFirst = !!brief.querySelector('.icon-shoufa');

  // Article outline/TOC
  const catalogList = document.querySelector('.catalog-list, .catalog-listDiv');
  const toc = catalogList?.innerText?.trim() || '';

  // Citation network counts, keyed by each tab's data-id
  const citationTabs = document.querySelectorAll('ul.module-tab.tpl_lieteratures li');
  const citationInfo = {};
  citationTabs.forEach(li => {
    const id = li.getAttribute('data-id');
    // Fall back to '' so a tab without text cannot throw on .match()
    const text = li.innerText?.trim() || '';
    const countMatch = text.match(/(\d+)/);
    if (id) {
      citationInfo[id] = {
        label: text.replace(/\d+/, '').trim(),
        count: countMatch ? parseInt(countMatch[1], 10) : 0
      };
    }
  });

  return {
    title,
    authors,
    affiliations,
    abstract,
    keywords,
    fund,
    classification,
    journal,
    pubInfo,
    isOnlineFirst,
    toc,
    citationInfo
  };
}
```
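The extraction result gives each author only an `affiliationNum`, while the affiliation names live in a separate list. A possible post-processing step that joins the two is sketched below. It is an illustration, not part of the skill: `attachAffiliations` is a hypothetical name, and it assumes affiliation strings look like "1.北京大学" with the leading number matching the author's superscript.

```javascript
// Hypothetical post-processing for the extraction result.
// Assumption: affiliation strings look like "1.北京大学" and each author's
// affiliationNum ("1") refers to that leading number.
function attachAffiliations(authors, affiliations) {
  const byNum = new Map();
  for (const raw of affiliations) {
    const m = /^\s*(\d+)\s*\.\s*(.+)$/.exec(raw || '');
    if (m) byNum.set(m[1], m[2].trim());
  }
  return authors.map(a => {
    let affiliation = byNum.get(a.affiliationNum) || '';
    // Assumption: with no superscript and exactly one affiliation listed,
    // that single affiliation applies to the author.
    if (!affiliation && !a.affiliationNum && affiliations.length === 1) {
      affiliation = (affiliations[0] || '').replace(/^\s*\d+\s*\.\s*/, '').trim();
    }
    return { ...a, affiliation };
  });
}
```

Authors whose number matches no listed affiliation keep an empty `affiliation`, so a formatter can decide whether to print the parentheses at all.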
4. Format and present the output
{title} {isOnlineFirst ? "[网络首发]" : ""}
Authors:
{For each author: "- {name} ({affiliation})"}
Affiliations:
{For each affiliation: "- {affiliation}"}
Journal: {journal}
Publication Info: {pubInfo}
Abstract:
{abstract}
Keywords: {keywords joined by ", "}
Fund: {fund}
Classification: {classification}
Citation Network:
{For each citation type: "- {label}: {count}"}
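The template above could be rendered with a plain string builder along these lines. This is a sketch under stated assumptions: `formatPaper` is a hypothetical name, and each author is assumed to carry an already-resolved `affiliation` string (the parentheses are dropped when it is missing).

```javascript
// Hypothetical formatter for the output template.
// Assumption: each author may carry an `affiliation` string resolved beforehand;
// when it is missing, the parentheses are omitted.
function formatPaper(p) {
  const lines = [];
  lines.push(`${p.title || ''}${p.isOnlineFirst ? ' [网络首发]' : ''}`);
  lines.push('Authors:');
  for (const a of p.authors || []) {
    lines.push(a.affiliation ? `- ${a.name} (${a.affiliation})` : `- ${a.name}`);
  }
  lines.push('Affiliations:');
  for (const org of p.affiliations || []) lines.push(`- ${org}`);
  lines.push(`Journal: ${p.journal || ''}`);
  lines.push(`Publication Info: ${p.pubInfo || ''}`);
  lines.push('Abstract:');
  lines.push(p.abstract || '');
  lines.push(`Keywords: ${(p.keywords || []).join(', ')}`);
  lines.push(`Fund: ${p.fund || ''}`);
  lines.push(`Classification: ${p.classification || ''}`);
  lines.push('Citation Network:');
  // citationInfo is keyed by data-id; only label and count are printed.
  for (const { label, count } of Object.values(p.citationInfo || {})) {
    lines.push(`- ${label}: ${count}`);
  }
  return lines.join('\n');
}
```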
5. Fallback: snapshot-based parsing
If JS extraction fails, use `mcp__chrome-devtools__take_snapshot` and parse the accessibility tree:
- Title: level 1 `heading` element
- Authors: `link` elements whose URLs contain `kcms2/author/detail`
- Affiliations: `link` elements whose URLs contain `kcms2/organ/detail`
- Abstract: `StaticText` following "摘要:"
- Keywords: `link` elements whose URLs contain `kcms2/keyword/detail`
- Fund: `link` elements following "基金资助:"
- Classification: `StaticText` following "分类号:"
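The fallback rules can be expressed as a filter over snapshot nodes. The sketch below is illustrative only: `parseSnapshotNodes` is a hypothetical name, and it assumes the snapshot has already been flattened into an ordered array of `{ role, level, name, url }` objects. The actual output of `take_snapshot` may be shaped differently and would need its own parsing first.

```javascript
// Hypothetical sketch of the fallback rules.
// Assumption: nodes is an ordered array shaped like { role, level, name, url }.
function parseSnapshotNodes(nodes) {
  // Names of all link nodes whose URL contains the given path fragment.
  const linksWith = part =>
    nodes.filter(n => n.role === 'link' && (n.url || '').includes(part)).map(n => n.name);

  // First StaticText node after the node whose name equals `label`.
  const staticTextAfter = label => {
    const i = nodes.findIndex(n => n.name === label);
    if (i < 0) return '';
    const next = nodes.slice(i + 1).find(n => n.role === 'StaticText');
    return next ? next.name : '';
  };

  // Consecutive link nodes that directly follow the node named `label`.
  const linksAfter = label => {
    const i = nodes.findIndex(n => n.name === label);
    const out = [];
    if (i < 0) return out;
    for (let j = i + 1; j < nodes.length && nodes[j].role === 'link'; j++) {
      out.push(nodes[j].name);
    }
    return out;
  };

  const heading = nodes.find(n => n.role === 'heading' && n.level === 1);
  return {
    title: heading ? heading.name : '',
    authors: linksWith('kcms2/author/detail'),
    affiliations: linksWith('kcms2/organ/detail'),
    abstract: staticTextAfter('摘要:'),
    keywords: linksWith('kcms2/keyword/detail'),
    fund: linksAfter('基金资助:').join(' '),
    classification: staticTextAfter('分类号:')
  };
}
```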
Verified DOM Selectors
| Data | Selector | Notes |
|---|---|---|
| Paper section | `.brief` | Main paper info container |
| Title | `.brief h1` | May contain icons, clean text needed |
| Authors | `a` in the first `.brief h3.author` | Text has superscript numbers (e.g., "张三1") |
| Affiliations | `a` in the second `.brief h3.author` | Text starts with "N." (e.g., "1.北京大学") |
| Abstract | `.abstract-text` | Full abstract text |
| Keywords | `p.keywords a` | Semicolon-separated keyword links |
| Fund | `p.funds` | Fund information text |
| Classification | `.clc-code` | CLC classification codes |
| Journal | `.doc-top a` | Source journal link |
| Online first | `.brief .icon-shoufa` | Present if paper is online first |
| Citation tabs | `ul.module-tab.tpl_lieteratures li` | `data-id` attr identifies type |
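Selectors like these can stop matching if the page layout changes. One way to notice early is a small check that reports which selectors match nothing. The sketch below is an assumption-laden illustration: `findMissingSelectors` and `REQUIRED_SELECTORS` are hypothetical names, and which selectors count as required is a choice made here, not something the skill specifies.

```javascript
// Selectors taken from the extraction function in step 3.
// Assumption: treating these as "required" is a choice made for this sketch.
const REQUIRED_SELECTORS = [
  '.brief',
  '.brief h1',
  '.brief h3.author',
  '.abstract-text',
  'p.keywords',
  '.doc-top'
];

// Hypothetical health check: lists the selectors that match nothing under `root`.
// `root` is anything with a querySelector method, e.g. `document` in the page.
function findMissingSelectors(root, selectors = REQUIRED_SELECTORS) {
  return selectors.filter(sel => root.querySelector(sel) === null);
}
```

Inside `evaluate_script` this would be called as `findMissingSelectors(document)`; a non-empty result suggests switching to the snapshot-based fallback.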