gs-export
Google Scholar Export to Zotero
Export Google Scholar paper citation data via BibTeX extraction and push to Zotero desktop.
Arguments
$ARGUMENTS contains one or more data-cids (space-separated), e.g.:
- TFS2GgoGiNUJ: single paper
- TFS2GgoGiNUJ abc123XYZ def456UVW: batch export
Steps
Step 1: Get BibTeX for each paper
For each data-cid, perform 3 tool calls to bypass CORS:
1a. Fetch cite dialog to get BibTeX link (evaluate_script)
```javascript
async () => {
  const cid = "DATA_CID_HERE";
  const resp = await fetch(
    `https://scholar.google.com/scholar?q=info:${cid}:scholar.google.com/&output=cite`,
    { credentials: 'include' }
  );
  const html = await resp.text();
  const doc = new DOMParser().parseFromString(html, 'text/html');
  // Extract export links
  const links = Array.from(doc.querySelectorAll('#gs_citi a')).map(a => ({
    format: a.textContent.trim(),
    url: a.href
  }));
  // Extract citation format texts
  const citations = Array.from(doc.querySelectorAll('#gs_citt tr')).map(tr => {
    const cells = tr.querySelectorAll('td');
    return {
      style: cells[0]?.textContent?.trim() || '',
      text: cells[1]?.textContent?.trim() || ''
    };
  });
  const bibtexLink = links.find(l => l.format === 'BibTeX');
  return { cid, bibtexLink: bibtexLink?.url || '', links, citations };
}
```
1b. Navigate to BibTeX URL (navigate_page)
Use mcp__chrome-devtools__navigate_page:
- url: the bibtexLink URL from step 1a (on scholar.googleusercontent.com)
This bypasses CORS restrictions that block fetch() to googleusercontent.com.
1c. Read BibTeX content (evaluate_script)
```javascript
async () => {
  // The BibTeX page is plain text, so the body text is the raw entry
  return { bibtex: document.body.innerText || document.body.textContent || '' };
}
```
Step 2: Parse BibTeX and push to Zotero
Save the BibTeX data as JSON, then call the push script:
```bash
python "E:/gscholar-skills/.claude/skills/gs-export/scripts/push_to_zotero.py" /tmp/gs_papers.json
```

Before calling the script, construct a JSON file at /tmp/gs_papers.json containing paper data parsed from BibTeX. Parse the BibTeX yourself and create the JSON array:

```json
[
  {
    "pmid": "",
    "title": "The title from BibTeX",
    "authors": [
      {"lastName": "Smith", "firstName": "John"}
    ],
    "journal": "Journal Name",
    "journalAbbr": "",
    "pubdate": "2022",
    "volume": "14",
    "issue": "4",
    "pages": "1054",
    "doi": "",
    "pdfUrl": "https://example.com/paper.pdf",
    "abstract": "",
    "keywords": [],
    "language": "en",
    "pubtype": ["Journal Article"]
  }
]
```

IMPORTANT: Set pdfUrl from the search result's fullTextUrl field (the PDF link extracted by gs-search). The Python script will download the PDF and upload it to Zotero via /connector/saveAttachment (Zotero 7.x ignores attachments in saveItems). PDF download may fail for some publishers (403, JS-redirect); these are reported as "PDF skip".

BibTeX fields mapping:
- @article{key, → itemType: journalArticle
- @inproceedings{key, → itemType: conferencePaper
- @book{key, → itemType: book
- title={...} → title
- author={Last1, First1 and Last2, First2} → authors array
- journal={...} → journal
- year={...} → pubdate
- volume={...} → volume
- number={...} → issue
- pages={...} → pages
- publisher={...} → (included in extra or publisher field)
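
The mapping above can be sketched as a small parser. This is a minimal illustration, not the code used by push_to_zotero.py: the function names (readFields, parseAuthors, bibtexToPaper) are hypothetical, it only handles brace-delimited values, and the itemType and publisher keys in the output are assumptions, since this document does not state which keys the script reads for them. The language and pubtype values are copied from the JSON example.

```javascript
// Entry-type mapping taken from the list above.
const ITEM_TYPES = {
  article: "journalArticle",
  inproceedings: "conferencePaper",
  book: "book",
};

// Collect name={value} pairs, tracking brace depth so that nested braces
// such as title={Learning {BERT}} stay inside one value.
function readFields(body) {
  const fields = {};
  const nameRe = /([A-Za-z]+)\s*=\s*\{/g;
  let match;
  while ((match = nameRe.exec(body)) !== null) {
    let depth = 1;
    let end = nameRe.lastIndex;
    while (end < body.length && depth > 0) {
      if (body[end] === "{") depth++;
      else if (body[end] === "}") depth--;
      end++;
    }
    const raw = body.slice(nameRe.lastIndex, depth === 0 ? end - 1 : end);
    fields[match[1].toLowerCase()] = raw.replace(/[{}]/g, "").replace(/\s+/g, " ").trim();
    nameRe.lastIndex = end; // resume scanning after this value
  }
  return fields;
}

// Split "Last1, First1 and Last2, First2" into the authors array.
function parseAuthors(authorField) {
  if (!authorField) return [];
  return authorField
    .split(/\s+and\s+/)
    .filter((name) => name.trim().toLowerCase() !== "others") // drop a trailing "and others"
    .map((name) => {
      const comma = name.indexOf(",");
      if (comma !== -1) {
        return { lastName: name.slice(0, comma).trim(), firstName: name.slice(comma + 1).trim() };
      }
      // "First Last" fallback: treat the final word as the family name.
      const parts = name.trim().split(/\s+/);
      return { lastName: parts.pop(), firstName: parts.join(" ") };
    });
}

// Convert one BibTeX entry into a record shaped like the JSON example.
function bibtexToPaper(bibtex, pdfUrl = "") {
  const header = /@(\w+)\s*\{\s*([^,\s]*)\s*,/.exec(bibtex);
  if (!header) return null; // not a BibTeX entry, e.g. a CAPTCHA page
  const entryType = header[1].toLowerCase();
  const f = readFields(bibtex.slice(header.index + header[0].length));
  return {
    pmid: "",
    itemType: ITEM_TYPES[entryType] || "journalArticle", // assumed key and fallback
    title: f.title || "",
    authors: parseAuthors(f.author),
    journal: f.journal || "",
    journalAbbr: "",
    pubdate: f.year || "",
    volume: f.volume || "",
    issue: f.number || "",
    pages: f.pages || "",
    publisher: f.publisher || "", // assumed key
    doi: "",
    pdfUrl,
    abstract: "",
    keywords: [],
    language: "en",
    pubtype: entryType === "article" ? ["Journal Article"] : [],
  };
}
```

Calling bibtexToPaper(bibtexText, fullTextUrl) for each entry and serializing the resulting array with JSON.stringify gives the content to write to /tmp/gs_papers.json. A null return is a useful signal that step 1c read something other than BibTeX.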
Step 3: Report
Single paper:

```
Exported to Zotero from Google Scholar:
Title: {title}
Authors: {authors}
Journal: {journal} ({year})
Data-CID: {dataCid}
```

Batch:

```
Exported {count} papers to Zotero from Google Scholar:
1. {title1} ({journal1}, {year1})
2. {title2} ({journal2}, {year2})
...
```
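
The two templates can be filled from the parsed records with a small formatter. This is only a sketch: formatReport is a hypothetical name, the records are assumed to have the shape of the JSON example in step 2, and the dataCid property is assumed to be attached by the caller, since it is not part of that JSON.

```javascript
// Build the report text for one paper or for a batch.
function formatReport(papers) {
  if (papers.length === 1) {
    const p = papers[0];
    const authors = p.authors
      .map((a) => `${a.firstName} ${a.lastName}`.trim())
      .join(", ");
    return [
      "Exported to Zotero from Google Scholar:",
      `Title: ${p.title}`,
      `Authors: ${authors}`,
      `Journal: ${p.journal} (${p.pubdate})`,
      `Data-CID: ${p.dataCid}`, // dataCid is an assumed extra property
    ].join("\n");
  }
  // Batch form: one numbered line per paper.
  const lines = papers.map(
    (p, i) => `${i + 1}. ${p.title} (${p.journal}, ${p.pubdate})`
  );
  return [
    `Exported ${papers.length} papers to Zotero from Google Scholar:`,
    ...lines,
  ].join("\n");
}
```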
Batch Export Optimization
For multiple papers, process sequentially to avoid CAPTCHA:
- Get all BibTeX links in one evaluate_script call (fetch all cite dialogs)
- Navigate to each BibTeX URL one at a time
- Collect all BibTeX entries
- Push all to Zotero in a single batch
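
The first step of the list above, fetching every cite dialog in one evaluate_script call, might look like the sketch below. It reuses the URL and the #gs_citi selector from step 1a. The names citeUrl and collectBibtexLinks are hypothetical, and the pause between requests (delayMs) is an assumption with an illustrative default; this document does not specify a delay.

```javascript
// Build the cite-dialog URL used in step 1a.
function citeUrl(cid) {
  return `https://scholar.google.com/scholar?q=info:${encodeURIComponent(cid)}:scholar.google.com/&output=cite`;
}

// Fetch cite dialogs strictly one after another and return one result per cid.
async function collectBibtexLinks(cids, delayMs = 1500) {
  const results = [];
  for (const [index, cid] of cids.entries()) {
    // Assumed pause between requests to stay sequential and unhurried.
    if (index > 0) await new Promise((resolve) => setTimeout(resolve, delayMs));
    try {
      const resp = await fetch(citeUrl(cid), { credentials: "include" });
      if (!resp.ok) throw new Error(`HTTP ${resp.status}`);
      const doc = new DOMParser().parseFromString(await resp.text(), "text/html");
      const link = Array.from(doc.querySelectorAll("#gs_citi a"))
        .find((a) => a.textContent.trim() === "BibTeX");
      results.push({ cid, bibtexLink: link ? link.href : "" });
    } catch (err) {
      // Record the failure and continue so one blocked request
      // does not discard the rest of the batch.
      results.push({ cid, bibtexLink: "", error: String(err) });
    }
  }
  return results;
}
```

For evaluate_script, both functions would be placed inside the arrow function body, which then ends with something like return collectBibtexLinks(["TFS2GgoGiNUJ", "abc123XYZ"]). Entries with an empty bibtexLink can be skipped in the navigate step and mentioned in the report.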
Notes
- Single paper export uses 4 tool calls: evaluate_script (cite dialog) + navigate_page (BibTeX URL) + evaluate_script (read BibTeX) + bash python (Zotero push)
- Batch export: 2N+2 tool calls for N papers (1 evaluate_script for all cite dialogs + N navigate_page + N evaluate_script + 1 bash)
- BibTeX links are on scholar.googleusercontent.com; CORS blocks fetch() there, so we use navigate_page to bypass it
- Reuses push_to_zotero.py for Zotero Connector API communication
- Google Scholar BibTeX does NOT include abstract or DOI; these fields will be empty in Zotero
- After export, navigate back to the Google Scholar page: navigate_page with type back