douyin-video-extractor

Douyin Video Extractor Skill

Skill by ara.so — MCP Skills collection.

Overview

`douyin-mcp-server` extracts watermark-free videos from Douyin (the Chinese version of TikTok) share links and uses AI to transcribe audio content into text. It supports three usage modes: WebUI, MCP server integration, and command-line interface.
Key Features:
  • Extract high-quality watermark-free video download links
  • AI-powered speech-to-text transcription using SenseVoice
  • Automatic chunking for large audio files (>1 hour or >50MB)
  • MCP integration for Claude Desktop and other AI assistants
  • Web interface for browser-based usage

Installation

Prerequisites

bash
# Install uv (Python package manager)

# Install FFmpeg (required for audio processing)
# macOS
brew install ffmpeg

# Ubuntu/Debian
apt install ffmpeg

# Windows (with Chocolatey)
choco install ffmpeg

Setup

bash
# Clone the repository
git clone https://github.com/yzfly/douyin-mcp-server.git
cd douyin-mcp-server

# Install dependencies
uv sync

# Set API key for transcription (optional, only needed for text extraction)
export API_KEY="sk-xxxxxxxxxxxxxxxx"

Usage Modes

1. WebUI (Recommended for Interactive Use)

bash
# Start the web server
uv run python web/app.py

# Access in browser: http://localhost:8080

**WebUI Features:**
- Parse video info without API key
- Extract transcripts with API key (configured in browser or env var)
- Download videos directly
- Export transcripts as Markdown

2. MCP Server (For AI Assistants)

Configure in `claude_desktop_config.json` or a similar MCP client config:
json
{
  "mcpServers": {
    "douyin-mcp": {
      "command": "uvx",
      "args": ["douyin-mcp-server"],
      "env": {
        "API_KEY": "sk-xxxxxxxxxxxxxxxx"
      }
    }
  }
}
Available MCP Tools:
  • parse_douyin_video_info - Parse video metadata (no API key needed)
  • get_douyin_download_link - Get watermark-free download URL (no API key needed)
  • extract_douyin_text - Extract video transcript via AI (requires API key)
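
If the client config already lists other servers, the entry shown above needs to be merged in rather than pasted over the whole file. The following is a minimal sketch, assuming the config is plain JSON with a top-level `mcpServers` object; the helper name `add_douyin_server` is hypothetical and not part of the project.

```python
import json


def add_douyin_server(config: dict, api_key: str) -> dict:
    """Return a copy of an MCP client config with the douyin-mcp entry added.

    Hypothetical helper for illustration; existing servers are preserved.
    """
    merged = json.loads(json.dumps(config))  # cheap deep copy of JSON data
    servers = merged.setdefault("mcpServers", {})
    servers["douyin-mcp"] = {
        "command": "uvx",
        "args": ["douyin-mcp-server"],
        "env": {"API_KEY": api_key},
    }
    return merged


if __name__ == "__main__":
    existing = {"mcpServers": {"other-server": {"command": "npx", "args": []}}}
    print(json.dumps(add_douyin_server(existing, "sk-xxxxxxxxxxxxxxxx"), indent=2))
```

Working on a copy keeps the original dictionary untouched, so the result can be inspected before it is written back to disk.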

3. Command Line Interface

bash
# Get video information (no API key required)
uv run python douyin-video/scripts/douyin_downloader.py \
  -l "https://v.douyin.com/xxxxx/" \
  -a info

# Download watermark-free video
uv run python douyin-video/scripts/douyin_downloader.py \
  -l "https://v.douyin.com/xxxxx/" \
  -a download \
  -o ./videos

# Extract transcript (requires API_KEY)
uv run python douyin-video/scripts/douyin_downloader.py \
  -l "https://v.douyin.com/xxxxx/" \
  -a extract \
  -o ./output

# Extract transcript and save video
uv run python douyin-video/scripts/douyin_downloader.py \
  -l "https://v.douyin.com/xxxxx/" \
  -a extract \
  -o ./output \
  --save-video

**CLI Arguments:**
- `-l, --link` - Douyin share link (required)
- `-a, --action` - Action: `info`, `download`, or `extract` (required)
- `-o, --output` - Output directory (default: `./output`)
- `--save-video` - Save video file when extracting transcript
- `--api-key` - Override API key from environment
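
To make the argument list concrete, here is a sketch of an `argparse` parser that accepts the same flags. It is an illustration built from the list above, not the script's actual source; the function name `build_parser` and the help strings are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(description="Douyin downloader (sketch)")
    parser.add_argument("-l", "--link", required=True, help="Douyin share link")
    parser.add_argument(
        "-a", "--action", required=True,
        choices=["info", "download", "extract"],
        help="What to do with the link",
    )
    parser.add_argument("-o", "--output", default="./output",
                        help="Output directory")
    parser.add_argument("--save-video", action="store_true",
                        help="Keep the video file when extracting a transcript")
    parser.add_argument("--api-key", default=None,
                        help="Override the API_KEY environment variable")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["-l", "https://v.douyin.com/xxxxx/", "-a", "extract", "--save-video"]
    )
    print(args.link, args.action, args.output, args.save_video, args.api_key)
```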

Python Integration

Parse Video Info

python
from douyin_video.parser import DouyinParser

# Initialize parser
parser = DouyinParser()

# Parse video information
share_link = "https://v.douyin.com/xxxxx/"
video_info = parser.parse_video_info(share_link)

print(f"Title: {video_info['title']}")
print(f"Video ID: {video_info['video_id']}")
print(f"Download URL: {video_info['download_url']}")

Download Video

python
from douyin_video.downloader import DouyinDownloader

downloader = DouyinDownloader()

# Download watermark-free video
video_url = "https://v.douyin.com/xxxxx/"
output_path = "./videos"
file_path = downloader.download_video(video_url, output_path)
print(f"Video saved to: {file_path}")

Extract Transcript

python
from douyin_video.transcriber import VideoTranscriber
import os

# Initialize with API key
api_key = os.getenv("API_KEY")
transcriber = VideoTranscriber(api_key=api_key)

# Extract transcript from video URL
video_url = "https://v.douyin.com/xxxxx/"
transcript = transcriber.extract_transcript(video_url)

print(f"Transcript: {transcript['text']}")
print(f"Video ID: {transcript['video_id']}")
print(f"Title: {transcript['title']}")

# Save as Markdown
transcriber.save_markdown(
    transcript=transcript,
    output_dir="./output"
)

Handle Large Files

The library automatically handles large audio files:
python
# Files >1 hour or >50MB are automatically chunked
# No special configuration needed
transcript = transcriber.extract_transcript(long_video_url)

# Chunks are processed and merged automatically
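
The chunking rule stated above can be written as a small predicate. This is a sketch of the documented thresholds only, not the library's internal check; the constant and function names are hypothetical, and treating 1MB as 1024 * 1024 bytes is an assumption.

```python
MAX_DURATION_SECONDS = 60 * 60       # 1 hour, from the documented threshold
MAX_SIZE_BYTES = 50 * 1024 * 1024    # 50MB, assuming 1MB = 1024 * 1024 bytes


def needs_chunking(duration_seconds: float, size_bytes: int) -> bool:
    """Return True when either documented limit is exceeded."""
    return (duration_seconds > MAX_DURATION_SECONDS
            or size_bytes > MAX_SIZE_BYTES)


if __name__ == "__main__":
    # A 90-minute recording of 30MB exceeds the duration limit only.
    print(needs_chunking(90 * 60, 30 * 1024 * 1024))
```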

Configuration

API Key Setup

Get a free API key from SiliconFlow (new users get free credits).
Option 1: Environment Variable
bash
export API_KEY="sk-xxxxxxxxxxxxxxxx"
Option 2: WebUI Browser Storage
  1. Open WebUI
  2. Click the "API 未配置" ("API not configured") button
  3. Enter and save API key
  4. Key persists in browser localStorage
Option 3: CLI Argument
bash
uv run python douyin-video/scripts/douyin_downloader.py \
  --api-key "sk-xxxxxxxxxxxxxxxx" \
  -l "https://v.douyin.com/xxxxx/" \
  -a extract
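
Options 1 and 3 interact: the CLI list describes `--api-key` as overriding the environment. A minimal sketch of that precedence follows, assuming an empty value counts as "not set"; `resolve_api_key` is a hypothetical name, and the browser-stored key from Option 2 is outside its scope.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(cli_value: Optional[str] = None,
                    env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Pick the API key: an explicit CLI value wins, then the API_KEY variable."""
    env = os.environ if env is None else env
    if cli_value:
        return cli_value
    return env.get("API_KEY") or None


if __name__ == "__main__":
    key = resolve_api_key()
    print("API key configured" if key else "API key missing")
```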

Output Format

Extracted transcripts are saved as Markdown. The generated file uses Chinese labels: 属性 (attribute), 视频ID (video ID), 提取时间 (extraction time), 下载链接 (download link, rendered as 点击下载, "click to download"), and 文案内容 (transcript content).
markdown
# Video Title

| 属性 | |
| --- | --- |
| 视频ID | 7600361826030865707 |
| 提取时间 | 2026-01-30 14:19:00 |
| 下载链接 | 点击下载 |

## 文案内容

Transcribed text content appears here...
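
For readers who want to produce a similar file themselves, the sketch below renders the same fields with the labels from the sample. It is not the project's `save_markdown` implementation: the function name `render_markdown`, the bullet-list layout, and the link syntax are assumptions, and the real file's layout may differ.

```python
from datetime import datetime


def render_markdown(title: str, video_id: str, download_url: str,
                    text: str, extracted_at: datetime) -> str:
    """Render a transcript as Markdown using the labels from the sample."""
    lines = [
        f"# {title}",
        "",
        f"- 视频ID: {video_id}",
        f"- 提取时间: {extracted_at:%Y-%m-%d %H:%M:%S}",
        f"- 下载链接: [点击下载]({download_url})",
        "",
        "## 文案内容",
        "",
        text,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_markdown(
        "Video Title", "7600361826030865707", "https://example.com/video.mp4",
        "Transcribed text content appears here...", datetime.now(),
    ))
```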

Common Patterns

Batch Processing Multiple Videos

python
from douyin_video.transcriber import VideoTranscriber
import os

api_key = os.getenv("API_KEY")
transcriber = VideoTranscriber(api_key=api_key)

video_urls = [
    "https://v.douyin.com/xxxxx1/",
    "https://v.douyin.com/xxxxx2/",
    "https://v.douyin.com/xxxxx3/",
]

for url in video_urls:
    try:
        transcript = transcriber.extract_transcript(url)
        transcriber.save_markdown(transcript, "./batch_output")
        print(f"✓ Processed: {transcript['title']}")
    except Exception as e:
        print(f"✗ Failed {url}: {e}")
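
The loop above prints results as it goes. When the outcome needs to be inspected afterwards, the same pattern can be wrapped in a helper that collects successes and failures. This is a generic sketch: `process_batch` and its `handler` callback are hypothetical, and the handler stands in for whatever call does the real work.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def process_batch(urls: Iterable[str],
                  handler: Callable[[str], str]
                  ) -> Tuple[List[str], Dict[str, str]]:
    """Run handler on each unique URL; return (results, failures by URL)."""
    done: List[str] = []
    failed: Dict[str, str] = {}
    for url in dict.fromkeys(urls):  # drop duplicates, keep order
        try:
            done.append(handler(url))
        except Exception as exc:  # keep going, as in the loop above
            failed[url] = str(exc)
    return done, failed


if __name__ == "__main__":
    titles, errors = process_batch(
        ["https://v.douyin.com/xxxxx1/", "https://v.douyin.com/xxxxx1/"],
        lambda url: f"processed {url}",
    )
    print(titles, errors)
```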

Error Handling

python
from douyin_video.parser import DouyinParser
from douyin_video.exceptions import ParseError, DownloadError

parser = DouyinParser()
share_link = "https://v.douyin.com/xxxxx/"  # placeholder share link

try:
    video_info = parser.parse_video_info(share_link)
except ParseError as e:
    print(f"Failed to parse video: {e}")
except DownloadError as e:
    print(f"Failed to download: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")

Custom Output Handling

python
from douyin_video.transcriber import VideoTranscriber
import json
import os

transcriber = VideoTranscriber(api_key=os.getenv("API_KEY"))
video_url = "https://v.douyin.com/xxxxx/"  # placeholder share link
transcript = transcriber.extract_transcript(video_url)

# Save as JSON
with open("transcript.json", "w", encoding="utf-8") as f:
    json.dump(transcript, f, ensure_ascii=False, indent=2)

# Extract specific fields
video_id = transcript["video_id"]
text_content = transcript["text"]
download_url = transcript["download_url"]

Troubleshooting

FFmpeg Not Found

Error: `FileNotFoundError: [Errno 2] No such file or directory: 'ffmpeg'`
Solution:
bash
# Verify FFmpeg installation
ffmpeg -version

# If not installed, install via package manager
brew install ffmpeg  # macOS
apt install ffmpeg   # Ubuntu
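
The same check can be done from Python before any audio work starts, which gives a clearer message than the raw `FileNotFoundError`. This sketch uses only the standard library; `find_tool` is a hypothetical helper name.

```python
import shutil
from typing import Optional


def find_tool(name: str = "ffmpeg") -> Optional[str]:
    """Return the full path of an executable on PATH, or None if missing."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_tool()
    print(f"ffmpeg found at {path}" if path else "ffmpeg not found on PATH")
```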

API Key Not Working

Error: `Unauthorized: Invalid API key`
Solution:
  1. Verify API key is correct
  2. Check environment variable: `echo $API_KEY`
  3. Ensure API key has sufficient credits at SiliconFlow

Large File Processing Fails

Error: `Request Entity Too Large` or timeout errors
Solution: The library automatically chunks large files, but ensure:
  • FFmpeg is installed and accessible
  • Sufficient disk space for temporary files
  • Stable network connection for multiple API calls

Video Link Not Parsing

Error: `Failed to parse video link`
Solution:
  1. Ensure link is a valid Douyin share link (starts with `https://v.douyin.com/`)
  2. Try copying the share link again from the Douyin app
  3. Check if video is still available (not deleted)
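
The first check in the list can be automated before a link is handed to the parser. The sketch below tests only the scheme and host named above, plus a non-empty path (that last condition is an added assumption); it cannot tell whether the video still exists. `looks_like_share_link` is a hypothetical name.

```python
from urllib.parse import urlparse


def looks_like_share_link(link: str) -> bool:
    """Check scheme, host, and path; does not confirm the video exists."""
    parsed = urlparse(link.strip())
    return (parsed.scheme == "https"
            and parsed.netloc == "v.douyin.com"
            and len(parsed.path) > 1)


if __name__ == "__main__":
    for candidate in ("https://v.douyin.com/xxxxx/", "https://example.com/x"):
        print(candidate, looks_like_share_link(candidate))
```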

Permission Denied on Output Directory

Error: `PermissionError: [Errno 13] Permission denied`
Solution:
bash
# Ensure output directory exists and is writable
mkdir -p ./output
chmod 755 ./output

# Or specify a different output directory
uv run python douyin-video/scripts/douyin_downloader.py \
  -l "url" -a extract -o ~/Documents/douyin_output
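
A script that calls the library directly can verify the output directory up front instead of failing midway. This is a standard-library sketch; `ensure_writable_dir` is a hypothetical helper, and it tests writability by creating a temporary file rather than by inspecting permission bits.

```python
import os
import tempfile


def ensure_writable_dir(path: str) -> bool:
    """Create the directory if needed and confirm a file can be written there."""
    try:
        os.makedirs(path, exist_ok=True)
        with tempfile.TemporaryFile(dir=path):
            pass  # the temporary file is removed automatically on close
        return True
    except OSError:
        return False


if __name__ == "__main__":
    print(ensure_writable_dir("./output"))
```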

WebUI Not Loading

Error: Browser shows connection refused or 404
Solution:
bash
# Ensure server is running
uv run python web/app.py

# Check if port 8080 is available
lsof -i :8080

# Use different port if needed
PORT=8081 uv run python web/app.py
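
Where `lsof` is unavailable (for example on Windows), a short Python check gives similar information. The sketch below only tests whether something accepts TCP connections on the port; it does not identify the process. `port_in_use` is a hypothetical helper name.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    print("8080 in use" if port_in_use(8080) else "8080 is free")
```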

Advanced Usage

Custom Transcription Settings

python
import os
from douyin_video.transcriber import VideoTranscriber

transcriber = VideoTranscriber(
    api_key=os.getenv("API_KEY"),
    model="FunAudioLLM/SenseVoiceSmall",  # Default model
    chunk_duration=540  # 9 minutes per chunk (default)
)
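
To see what `chunk_duration=540` implies for a long recording, the sketch below splits a total duration into consecutive windows of at most that length. It illustrates fixed-length chunking in general and is not taken from the library, which may split audio differently; `plan_chunks` is a hypothetical name.

```python
from typing import List, Tuple


def plan_chunks(total_seconds: float,
                chunk_duration: float = 540) -> List[Tuple[float, float]]:
    """Split [0, total_seconds) into consecutive (start, end) windows."""
    if chunk_duration <= 0:
        raise ValueError("chunk_duration must be positive")
    chunks = []
    start = 0.0
    while start < total_seconds:
        end = min(start + chunk_duration, total_seconds)
        chunks.append((start, end))
        start = end
    return chunks


if __name__ == "__main__":
    for begin, finish in plan_chunks(1200):
        print(f"{begin:.0f}s to {finish:.0f}s")
```

Each window corresponds to one transcription request, so a smaller `chunk_duration` means more API calls for the same video.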

Programmatic MCP Server

python
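import asyncio
import os
from douyin_video.mcp_server import DouyinMCPServer

async def main():
    server = DouyinMCPServer(api_key=os.getenv("API_KEY"))
    await server.run()

asyncio.run(main())  # top-level await is not valid in a plain script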
Related Resources
