AI Writing Detection Reference
Expert-level knowledge base for detecting AI-generated text, compiled from academic research, commercial detection tools, and empirical analysis.
Quick Reference: High-Confidence Signals
These indicators strongly suggest AI authorship when found together:
Vocabulary Red Flags
High-signal words (reported as 50-700x more common in AI text than in human text):
- "delve", "tapestry", "nuanced", "multifaceted", "underscore"
- "intricate interplay", "played a crucial role", "complex and multifaceted"
- "paramount", "pivotal", "meticulous", "holistic", "robust"
- "stands/serves as", "marking a pivotal moment", "underscores its importance"
Overused phrases:
- "It's important to note that..."
- "In today's fast-paced world..."
- "At its core..."
- "Without further ado..."
- "Let me explain..."
See reference/vocabulary-patterns.md for complete lists.
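As a rough illustration of the Layer 2 counting step, the sketch below counts case-insensitive, word-bounded hits for a small subset of the terms above and reports them per 1,000 words. `FLAGGED_TERMS`, `count_flagged_terms`, and `flagged_per_thousand_words` are hypothetical names invented for this example. The sketch matches exact forms only, so inflections such as "delves" are missed unless you add them to the list.

```python
import re
from collections import Counter

# Hypothetical subset of the vocabulary red flags; the full lists are in
# reference/vocabulary-patterns.md.
FLAGGED_TERMS = [
    "delve", "tapestry", "nuanced", "multifaceted", "underscore",
    "intricate interplay", "played a crucial role", "paramount",
    "pivotal", "meticulous", "holistic", "robust",
    "it's important to note that", "in today's fast-paced world",
    "at its core", "without further ado", "let me explain",
]


def count_flagged_terms(text: str, terms: list[str] = FLAGGED_TERMS) -> Counter:
    """Count exact, case-insensitive, word-bounded hits for each flagged term."""
    # Normalize curly apostrophes so "It\u2019s" matches "it's".
    normalized = text.lower().replace("\u2019", "'")
    counts: Counter = Counter()
    for term in terms:
        hits = len(re.findall(r"\b" + re.escape(term) + r"\b", normalized))
        if hits:
            counts[term] = hits
    return counts


def flagged_per_thousand_words(text: str) -> float:
    """Total flagged hits per 1,000 whitespace-separated words."""
    words = text.split()
    if not words:
        return 0.0
    return 1000 * sum(count_flagged_terms(text).values()) / len(words)
```

A raw density like this only feeds the multi-signal scoring described under Detection Methodology. It does not justify a verdict by itself.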
Structural Red Flags
- Uniform sentence lengths: 12-18 words consistently (low burstiness)
- Tricolon structures: "research, collaboration, and problem-solving"
- Em dash overuse: AI uses em dashes in a formulaic way to mimic "punched up" sales writing, especially in parallelisms ("it's not X — it's Y"); swapping punctuation doesn't fix the underlying emphasis pattern
- Perfect paragraph uniformity: All paragraphs same approximate length
- Template conclusions: "In summary...", "In conclusion..."
- Negative parallelisms: "It's not about X; it's about Y"
- Elegant variation: Cycling through synonyms to avoid repetition
- False ranges: "From X to Y" with incoherent endpoints
See reference/structural-patterns.md for details.
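Uniform sentence and paragraph lengths can be approximated with a simple statistic. This sketch assumes the coefficient of variation (standard deviation divided by mean) of word counts as a stand-in for burstiness. Tools that report a burstiness score may compute it differently. The sentence splitter is deliberately naive, so abbreviations such as "e.g." will split a sentence. All function names are hypothetical.

```python
import re
import statistics


def sentence_lengths(text: str) -> list[int]:
    """Word counts per sentence, splitting naively after '.', '!' or '?'."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [len(s.split()) for s in sentences if s.strip()]


def length_variation(lengths: list[int]) -> float:
    """Coefficient of variation (population stdev / mean); 0.0 means perfectly uniform."""
    if len(lengths) < 2 or statistics.mean(lengths) == 0:
        return 0.0
    return statistics.pstdev(lengths) / statistics.mean(lengths)


def structure_profile(text: str) -> dict[str, float]:
    """Sentence- and paragraph-length variation; low values mirror the uniformity red flags."""
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    return {
        "sentence_cv": length_variation(sentence_lengths(text)),
        "paragraph_cv": length_variation([len(p.split()) for p in paragraphs]),
    }
```

The sketch sets no cut-off for "low". Any threshold would have to be calibrated on human text from the same genre, because technical and formal writing is often uniform too.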
Content Red Flags
- Importance puffery: "marking a pivotal moment in history"
- Ecosystem/conservation claims without citations
- "Challenges and Future" sections following rigid formula
- Promotional language: "nestled in", "stunning natural beauty", "boasts"
- Superficial analyses: trailing "-ing" phrases that attribute significance to facts (e.g., ", underscoring its importance")
See reference/content-patterns.md for details.
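Two of the content patterns lend themselves to simple pattern matching: trailing "-ing" clauses that assign significance, and stock promotional phrases. The verb and phrase lists below are hypothetical starter lists, not taken from reference/content-patterns.md. The regex only catches clauses introduced by a comma.

```python
import re

# Hypothetical starter lists; extend them from reference/content-patterns.md.
SIGNIFICANCE_VERBS = ["highlighting", "underscoring", "emphasizing",
                      "reflecting", "showcasing", "symbolizing", "marking"]
PROMOTIONAL_PHRASES = ["nestled in", "stunning natural beauty", "boasts"]

# A comma followed by one of the verbs, captured up to the next '.' or ';'.
ING_CLAUSE = re.compile(
    r",\s+(?:" + "|".join(SIGNIFICANCE_VERBS) + r")\b[^.;]*", re.IGNORECASE
)


def content_tells(text: str) -> list[tuple[str, str]]:
    """Return (category, matched text) pairs for the two content patterns."""
    tells = []
    for match in ING_CLAUSE.finditer(text):
        # Drop the leading comma and surrounding whitespace from the clause.
        tells.append(("superficial -ing analysis", match.group(0)[1:].strip()))
    lowered = text.lower()
    for phrase in PROMOTIONAL_PHRASES:
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
            tells.append(("promotional language", phrase))
    return tells
```

The returned pairs map directly onto the "[Category]: [Specific indicator]" lines of the analysis output format.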
Formatting Red Flags
- Title Case in all section headings
- Excessive boldface (every key term bolded)
- Inline-header lists: items following the pattern "**Bold Header**: description"
- Emojis in formal content or headings
- Subject lines in non-email contexts
See reference/formatting-patterns.md for details.
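The Layer 6 formatting checks can be sketched for Markdown input with "#"-style headings, which is an assumption since the reference does not fix an input format. The minor-word list and emoji ranges are rough placeholders rather than a style-guide or Unicode definition. Proper nouns will make some sentence-case headings look like Title Case.

```python
import re

# Words conventionally left lowercase in Title Case (a hypothetical short list).
MINOR_WORDS = {"a", "an", "and", "as", "at", "but", "by", "for",
               "in", "of", "on", "or", "the", "to", "with"}
HEADING = re.compile(r"^#{1,6}\s+(.+)$", re.MULTILINE)
BOLD = re.compile(r"\*\*[^*\n]+\*\*")
INLINE_HEADER_ITEM = re.compile(r"^[ \t]*[-*]\s+\*\*[^*\n]+\*\*:", re.MULTILINE)
# Rough emoji ranges only; not a complete Unicode emoji definition.
EMOJI = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def is_title_case(heading: str) -> bool:
    """True if the first word and every non-minor word are capitalized (2+ words)."""
    words = re.findall(r"[A-Za-z][A-Za-z'-]*", heading)
    if len(words) < 2:
        return False
    return all(w[0].isupper() for i, w in enumerate(words)
               if i == 0 or w.lower() not in MINOR_WORDS)


def formatting_signals(markdown: str) -> dict[str, float]:
    """Heading case ratio, bold density, inline-header items and emoji count."""
    headings = HEADING.findall(markdown)
    paragraphs = [p for p in re.split(r"\n\s*\n", markdown) if p.strip()]
    return {
        "title_case_heading_ratio":
            sum(map(is_title_case, headings)) / len(headings) if headings else 0.0,
        "max_bold_per_paragraph":
            max((len(BOLD.findall(p)) for p in paragraphs), default=0),
        "inline_header_items": len(INLINE_HEADER_ITEM.findall(markdown)),
        "emoji_count": len(EMOJI.findall(markdown)),
    }
```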
Markup Red Flags (Definitive)
- turn0search0, turn0image0: ChatGPT reference markers
- contentReference[oaicite:]: ChatGPT reference bugs
- utm_source=chatgpt.com: URL tracking (definitive)
- Markdown in wikitext: "## headers", "**bold**", "[text](url)" links
- grok_card XML tags: Grok/X specific
See reference/markup-artifacts.md for details.
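A Layer 1 scan for the markers above reduces to a handful of regular expressions. The exact regexes are assumptions based on the marker forms listed here. For example, the Grok check assumes the tag appears literally as "grok_card". The Markdown-in-wikitext check skips lines starting with "##", because "##" at the start of a line is also valid wikitext for a nested numbered list.

```python
import re

# Illustrative regexes for the markup red flags listed above.
ARTIFACT_PATTERNS = {
    "ChatGPT search/image marker": re.compile(r"turn\d+(?:search|image)\d+"),
    "ChatGPT oaicite reference bug": re.compile(r"contentReference\[oaicite:[^\]]*\]"),
    "ChatGPT UTM tracking": re.compile(r"utm_source=chatgpt\.com"),
    "Grok card tag": re.compile(r"</?grok_card\b"),
}
# Markdown bold and [text](url) links; "##" is not checked because a line
# starting with "##" is also valid wikitext (a nested numbered list).
MARKDOWN_IN_WIKITEXT = re.compile(r"\*\*[^*\n]+\*\*|\[[^\]\n]+\]\(https?://[^)\s]+\)")


def scan_artifacts(text: str, is_wikitext: bool = False) -> list[str]:
    """Names of Layer 1 artifacts found in the text, in a fixed order."""
    found = [name for name, pattern in ARTIFACT_PATTERNS.items() if pattern.search(text)]
    if is_wikitext and MARKDOWN_IN_WIKITEXT.search(text):
        found.append("Markdown in wikitext")
    return found
```

Note the asymmetry. A hit here is treated as strong evidence, but an empty result says nothing, since these markers are easy to strip.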
Citation Red Flags
- Broken external links that never existed (no archive)
- Invalid identifiers: ISBN checksum failures, malformed or unresolvable DOIs
- Declared but unused references: Cite errors
- Placeholder values: "url=URL", "date=2025-XX-XX"
See reference/citation-patterns.md for details.
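ISBNs carry a check digit, so a fabricated ISBN often fails arithmetic validation. ISBN-10 uses weights 10 down to 1, modulo 11, with "X" standing for 10. ISBN-13 uses alternating weights 1 and 3, modulo 10. DOIs have no check digit, so the sketch below can only test their general shape. The loose DOI regex is an assumption, and a well-formed DOI can still be invented, so it must also be resolved (for example through https://doi.org/). Function names are hypothetical.

```python
import re

# Loose DOI shape: "10." + 4-9 digit registrant code + "/" + non-space suffix.
DOI_SYNTAX = re.compile(r"10\.\d{4,9}/\S+")


def isbn_checksum_ok(raw: str) -> bool:
    """Validate an ISBN-10 or ISBN-13 check digit (hyphens and spaces ignored)."""
    s = re.sub(r"[\s-]", "", raw).upper()
    if re.fullmatch(r"[0-9]{9}[0-9X]", s):
        # ISBN-10: weights 10..1, with 'X' meaning 10 in the final position.
        total = sum((10 - i) * (10 if c == "X" else int(c)) for i, c in enumerate(s))
        return total % 11 == 0
    if re.fullmatch(r"[0-9]{13}", s):
        # ISBN-13: alternating weights 1 and 3.
        total = sum((3 if i % 2 else 1) * int(c) for i, c in enumerate(s))
        return total % 10 == 0
    return False


def doi_syntax_ok(doi: str) -> bool:
    """Shape check only: DOIs carry no check digit, so this cannot prove existence."""
    return DOI_SYNTAX.fullmatch(doi.strip()) is not None
```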
Tone Red Flags
- Passive and detached voice throughout
- Absence of first-person pronouns where expected
- Consistent formality with no stylistic variation
- Over-politeness and excessive hedging
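Two of the tone signals can be roughly quantified: the rate of first-person pronouns and the number of passive constructions. Both heuristics in this sketch are assumptions. The passive check simply looks for a form of "be" followed by a word ending in "-ed" or "-en", so it misses irregular participles and misfires on phrases like "is open". Hedging and politeness are not modeled.

```python
import re

FIRST_PERSON = {"i", "me", "my", "mine", "we", "us", "our", "ours"}
# Crude passive heuristic: a form of "be" followed by a word ending in -ed or -en.
PASSIVE = re.compile(r"\b(?:am|is|are|was|were|be|been|being)\s+\w+(?:ed|en)\b",
                     re.IGNORECASE)


def tone_profile(text: str) -> dict[str, float]:
    """First-person pronouns per 100 words and a count of passive-looking phrases."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {"first_person_per_100_words": 0.0, "passive_constructions": 0}
    first_person = sum(w in FIRST_PERSON for w in words)
    return {
        "first_person_per_100_words": 100 * first_person / len(words),
        "passive_constructions": len(PASSIVE.findall(text)),
    }
```

Both numbers only mean something relative to what the genre expects. Encyclopedic and scientific prose legitimately avoids "I".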
Detection Methodology
Multi-Layer Analysis Approach
Layer 1: Technical Artifact Scan (Definitive)
- Check for turn0search/oaicite markers (ChatGPT)
- Check for utm_source=chatgpt.com in URLs
- Check for grok_card tags (Grok)
- Check for Markdown in non-Markdown contexts
- If found: Definitive AI involvement
Layer 2: Vocabulary Pattern Matching
- Scan for overused AI words/phrases
- Count frequency of flagged terms
- Look for clusters of high-signal vocabulary
- Check for importance/symbolism phrases
Layer 3: Structural Analysis
- Observe sentence length variation (uniform = AI signal)
- Check paragraph uniformity
- Identify repetitive syntactic templates (tricolons, negative parallelisms)
- Look for elegant variation (synonym cycling)
- Check for false ranges
Layer 4: Content Pattern Analysis
- Check for importance puffery and promotional language
- Look for "Challenges and Future" formula
- Check for ecosystem/conservation claims without citations
- Identify superficial analyses with "-ing" attributions
Layer 5: Citation Verification
- Test external links - do they exist?
- Verify ISBN checksums and check that DOIs are well-formed and resolve
- Check for declared but unused references
- Look for placeholder values
Layer 6: Formatting Analysis
- Check heading capitalization (Title Case = signal)
- Count bold phrases per paragraph
- Look for inline-header list patterns
- Check for emojis in formal content
Layer 7: Stylometric Observation
- Pronoun usage patterns (missing first-person?)
- Tone consistency (too uniform = AI signal)
- Punctuation patterns (em dash overuse? curly quotes?)
Layer 8: Coherence Check
- Do paragraphs build a coherent argument?
- Are concepts repeated with different words?
- Do transitions actually connect ideas?
Layer 9: Confidence Scoring
- Weight multiple signals together
- Require corroborating evidence (3+ signals minimum)
- Apply context-specific adjustments
- Check for mitigating factors (human signals)
- Discard ineffective indicators (do not count them toward the score)
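The Layer 9 steps above can be sketched as a small scoring function. Every number in it is an assumption. The reference gives no weights or thresholds, so the `weight` field, the one-point offset per human signal, and the cut-offs of 3 and 5 are hypothetical placeholders to calibrate. The indicator labels in `INEFFECTIVE` are invented names for the ineffective indicators listed under False Positive Prevention. Context-specific adjustments are not modeled.

```python
from dataclasses import dataclass

# Hypothetical labels for the indicators the reference says never to rely on.
INEFFECTIVE = {"perfect grammar", "bland prose", "fancy vocabulary",
               "letter-like formatting", "sentence-initial conjunction"}


@dataclass(frozen=True)
class Signal:
    category: str             # e.g. "Vocabulary", "Structure", "Markup"
    indicator: str            # short label such as "delve cluster"
    weight: float = 1.0       # hypothetical weight; the reference specifies none
    definitive: bool = False  # True only for Layer 1 technical artifacts


def assess(signals: list[Signal], human_signals: int, word_count: int) -> tuple[str, str]:
    """Return (overall assessment, confidence) using hypothetical thresholds."""
    usable = [s for s in signals if s.indicator not in INEFFECTIVE]
    if any(s.definitive for s in usable):
        return "Likely AI", "High"
    if word_count < 200:  # minimum length from False Positive Prevention
        return "Inconclusive", "Low"
    if len(usable) < 3:   # corroboration rule: 3+ signals minimum
        return ("Likely Human", "Low") if human_signals else ("Inconclusive", "Low")
    # Each human-writing signal offsets one unit of AI evidence (assumed).
    score = sum(s.weight for s in usable) - human_signals
    if score >= 5:
        return "Likely AI", "Medium"
    if score >= 3:
        return "Possibly AI", "Low"
    return "Inconclusive", "Low"
```

Without a technical artifact, the function never returns "High" confidence. This reflects the principle that AI detection is probabilistic.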
Model-Specific Patterns
Different AI models have distinct "fingerprints":
| Model | Key Tells | Technical Artifacts |
|---|---|---|
| ChatGPT/GPT-4 | "delve" (pre-2025), "tapestry", tricolons, em dashes, curly quotes | turn0search, oaicite, utm_source=chatgpt.com |
| Claude | Analytical structure, extended analogies, cautious qualifications | None (uses straight quotes, no tracking) |
| Gemini | Conversational synthesis, fact-dense paragraphs | None (uses straight quotes, no tracking) |
| DeepSeek | Similar to ChatGPT, curly quotes | Curly quotation marks |
| Grok | X/Twitter integration | |
| Perplexity | Source-focused output | |
Important dates:
- ChatGPT launched: November 30, 2022 (text before this is almost certainly human)
- "delve" usage dropped in 2025 (its presence still signals pre-2025 ChatGPT output)
See reference/model-fingerprints.md for detailed model patterns.
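Two items from this section are easy to check mechanically: the curly-versus-straight quote style listed in the fingerprint table, and whether a text predates ChatGPT's launch. The function names are hypothetical. Quote style is weak evidence on its own, because word processors and phone keyboards insert curly quotes automatically.

```python
from datetime import date

CHATGPT_LAUNCH = date(2022, 11, 30)
CURLY_QUOTES = "\u201c\u201d\u2018\u2019"  # left/right double and single quotes


def quote_style(text: str) -> dict[str, int]:
    """Count curly vs. straight quotation marks (apostrophes included)."""
    return {
        "curly": sum(text.count(ch) for ch in CURLY_QUOTES),
        "straight": text.count('"') + text.count("'"),
    }


def predates_chatgpt(written_on: date) -> bool:
    """True if the text was written before ChatGPT's public launch."""
    return written_on < CHATGPT_LAUNCH
```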
False Positive Prevention
Critical requirements:
- Minimum 200 words for reliable analysis
- Never flag on single indicators alone
- Use ensemble scoring (multiple signals required)
High false-positive risk groups:
- Non-native English speakers (one study found detectors misclassified 61% of their essays as AI-generated on average)
- Technical/formal writing
- Neurodivergent writers
- Content using grammar correction tools
Ineffective indicators (do NOT rely on these):
- Perfect grammar alone
- "Bland" or "robotic" prose
- "Fancy" or unusual vocabulary
- Letter-like formatting alone
- Conjunctions starting sentences
Signs of human writing:
- Text from before November 30, 2022
- Ability to explain editorial choices
- Personal anecdotes with verifiable details
- Minor errors and natural quirks
See reference/false-positive-prevention.md for detailed guidance.
Analysis Output Format
Structure findings as:
**Overall Assessment**: [Likely AI / Possibly AI / Likely Human / Inconclusive]
**Confidence**: [Low / Medium / High]
**Summary**: 2-3 sentence overview
**Evidence Found**:
- [Category]: [Specific indicator] - "[Quote from text]"
- [Category]: [Specific indicator] - "[Quote from text]"
**Mitigating Factors**: [Elements suggesting human authorship]
**Caveats**: [Limitations, alternative explanations]
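When reports are generated programmatically, a small data structure can keep them in the template above. The `Report` and `Evidence` classes below are hypothetical. They restrict the assessment and confidence fields to the allowed values and render the same bolded labels in the same order.

```python
from dataclasses import dataclass, field

ASSESSMENTS = ("Likely AI", "Possibly AI", "Likely Human", "Inconclusive")
CONFIDENCE_LEVELS = ("Low", "Medium", "High")


@dataclass
class Evidence:
    category: str
    indicator: str
    quote: str


@dataclass
class Report:
    assessment: str
    confidence: str
    summary: str
    evidence: list[Evidence] = field(default_factory=list)
    mitigating: str = "None identified"
    caveats: str = ""

    def __post_init__(self) -> None:
        # Reject values outside the template's fixed vocabularies.
        if self.assessment not in ASSESSMENTS:
            raise ValueError(f"unknown assessment: {self.assessment!r}")
        if self.confidence not in CONFIDENCE_LEVELS:
            raise ValueError(f"unknown confidence: {self.confidence!r}")

    def render(self) -> str:
        """Render the report using the labels from the output template."""
        lines = [f"**Overall Assessment**: {self.assessment}",
                 f"**Confidence**: {self.confidence}",
                 f"**Summary**: {self.summary}",
                 "**Evidence Found**:"]
        lines += [f'- {e.category}: {e.indicator} - "{e.quote}"' for e in self.evidence]
        lines += [f"**Mitigating Factors**: {self.mitigating}",
                  f"**Caveats**: {self.caveats}"]
        return "\n".join(lines)
```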
Key Principles
- No certainty claims - AI detection is probabilistic
- Multiple signals required - Single indicators prove nothing
- Context matters - Academic writing differs from blogs
- Stakes awareness - False accusations cause real harm
- Evolving field - Detection methods require constant updates
Reference Files
- vocabulary-patterns.md - Complete word/phrase lists with frequencies
- structural-patterns.md - Sentence, paragraph, and discourse patterns
- content-patterns.md - Importance puffery, promotional language, content tells
- formatting-patterns.md - Title case, boldface, emojis, visual patterns
- markup-artifacts.md - Technical artifacts: turn0search, oaicite, Markdown, tracking
- citation-patterns.md - Broken links, invalid identifiers, hallucinated references
- model-fingerprints.md - GPT, Claude, Gemini, Grok, Perplexity specific tells
- false-positive-prevention.md - Avoiding false accusations, ineffective indicators
Sources
This knowledge base synthesizes research from:
- Stanford HAI (DetectGPT, bias studies)
- GPTZero, Originality.ai, Turnitin, Pangram methodologies
- Academic papers on stylometry and discourse analysis
- Empirical studies on detection accuracy and limitations
- Wikipedia:WikiProject AI Cleanup field guide (2025)
- Community-documented patterns from Wikipedia editing