data-sleuth

Data Sleuth

Advanced signal detection and correlation analysis for extracting non-obvious insights from datasets.

Overview

This skill transforms Claude into an investigative data analyst, applying techniques from data journalism, forensic accounting, and OSINT investigation to find patterns others miss. It pairs naturally with personality-profiler to enhance signal extraction from social media data, but works with any structured dataset.

Core Principles

The Investigative Mindset

Adopt these cognitive stances from elite data journalists and investigators:

Healthy Skepticism — "There is no such thing as clean or dirty data, just data you don't understand." Challenge every assumption.
Harm-Centered Pattern Recognition — Study anomalies not as noise to remove, but as potential signals revealing system cracks.
Naivete as Asset — Remain naive enough to spot what domain experts miss due to habituation.
Evidence Over Assumption — Build confidence through evidence, never trust preconceived notions.

Interview-First Workflow

CRITICAL: Before any analysis, use AskUserQuestion to interview the user about potential analyses. Present proactively formulated options based on the data structure.

Step 1: Data Reconnaissance

When data is provided:

Identify all available fields/columns
Note data types, cardinalities, and ranges
Identify temporal dimensions
Spot potential join keys for cross-dataset correlation
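The reconnaissance checklist above can be sketched in a few lines of standard-library Python. This is an illustrative outline, not the skill's actual implementation: the helper names (`profile`, `join_key_candidates`) and the sample field names are assumptions, and it treats every value as a string that is either numeric, an ISO 8601 timestamp, or free text.

```python
from datetime import datetime

def _kind(value):
    """Classify one raw value as 'empty', 'number', 'timestamp', or 'text'."""
    if value is None or str(value).strip() == "":
        return "empty"
    text = str(value).strip()
    try:
        float(text)
        return "number"
    except ValueError:
        pass
    try:
        datetime.fromisoformat(text)
        return "timestamp"
    except ValueError:
        return "text"

def profile(rows):
    """Summarize each column: observed kinds, cardinality, missing count, range."""
    names = []
    for row in rows:
        for name in row:
            if name not in names:
                names.append(name)
    report = {}
    for name in names:
        values = [row.get(name) for row in rows]
        present = [v for v in values if _kind(v) != "empty"]
        kinds = sorted({_kind(v) for v in present})
        summary = {
            "kinds": kinds,
            "cardinality": len({str(v) for v in present}),
            "missing": len(values) - len(present),
        }
        # Ranges only make sense when the whole column has one orderable kind.
        if kinds == ["number"]:
            numbers = [float(v) for v in present]
            summary["range"] = (min(numbers), max(numbers))
        elif kinds == ["timestamp"]:
            stamps = [datetime.fromisoformat(str(v).strip()) for v in present]
            summary["range"] = (min(stamps), max(stamps))
        report[name] = summary
    return report

def join_key_candidates(report_a, report_b):
    """Column names that appear in both profiles with the same observed kinds."""
    return sorted(
        name for name in report_a
        if name in report_b and report_a[name]["kinds"] == report_b[name]["kinds"]
    )
```

Columns whose `kinds` list is `["timestamp"]` are the temporal dimensions. A shared column name with matching kinds is only a candidate join key, so it is worth confirming that the values actually overlap before relying on it.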

Step 2: Analysis Interview

Use AskUserQuestion with proactively formulated analysis options. Structure questions around these categories:

Template for interview questions:

AskUserQuestion with options like:
- "Temporal anomaly detection" — Find unusual patterns in when things happen
- "Behavioral clustering" — Group similar patterns to find outlier behaviors
- "Cross-field correlation" — Discover unexpected relationships between fields
- "Absence analysis" — Identify what's NOT in the data that should be
- "Custom analysis" — [Free text option for user-specified direction]

Always include:

2-4 concrete, data-specific analysis options
Brief description of what each would reveal
A free-text "Other" option for user-specified direction

Example interview for social media data:

Header: "Analysis Focus"
Question: "What patterns are you most interested in discovering?"
Options:
- "Engagement anomalies" — Posts that performed unusually well/poorly vs your baseline
- "Topic evolution" — How your interests shifted over time
- "Social network signals" — Who you engage with most and patterns in those interactions
- "Behavioral fingerprint" — Your unique timing, vocabulary, and stylistic signatures
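The "Always include" rules can be enforced before the question is asked. The sketch below is hypothetical: the real AskUserQuestion payload shape is not specified in this document, so the dictionary keys (`header`, `question`, `options`, `free_text`) and the function name `build_interview` are assumptions chosen to mirror the example above.

```python
def build_interview(header, question, options):
    """Assemble an interview payload from (label, description) pairs.

    Hypothetical structure: the actual AskUserQuestion schema may differ.
    """
    if not 2 <= len(options) <= 4:
        raise ValueError("offer 2-4 concrete, data-specific options")
    for label, description in options:
        if not label.strip() or not description.strip():
            raise ValueError("every option needs a label and a brief description")
    choices = [{"label": label, "description": description}
               for label, description in options]
    # Free-text escape hatch so the user can steer the analysis elsewhere.
    choices.append({
        "label": "Other",
        "description": "Describe the analysis you want",
        "free_text": True,
    })
    return {"header": header, "question": question, "options": choices}
```

Calling it with the four social media options from the example yields five choices: the four data-specific ones plus the free-text "Other".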

Step 3: Execute Selected Analysis

Apply the signal detection techniques from the reference guide based on user selection.

Step 4: Present Findings with Evidence

For each insight:

State the finding clearly
Provide specific evidence (quotes, data points, timestamps)
Rate confidence (high/medium/low)
Suggest follow-up analyses if warranted

Signal Detection Techniques

For comprehensive technique descriptions, see references/signal-detection.md.

Quick Reference

Technique	What It Finds	When to Use
Temporal Fingerprinting	Activity rhythms, scheduling patterns	Any timestamped data
Ratio Analysis	Unusual proportions that suggest hidden behavior	Engagement metrics, financial data
Absence Detection	What's missing that should exist	Any dataset with expected patterns
Cross-Dataset Triangulation	Corroboration or contradiction across sources	Multiple data exports
Outlier Contextualization	Whether anomalies are errors or signals	After initial statistical analysis
Linguistic Forensics	Vocabulary shifts, tone changes over time	Text-heavy datasets
Network Topology	Connection patterns and clustering	Social/relationship data
Behavioral Segmentation	Distinct modes of operation	Activity logs, engagement data
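Two rows of the table, Temporal Fingerprinting and Absence Detection, lend themselves to a short illustration. The sketch below is one plausible reading of those techniques, not the definitions in references/signal-detection.md: it assumes timestamps are ISO 8601 strings, that "rhythm" means the share of activity per hour of day, and that the "expected pattern" is at least one event per calendar day. The function names are illustrative.

```python
from collections import Counter
from datetime import datetime, timedelta

def hourly_fingerprint(timestamps):
    """Share of events falling in each hour of the day (0-23)."""
    hours = Counter(datetime.fromisoformat(ts).hour for ts in timestamps)
    total = sum(hours.values())
    if total == 0:
        return {}
    return {hour: hours.get(hour, 0) / total for hour in range(24)}

def missing_days(timestamps):
    """Calendar days with no events between the first and last observed day."""
    seen = {datetime.fromisoformat(ts).date() for ts in timestamps}
    if not seen:
        return []
    day, last = min(seen), max(seen)
    gaps = []
    while day <= last:
        if day not in seen:
            gaps.append(day)
        day += timedelta(days=1)
    return gaps
```

A fingerprint concentrated in a few hours suggests a routine or scheduled posting. A run of missing days inside an otherwise steady series is the kind of absence worth asking the user about before treating it as a data export problem.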

Multi-Dataset Correlation

When analyzing multiple datasets together:

1. Identify Common Keys

Timestamps (can align by day, hour, or custom windows)
User identifiers (direct or inferred)
Content overlap (shared topics, URLs, entities)
Behavioral patterns (similar timing signatures)
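Aligning on timestamps is the most mechanical of these keys, so it is the easiest to sketch. The example below assumes each row is a dictionary with an ISO 8601 string under a `timestamp` key and that both datasets use the same timezone convention. The names `bucket` and `align` are illustrative, and only day and hour windows are handled.

```python
from collections import defaultdict
from datetime import datetime

def bucket(ts, window="day"):
    """Truncate an ISO 8601 timestamp to the start of its day or hour."""
    moment = datetime.fromisoformat(ts)
    if window == "day":
        return moment.replace(hour=0, minute=0, second=0, microsecond=0)
    if window == "hour":
        return moment.replace(minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported window: {window}")

def align(rows_a, rows_b, key="timestamp", window="day"):
    """Group two datasets by time bucket and keep the buckets present in both."""
    grouped_a, grouped_b = defaultdict(list), defaultdict(list)
    for row in rows_a:
        grouped_a[bucket(row[key], window)].append(row)
    for row in rows_b:
        grouped_b[bucket(row[key], window)].append(row)
    shared = sorted(grouped_a.keys() & grouped_b.keys())
    return {b: (grouped_a[b], grouped_b[b]) for b in shared}
```

Widening the window from hour to day trades precision for recall: more buckets match, but a match says less about whether the two events are related.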

2. Cross-Reference Patterns

For each finding in Dataset A, check:

Does Dataset B corroborate this?
Does Dataset B contradict this?
Does Dataset B add context?
Does combining them reveal something neither shows alone?

3. Document Correlations

Use this format:

CORRELATION: [brief title]
Source A: [dataset] — [specific finding]
Source B: [dataset] — [supporting/contradicting evidence]
Confidence: [high/medium/low]
Implication: [what this combined insight suggests]
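If correlations are collected programmatically, a small record type can keep them in the format above. This is a sketch under assumptions: the class name `Correlation` and its field names are invented for illustration, and the confidence check simply mirrors the three levels listed in the template.

```python
from dataclasses import dataclass

# The dash used as a separator in the template above, written as an escape.
SEP = " \u2014 "
LEVELS = ("high", "medium", "low")

@dataclass
class Correlation:
    title: str
    source_a: str
    finding_a: str
    source_b: str
    evidence_b: str
    confidence: str
    implication: str

    def render(self):
        """Return the five-line text block in the documented format."""
        if self.confidence not in LEVELS:
            raise ValueError(f"confidence must be one of {LEVELS}")
        return "\n".join([
            f"CORRELATION: {self.title}",
            f"Source A: {self.source_a}{SEP}{self.finding_a}",
            f"Source B: {self.source_b}{SEP}{self.evidence_b}",
            f"Confidence: {self.confidence}",
            f"Implication: {self.implication}",
        ])
```

Rejecting unknown confidence labels at render time keeps free-form values such as "fairly sure" out of the final report.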

Integration with personality-profiler

When paired with personality-profiler:

Run personality-profiler first to establish baseline profile
Use data-sleuth to find anomalies that deviate from that baseline
Cross-reference findings — personality dimensions vs behavioral signals
Enrich the profile with non-obvious insights:
- Hidden interests (engagement without posting)
- Behavioral inconsistencies (what they do vs what they say)
- Evolution inflection points (when/why changes occurred)
- Network influence patterns (who shapes their views)

Output Format

Deliver findings in two parts:

1. Executive Summary

2-3 paragraphs highlighting the most significant non-obvious findings.

2. Detailed Findings

json

{
  "analysis_type": "data-sleuth",
  "datasets_analyzed": ["list of sources"],
  "findings": [
    {
      "title": "Finding title",
      "category": "temporal|behavioral|linguistic|network|correlation",
      "confidence": 0.0-1.0,
      "description": "What was found",
      "evidence": ["specific data points", "quotes", "timestamps"],
      "implication": "What this suggests",
      "follow_up": "Suggested deeper analysis if warranted"
    }
  ],
  "cross_correlations": [
    {
      "datasets": ["A", "B"],
      "finding": "What the correlation reveals",
      "confidence": 0.0-1.0
    }
  ],
  "methodology_notes": "How the analysis was conducted"
}
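The structure above is a template rather than literal JSON: `0.0-1.0` and the `|`-separated category list are placeholders. A minimal checker, assuming `confidence` is meant to be a number between 0 and 1 and `category` exactly one of the listed values, might look like the following. The function names and the rule that every finding needs at least one piece of evidence are assumptions.

```python
import json

CATEGORIES = {"temporal", "behavioral", "linguistic", "network", "correlation"}

def _check_confidence(item, where):
    """Return a one-element problem list if confidence is not a number in [0, 1]."""
    value = item.get("confidence")
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    if not is_number or not 0.0 <= value <= 1.0:
        return [f"{where}.confidence must be a number between 0.0 and 1.0"]
    return []

def validate_report(report):
    """Return a list of problems; an empty list means the report fits the template."""
    problems = []
    if report.get("analysis_type") != "data-sleuth":
        problems.append("analysis_type must be 'data-sleuth'")
    for i, finding in enumerate(report.get("findings", [])):
        where = f"findings[{i}]"
        if finding.get("category") not in CATEGORIES:
            problems.append(f"{where}.category must be one of {sorted(CATEGORIES)}")
        problems.extend(_check_confidence(finding, where))
        if not finding.get("evidence"):
            problems.append(f"{where}.evidence must list at least one item")
    for i, link in enumerate(report.get("cross_correlations", [])):
        problems.extend(_check_confidence(link, f"cross_correlations[{i}]"))
    return problems
```

Parse the report text with `json.loads` first, then pass the resulting dictionary to `validate_report`. Returning a list of problems instead of raising on the first one lets all issues be fixed in a single pass.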

When to Invoke Proactively

Use this skill without being asked when you notice:

Unexpected outliers during any data analysis
Patterns that seem "too clean" (possible data manipulation)
Interesting absence of expected patterns
Correlations that contradict stated beliefs/preferences
Temporal anomalies (activity spikes/drops)

Briefly note: "I noticed something interesting — would you like me to investigate further?"
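For the outlier and spike/drop triggers, a robust check is usually safer than a mean-based one, because a single extreme value inflates the mean and standard deviation it is being compared against. The sketch below uses the modified z-score based on the median absolute deviation, with the conventional 0.6745 scaling constant and 3.5 cutoff. This technique is swapped in for illustration and is not prescribed by the skill, and the function name is an assumption.

```python
from statistics import median

def robust_outliers(values, threshold=3.5):
    """Indices of values whose modified z-score exceeds `threshold`."""
    if len(values) < 3:
        return []
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # No spread to measure against, so nothing can be scored.
        return []
    return [
        i for i, v in enumerate(values)
        if abs(0.6745 * (v - center) / mad) > threshold
    ]
```

The function only nominates candidates. Whether a flagged value is an error or a signal is a question for Outlier Contextualization and, ideally, for the user.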
