photo-content-recognition-curation-expert

Photo Content Recognition & Curation Expert

Expert in photo content analysis and intelligent curation. Combines classical computer vision with modern deep learning for comprehensive photo analysis.

When to Use This Skill

Use for:
  • Face recognition and clustering (identifying important people)
  • Animal/pet detection and clustering
  • Near-duplicate detection using perceptual hashing (DINOHash, pHash, dHash)
  • Burst photo selection (finding best frame from 10-50 shots)
  • Screenshot vs photo classification
  • Meme/download filtering
  • NSFW content detection
  • Quick indexing for large photo libraries (10K+)
  • Aesthetic quality scoring (NIMA)
NOT for:
  • GPS-based location clustering → event-detection-temporal-intelligence-expert
  • Color palette extraction → color-theory-palette-harmony-expert
  • Semantic image-text matching → clip-aware-embeddings
  • Video analysis or frame extraction

Quick Decision Tree

What do you need to recognize/filter?
├─ Duplicate photos? ─────────────────────────────── Perceptual Hashing
│   ├─ Exact duplicates? ──────────────────────────── dHash (fastest)
│   ├─ Brightness/contrast changes? ───────────────── pHash (DCT-based)
│   ├─ Heavy crops/compression? ───────────────────── DINOHash (2025 SOTA)
│   └─ Production system? ─────────────────────────── Hybrid (pHash → DINOHash)
├─ People in photos? ─────────────────────────────── Face Clustering
│   ├─ Known thresholds? ──────────────────────────── Apple-style Agglomerative
│   └─ Unknown data distribution? ─────────────────── HDBSCAN
├─ Pets/Animals? ─────────────────────────────────── Pet Recognition
│   ├─ Detection? ─────────────────────────────────── YOLOv8
│   └─ Individual clustering? ─────────────────────── CLIP + HDBSCAN
├─ Best from burst? ──────────────────────────────── Burst Selection
│   └─ Score: sharpness + face quality + aesthetics
└─ Filter junk? ──────────────────────────────────── Content Detection
    ├─ Screenshots? ───────────────────────────────── Multi-signal classifier
    └─ NSFW? ──────────────────────────────────────── Safety classifier

Core Concepts

1. Perceptual Hashing for Near-Duplicate Detection

Problem: Camera bursts, re-saved images, and minor edits create near-duplicates.
Solution: Perceptual hashes generate similar values for visually similar images.
Method Comparison:

| Method | Speed | Robustness | Best For |
|---|---|---|---|
| dHash | Fastest | Low | Exact duplicates |
| pHash | Fast | Medium | Brightness/contrast changes |
| DINOHash | Slower | High | Heavy crops, compression |
| Hybrid | Medium | Very High | Production systems |

Hybrid Pipeline (2025 Best Practice):
  1. Stage 1: Fast pHash filtering (eliminates obvious non-duplicates)
  2. Stage 2: DINOHash refinement (accurate detection)
  3. Stage 3: Optional Siamese ViT verification
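The staged pipeline above can be sketched as a cheap filter followed by an expensive check. This is a minimal illustration, not the skill's implementation: `hybrid_duplicates`, `cheap_hash`, `precise_distance`, and the `precise_max` cutoff are hypothetical names and values, and the two callables stand in for pHash and a DINOHash-based comparison.

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def hybrid_duplicates(
    items: Dict[str, object],
    cheap_hash: Callable[[object], int],
    precise_distance: Callable[[object, object], float],
    loose_bits: int = 10,
    precise_max: float = 0.1,  # assumed cutoff for the expensive comparison
) -> List[Tuple[str, str]]:
    # Stage 1: hash every item once, keep only pairs whose cheap hashes are close.
    hashes = {name: cheap_hash(item) for name, item in items.items()}
    candidates = [
        (a, b)
        for a, b in combinations(sorted(items), 2)
        if hamming(hashes[a], hashes[b]) <= loose_bits
    ]
    # Stage 2: run the expensive comparison only on the surviving candidates.
    return [
        (a, b)
        for a, b in candidates
        if precise_distance(items[a], items[b]) <= precise_max
    ]
```

The all-pairs loop in stage 1 is quadratic and is only meant to show the control flow; a large library would likely need an index over the hashes instead.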
Hamming Distance Thresholds:
  • Conservative: ≤5 bits different = duplicates
  • Aggressive: ≤10 bits different = duplicates
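To make the thresholds concrete, here is a minimal dHash sketch in pure Python. It assumes the image has already been converted to grayscale and resized to 8 rows by 9 columns (for example with Pillow), so each row yields 8 left-to-right comparisons and the hash has 64 bits. The function names are illustrative.

```python
from typing import Sequence


def dhash(gray: Sequence[Sequence[int]]) -> int:
    """Difference hash of an 8-row x 9-column grayscale grid (64 bits)."""
    bits = 0
    for row in gray:
        for left, right in zip(row, row[1:]):
            # One bit per horizontal neighbor pair: 1 if brightness increases.
            bits = (bits << 1) | (1 if right > left else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def is_duplicate(a: int, b: int, max_bits: int = 5) -> bool:
    """Conservative by default (<= 5 bits); pass max_bits=10 for the aggressive setting."""
    return hamming(a, b) <= max_bits
```

Because dHash only records whether each pixel is brighter than its neighbor, a uniform brightness shift leaves the hash unchanged, while mirroring the image flips most of the bits.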
Deep dive: references/perceptual-hashing.md

2. Face Recognition & Clustering

Goal: Group photos by person without user labeling.
Apple Photos Strategy (2021-2025):
  1. Extract face + upper body embeddings (e.g., FaceNet, 512-dim, as an open-source stand-in for Apple's own on-device network)
  2. Two-pass agglomerative clustering
  3. Conservative first pass (threshold=0.4, high precision)
  4. HAC second pass (threshold=0.6, increase recall)
  5. Incremental updates for new photos
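The two-pass idea can be sketched in pure Python as follows. This is a simplified stand-in, not Apple's algorithm: pass 1 is a greedy assignment to the nearest cluster centroid, pass 2 merges clusters by centroid distance, and the function names are hypothetical. Embeddings are assumed to be non-zero vectors.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def centroid(vectors: List[Vector]) -> List[float]:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def two_pass_cluster(
    embeddings: List[Vector], tight: float = 0.4, loose: float = 0.6
) -> List[List[int]]:
    clusters: List[List[int]] = []
    # Pass 1 (high precision): join the nearest cluster only if within `tight`.
    for i, emb in enumerate(embeddings):
        best, best_d = None, tight
        for members in clusters:
            d = cosine_distance(emb, centroid([embeddings[j] for j in members]))
            if d <= best_d:
                best, best_d = members, d
        if best is None:
            clusters.append([i])
        else:
            best.append(i)
    # Pass 2 (higher recall): merge the closest pair of clusters repeatedly
    # while their centroids are within `loose`.
    while len(clusters) > 1:
        pairs = [
            (
                cosine_distance(
                    centroid([embeddings[i] for i in clusters[a]]),
                    centroid([embeddings[i] for i in clusters[b]]),
                ),
                a,
                b,
            )
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        ]
        d, a, b = min(pairs)
        if d > loose:
            break
        clusters[a].extend(clusters.pop(b))
    return clusters
```

Two faces at cosine distance 0.5 stay apart after the first pass and are merged by the second, which is the precision-then-recall behavior the strategy describes.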
HDBSCAN Alternative:
  • No threshold tuning required
  • Robust to noise
  • Better for unknown data distributions
Parameters:

| Setting | Agglomerative | HDBSCAN |
|---|---|---|
| Pass 1 threshold | 0.4 (cosine) | - |
| Pass 2 threshold | 0.6 (cosine) | - |
| Min cluster size | - | 3 photos |
| Metric | cosine | cosine |

Deep dive: references/face-clustering.md

3. Burst Photo Selection

Problem: Burst mode creates 10-50 nearly identical photos.
Multi-Criteria Scoring:

| Criterion | Weight | Measurement |
|---|---|---|
| Sharpness | 30% | Laplacian variance |
| Face Quality | 35% | Eyes open, smiling, face sharpness |
| Aesthetics | 20% | NIMA score |
| Position | 10% | Middle frames bonus |
| Exposure | 5% | Histogram clipping check |

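A weighted sum over these criteria might look like the sketch below. It assumes every signal has already been normalized to the range 0 to 1 (raw Laplacian variance and NIMA scores would need rescaling first), and the linear middle-frame bonus is one possible reading of "Middle frames bonus". All names are illustrative.

```python
from typing import Dict, List

WEIGHTS = {
    "sharpness": 0.30,
    "face_quality": 0.35,
    "aesthetics": 0.20,
    "position": 0.10,
    "exposure": 0.05,
}


def position_bonus(index: int, count: int) -> float:
    """1.0 for the middle frame, falling linearly to 0.0 at either end (assumed shape)."""
    if count < 2:
        return 1.0
    mid = (count - 1) / 2
    return 1.0 - abs(index - mid) / mid


def pick_best(frames: List[Dict[str, float]]) -> int:
    """Return the index of the highest-scoring frame in a burst.

    Each frame maps 'sharpness', 'face_quality', 'aesthetics', and 'exposure'
    to values already normalized to [0, 1].
    """
    def score(i: int) -> float:
        signals = dict(frames[i], position=position_bonus(i, len(frames)))
        return sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)

    return max(range(len(frames)), key=score)
```

With these weights a slightly softer frame with open eyes can beat the sharpest frame, since face quality carries the largest share.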
Burst Detection: Photos within 0.5 seconds of each other.
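One way to apply the 0.5-second rule is to sort photos by capture time and start a new group whenever the gap to the previous photo exceeds the limit. The sketch below takes that interpretation (consecutive gaps rather than total burst duration) and uses hypothetical names; timestamps are assumed to be capture times in seconds.

```python
from typing import List


def group_bursts(
    timestamps: List[float], max_gap: float = 0.5, min_size: int = 2
) -> List[List[int]]:
    """Group photo indices whose consecutive capture times are <= max_gap seconds apart."""
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])
    groups: List[List[int]] = []
    current: List[int] = []
    for i in order:
        # A gap larger than max_gap closes the current group.
        if current and timestamps[i] - timestamps[current[-1]] > max_gap:
            groups.append(current)
            current = []
        current.append(i)
    if current:
        groups.append(current)
    # Single photos are not bursts.
    return [g for g in groups if len(g) >= min_size]
```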
Deep dive: references/content-detection.md

4. Screenshot Detection

Multi-Signal Approach:

| Signal | Confidence | Description |
|---|---|---|
| UI elements | 0.85 | Status bars, buttons detected |
| Perfect rectangles | 0.75 | >5 UI buttons (90° angles) |
| High text | 0.70 | >25% text coverage (OCR) |
| No camera EXIF | 0.60 | Missing Make/Model/Lens |
| Device aspect | 0.60 | Exact phone screen ratio |
| Perfect sharpness | 0.50 | >2000 Laplacian variance |

Decision: Confidence >0.6 = screenshot
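The text does not say how the per-signal confidences are combined into one score. The sketch below assumes a noisy-OR combination, which treats the signals as independent pieces of evidence; this is one plausible choice, not the documented method, and the key names are illustrative.

```python
from typing import Iterable

SIGNAL_CONFIDENCE = {
    "ui_elements": 0.85,
    "perfect_rectangles": 0.75,
    "high_text": 0.70,
    "no_camera_exif": 0.60,
    "device_aspect": 0.60,
    "perfect_sharpness": 0.50,
}


def screenshot_confidence(fired: Iterable[str]) -> float:
    """Noisy-OR over the signals that fired: 1 - product(1 - c_i). Assumed rule."""
    miss = 1.0
    for name in fired:
        miss *= 1.0 - SIGNAL_CONFIDENCE[name]
    return 1.0 - miss


def is_screenshot(fired: Iterable[str], threshold: float = 0.6) -> bool:
    return screenshot_confidence(fired) > threshold
```

Under this assumption a single weak signal such as perfect sharpness (0.50) is not enough, but missing EXIF together with a device aspect ratio gives about 0.84 and crosses the threshold.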
Deep dive: references/content-detection.md

5. Quick Indexing Pipeline

Goal: Index 10K+ photos efficiently with caching.
Features Extracted:
  • Perceptual hashes (de-duplication)
  • Face embeddings (people clustering)
  • CLIP embeddings (semantic search)
  • Color palettes
  • Aesthetic scores
Performance (10K photos, M1 MacBook Pro):

| Operation | Time |
|---|---|
| Perceptual hashing | 2 min |
| CLIP embeddings | 3 min (GPU) |
| Face detection | 4 min |
| Color palettes | 1 min |
| Aesthetic scoring | 2 min (GPU) |
| Clustering + dedup | 1 min |
| Total (first run) | ~13 min |
| Incremental | <1 min |

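The gap between the first run and incremental runs comes from caching. A minimal sketch of that idea, assuming a JSON cache keyed by file size and modification time: `index_incremental` and `extract` are hypothetical names, and `extract` stands in for the real feature extractors and must return JSON-serializable data. The cache file should live outside the photo directory.

```python
import json
import os
from typing import Callable, Dict


def index_incremental(
    photo_dir: str, cache_path: str, extract: Callable[[str], dict]
) -> Dict[str, dict]:
    """Re-extract features only for files whose size or mtime changed."""
    cache: Dict[str, dict] = {}
    if os.path.exists(cache_path):
        with open(cache_path, "r", encoding="utf-8") as fh:
            cache = json.load(fh)

    fresh: Dict[str, dict] = {}
    for name in sorted(os.listdir(photo_dir)):
        path = os.path.join(photo_dir, name)
        if not os.path.isfile(path):
            continue
        stat = os.stat(path)
        key = f"{stat.st_size}:{stat.st_mtime_ns}"
        entry = cache.get(name)
        if entry is None or entry["key"] != key:
            # Cache miss: new or modified file, so run the expensive extraction.
            entry = {"key": key, "features": extract(path)}
        fresh[name] = entry

    # Entries for deleted files are dropped because only `fresh` is written back.
    with open(cache_path, "w", encoding="utf-8") as fh:
        json.dump(fresh, fh)
    return fresh
```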
Deep dive: references/photo-indexing.md

Common Anti-Patterns

Anti-Pattern: Euclidean Distance for Face Embeddings

What it looks like:
python
distance = np.linalg.norm(embedding1 - embedding2)  # WRONG
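Why it's wrong: The clustering thresholds used here (0.4, 0.6) are cosine distances. Euclidean distance is on a different scale and, unless the embeddings are L2-normalized, depends on vector magnitude as well as direction.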
What to do instead:
python
from scipy.spatial.distance import cosine
distance = cosine(embedding1, embedding2)  # Correct
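A small pure-Python check illustrates the difference between the two metrics. Two vectors pointing in the same direction have cosine distance 0 regardless of length, while their Euclidean distance grows with magnitude. After L2 normalization the two are tied together by ||u - v||² = 2 · (1 - cos), so a Euclidean threshold would have to be rescaled to match a cosine one. The helper names are illustrative.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


a = [3.0, 4.0]
b = [6.0, 8.0]  # same direction as a, twice the magnitude

same_direction_cosine = cosine_distance(a, b)        # close to 0
same_direction_euclidean = euclidean_distance(a, b)  # grows with magnitude

u = l2_normalize([1.0, 2.0, 2.0])
v = l2_normalize([2.0, 1.0, 2.0])
# For unit vectors: squared Euclidean distance equals 2 * cosine distance.
identity_gap = euclidean_distance(u, v) ** 2 - 2 * cosine_distance(u, v)
```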

Anti-Pattern: Fixed Clustering Thresholds

What it looks like: Using same distance threshold for all face clusters.
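Why it's wrong: Intra-class variance differs from person to person. Twins sit very close together in embedding space, while one person photographed across many ages spreads widely, so no single threshold fits both.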
What to do instead: Use HDBSCAN for automatic threshold discovery, or two-pass clustering with conservative + relaxed passes.

Anti-Pattern: Raw Pixel Comparison for Duplicates

What it looks like:
python
is_duplicate = np.allclose(img1, img2)  # WRONG
Why it's wrong: Re-saved JPEGs, crops, brightness changes create pixel differences.
What to do instead: Perceptual hashing (pHash or DINOHash) with Hamming distance.

Anti-Pattern: Sequential Face Detection

What it looks like: Processing faces one photo at a time without batching.
Why it's wrong: The GPU sits underutilized, and throughput can be roughly 10x lower than with batching.
What to do instead: Batch process images (batch_size=32) with GPU acceleration.
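A batching helper can be as small as the sketch below. `detect_batch` is a hypothetical callable standing in for whatever face detector is used; it is assumed to accept a list of images and return one result per image, in order.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batched(items: Sequence[T], batch_size: int = 32) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def detect_all(
    images: Sequence[T],
    detect_batch: Callable[[Sequence[T]], List[R]],
    batch_size: int = 32,
) -> List[R]:
    results: List[R] = []
    for batch in batched(images, batch_size):
        # One model call per batch instead of one per image.
        results.extend(detect_batch(batch))
    return results
```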

Anti-Pattern: No Confidence Filtering

What it looks like:
python
for face in all_detected_faces:
    cluster(face)  # No filtering
Why it's wrong: Low-confidence detections create noise clusters (hands, objects).
What to do instead: Filter by confidence (threshold 0.9 for faces).
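As a sketch, the filter is a single pass before clustering. The `Detection` record and its field names are hypothetical; real detectors expose the confidence under their own names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Detection:
    photo: str
    confidence: float  # detector score in [0, 1]


def confident_faces(
    detections: List[Detection], min_confidence: float = 0.9
) -> List[Detection]:
    """Keep only detections at or above the confidence threshold."""
    return [d for d in detections if d.confidence >= min_confidence]
```

Only the detections returned by `confident_faces` would then be passed on to clustering.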

Anti-Pattern: Forcing Every Photo into Clusters

What it looks like: Assigning noise points to nearest cluster.
Why it's wrong: A face that appears only once belongs to no existing person; forcing it into the nearest cluster pollutes that cluster.
What to do instead: HDBSCAN/DBSCAN naturally identifies noise (label=-1). Keep noise separate.
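Keeping noise separate only requires splitting on the label. The sketch below assumes `labels` is the per-face label sequence produced by a density-based clusterer (for example `hdbscan.HDBSCAN(min_cluster_size=3).fit_predict(embeddings)`); `split_clusters` is an illustrative name.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def split_clusters(labels: Sequence[int]) -> Tuple[Dict[int, List[int]], List[int]]:
    """Return (cluster label -> face indices, indices labeled as noise)."""
    clusters: Dict[int, List[int]] = defaultdict(list)
    noise: List[int] = []
    for index, label in enumerate(labels):
        if label == -1:
            noise.append(index)  # left unassigned on purpose
        else:
            clusters[label].append(index)
    return dict(clusters), noise
```

The noise list can be stored as "unassigned" and re-examined when new photos arrive, rather than being attached to the nearest person.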

Quick Start

python
from photo_curation import PhotoCurationPipeline

pipeline = PhotoCurationPipeline()

# Index photo library
index = pipeline.index_library('/path/to/photos')

# De-duplicate
duplicates = index.find_duplicates()
print(f"Found {len(duplicates)} duplicate groups")

# Cluster faces
face_clusters = index.cluster_faces()
print(f"Found {len(face_clusters)} people")

# Select best from bursts
best_photos = pipeline.select_best_from_bursts(index)

# Filter screenshots
real_photos = pipeline.filter_screenshots(index)

# Curate for collage
collage_photos = pipeline.curate_for_collage(index, target_count=100)

Python Dependencies

torch transformers facenet-pytorch ultralytics hdbscan opencv-python scipy numpy scikit-learn pillow pytesseract

Integration Points

  • event-detection-temporal-intelligence-expert: Provides temporal event clustering for event-aware curation
  • color-theory-palette-harmony-expert: Extracts color palettes for visual diversity
  • collage-layout-expert: Receives curated photos for assembly
  • clip-aware-embeddings: Provides CLIP embeddings for semantic search and DeepDBSCAN

References

  1. DINOHash (2025): "Adversarially Fine-Tuned DINOv2 Features for Perceptual Hashing"
  2. Apple Photos (2021): "Recognizing People in Photos Through Private On-Device ML"
  3. HDBSCAN: "Hierarchical Density-Based Spatial Clustering" (2013-2025)
  4. Perceptual Hashing: dHash (Neal Krawetz), DCT-based pHash

Version: 2.0.0
Last Updated: November 2025