algo-ecom-search
E-Commerce Search Relevance
Overview
E-commerce search is a pipeline: query understanding → retrieval → ranking → presentation. Each stage affects relevance. Optimization requires diagnosing WHICH stage fails, not just tuning one component. Zero-result rate, click-through rate, and add-to-cart rate are key metrics.
When to Use
Trigger conditions:
- Diagnosing why search results don't meet user expectations
- Implementing query processing features (spell check, synonyms, intent detection)
- Reducing zero-result searches and improving conversion
When NOT to use:
- For ranking algorithm design only (use e-commerce ranking skill)
- For text relevance scoring only (use BM25)
Algorithm
IRON LAW: Search Quality Is Determined by the WEAKEST Pipeline Stage
Query understanding, retrieval, ranking, and presentation are sequential.
Perfect ranking cannot fix bad retrieval (missing products). Perfect
retrieval cannot fix bad query understanding (wrong intent). Diagnose
which stage fails FIRST before optimizing.
Phase 1: Input Validation
Audit current search: sample 100 queries by volume. For each, evaluate: query understanding (correct intent?), retrieval (relevant products in candidate set?), ranking (best products at top?), presentation (useful display?).
Gate: Weakness localized to specific pipeline stage(s).
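The audit above can be tallied mechanically once each sampled query has been judged per stage. The following is a minimal sketch, assuming a hypothetical row format with one boolean per stage; the function name `localize_weakness` and the sample rows are illustrative, not part of the skill.

```python
from collections import Counter

# Stage names follow the pipeline order described in the Overview.
STAGES = ("query_understanding", "retrieval", "ranking", "presentation")

def localize_weakness(audit_rows):
    """Count, per stage, how many sampled queries fail FIRST at that stage."""
    failures = Counter()
    for row in audit_rows:
        for stage in STAGES:
            if not row[stage]:
                failures[stage] += 1
                break  # a downstream stage cannot be judged fairly after an upstream failure
    return failures.most_common()

# Hypothetical judgments for four sampled queries (True = stage looks fine).
audit_rows = [
    {"query": "wireles earbud", "query_understanding": False, "retrieval": False,
     "ranking": False, "presentation": True},
    {"query": "red shoes size 10", "query_understanding": True, "retrieval": False,
     "ranking": True, "presentation": True},
    {"query": "nike", "query_understanding": True, "retrieval": True,
     "ranking": True, "presentation": True},
    {"query": "gift ideas", "query_understanding": False, "retrieval": True,
     "ranking": True, "presentation": True},
]

print(localize_weakness(audit_rows))
```

Counting only the first failing stage reflects the sequential nature of the pipeline: a query that fails at query understanding tells you little about ranking quality.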
Phase 2: Core Algorithm
Query understanding: 1. Spell correction (edit distance, n-gram). 2. Synonym expansion (earbuds↔earphones). 3. Intent classification (product search vs brand search vs category browse). 4. Query rewriting (attribute extraction: "red shoes size 10" → color:red, category:shoes, size:10).
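As an illustration of the first two query-understanding steps, here is a minimal sketch of edit-distance spell correction followed by synonym expansion. The vocabulary, the synonym map, and the `max_distance` threshold are assumptions chosen for the example; a production system would typically derive the vocabulary from the catalog and query logs.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

# Hypothetical vocabulary and synonym map for this example only.
VOCABULARY = {"wireless", "earbuds", "earphones", "shoes", "nike"}
SYNONYMS = {"earbuds": {"earphones"}, "earphones": {"earbuds"}}

def correct_token(token: str, max_distance: int = 2) -> str:
    """Return the closest vocabulary word, or the token itself if none is close."""
    if token in VOCABULARY:
        return token
    best = min(sorted(VOCABULARY), key=lambda w: edit_distance(token, w))
    return best if edit_distance(token, best) <= max_distance else token

def understand(query: str):
    """Spell-correct each token, then expand each token with its synonyms."""
    corrected = [correct_token(t) for t in query.lower().split()]
    expanded = [sorted({t} | SYNONYMS.get(t, set())) for t in corrected]
    return corrected, expanded

print(understand("wireles earbud"))
```

Scanning the whole vocabulary per token is fine for a sketch but does not scale; n-gram candidate generation is the usual way to narrow the candidate set before computing edit distances.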
Retrieval optimization: 1. Multi-field search (title, description, brand, category, SKU). 2. Boosting strategies (title match > description match). 3. Filter vs. boost: apply hard constraints (category, availability) as filters that exclude products, and soft signals (popularity) as boosts that only reorder them.
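The difference between field boosting, hard filters, and soft boosts can be sketched in a few lines. The field weights, the popularity multiplier, and the product records below are hypothetical values for illustration; real engines express the same idea through their own query DSL.

```python
# Hypothetical per-field weights: a title match counts more than a description match.
FIELD_WEIGHTS = {"title": 3.0, "brand": 2.0, "category": 1.5, "description": 1.0}

def text_score(product: dict, terms: list) -> float:
    """Sum weighted term matches across the searchable fields."""
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        tokens = product.get(field, "").lower().split()
        total += weight * sum(1 for t in terms if t in tokens)
    return total

def search(products: list, query: str, category: str = None) -> list:
    terms = query.lower().split()
    scored = []
    for p in products:
        # Hard constraints act as filters: failing products are excluded outright.
        if not p["in_stock"]:
            continue
        if category is not None and p["category"].lower() != category:
            continue
        base = text_score(p, terms)
        if base > 0:
            # Soft signal acts as a boost: popularity reorders, it never excludes.
            scored.append((base * (1 + 0.1 * p["popularity"]), p["sku"]))
    return [sku for _, sku in sorted(scored, reverse=True)]

products = [
    {"sku": "A1", "title": "Wireless Earbuds Pro", "description": "compact charging case",
     "brand": "Acme", "category": "audio", "in_stock": True, "popularity": 0.2},
    {"sku": "B2", "title": "Charging Case", "description": "fits most wireless earbuds",
     "brand": "Acme", "category": "audio", "in_stock": True, "popularity": 0.9},
    {"sku": "C3", "title": "Wireless Earbuds Lite", "description": "budget model",
     "brand": "Acme", "category": "audio", "in_stock": False, "popularity": 1.0},
]

print(search(products, "wireless earbuds"))  # ['A1', 'B2']
```

Here the out-of-stock product C3 never appears regardless of popularity, while the more popular B2 still ranks below A1 because its match is only in the description.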
Result quality: 1. Zero-result fallback (relax query, suggest alternatives). 2. Faceted navigation (filters by price, brand, rating). 3. Did-you-mean suggestions.
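One way to implement the zero-result fallback is to relax the query by dropping terms until something matches. The sketch below assumes a toy in-memory index mapping SKU to a set of tokens; `search_with_fallback` and the index contents are hypothetical.

```python
from itertools import combinations

def search_all_terms(index: dict, terms) -> list:
    """Strict AND search: a product matches only if it contains every term."""
    return [sku for sku, tokens in index.items() if all(t in tokens for t in terms)]

def search_with_fallback(index: dict, query: str) -> dict:
    """Try the full query first, then progressively smaller subsets of its terms."""
    terms = query.lower().split()
    for keep in range(len(terms), 0, -1):
        for subset in combinations(terms, keep):
            hits = search_all_terms(index, subset)
            if hits:
                return {"hits": hits,
                        "matched_terms": list(subset),
                        "relaxed": keep < len(terms)}
    return {"hits": [], "matched_terms": [], "relaxed": True}

# Toy index for illustration only.
index = {
    "A1": {"wireless", "earbuds", "black"},
    "B2": {"wired", "earbuds", "white"},
}

print(search_with_fallback(index, "wireless earbuds purple"))
```

Returning `matched_terms` lets the presentation layer tell the user which part of the query was dropped. Enumerating all subsets grows exponentially with query length, so a real implementation would cap the number of relaxation attempts or drop terms by a priority heuristic.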
Phase 3: Verification
Measure: zero-result rate (<5% target), CTR on first page (>30% target), NDCG on judged queries.
Gate: Key metrics improve over baseline.
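The three metrics above are straightforward to compute from logs and relevance judgments. This is a minimal sketch: the function names are illustrative, NDCG uses linear gain, and the ideal ordering is taken from the judged results of the same list, which is a simplifying assumption (a stricter evaluation would use all judged products for the query).

```python
import math

def zero_result_rate(result_counts) -> float:
    """Fraction of searches that returned no products."""
    return sum(1 for n in result_counts if n == 0) / len(result_counts)

def click_through_rate(searches_with_click: int, total_searches: int) -> float:
    """Fraction of searches with at least one click on the first page."""
    return searches_with_click / total_searches

def dcg(gains) -> float:
    """Discounted cumulative gain: gain at rank r is divided by log2(r + 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))

def ndcg(gains, k: int = 10) -> float:
    """DCG of the actual ordering divided by DCG of the ideal ordering."""
    ideal = dcg(sorted(gains, reverse=True)[:k])
    return dcg(gains[:k]) / ideal if ideal > 0 else 0.0

# Hypothetical data: result counts for 10 searches, graded judgments for one query.
print(zero_result_rate([12, 0, 5, 0, 3, 8, 1, 9, 4, 7]))  # 0.2
print(click_through_rate(25, 100))                         # 0.25
print(ndcg([3, 2, 1]))                                     # 1.0 (already ideally ordered)
```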
Phase 4: Output
Return search audit with prioritized improvements.
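Prioritization can be as simple as sorting recommendations by impact before serializing. The sketch below assumes the field names from the Output Format section; the `medium` and `low` impact levels and the second recommendation are hypothetical, since the source only shows `high`.

```python
import json

# Assumed impact scale; only "high" appears in the source.
IMPACT_ORDER = {"high": 0, "medium": 1, "low": 2}

def build_report(audit: dict, recommendations: list, metadata: dict) -> dict:
    """Assemble the audit report with recommendations sorted by impact."""
    ranked = sorted(recommendations,
                    key=lambda r: IMPACT_ORDER.get(r["impact"], len(IMPACT_ORDER)))
    return {"audit": audit, "recommendations": ranked, "metadata": metadata}

report = build_report(
    audit={"zero_result_rate": 0.08, "avg_ctr": 0.25,
           "top_failing_queries": ["earbuds wireless", "gift ideas"]},
    recommendations=[
        {"stage": "presentation", "issue": "no_did_you_mean", "impact": "low",
         "fix": "Add did-you-mean suggestions"},
        {"stage": "query_understanding", "issue": "no_synonym_expansion", "impact": "high",
         "fix": "Add earbuds↔earphones synonym"},
    ],
    metadata={"queries_sampled": 100, "period": "2025-Q1"},
)

print(json.dumps(report, ensure_ascii=False, indent=2))
```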
Output Format
```json
{
  "audit": {"zero_result_rate": 0.08, "avg_ctr": 0.25, "top_failing_queries": ["earbuds wireless", "gift ideas"]},
  "recommendations": [{"stage": "query_understanding", "issue": "no_synonym_expansion", "impact": "high", "fix": "Add earbuds↔earphones synonym"}],
  "metadata": {"queries_sampled": 100, "period": "2025-Q1"}
}
```
Examples
Sample I/O
Input: "wireles earbud" (misspelled) returns 0 results
Expected: Spell correction → "wireless earbuds" → relevant products displayed. Recommendation: implement spell correction.
Edge Cases
| Input | Expected | Why |
|---|---|---|
| Category-only query ("shoes") | Treat as browse intent; show popular products | Not a specific product search |
| Brand misspelling | Fuzzy brand matching | "Nikee" → "Nike" |
| Long-tail query ("blue cotton v-neck t-shirt men XL") | Attribute parsing needed | Multiple structured attributes in free text |
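For the long-tail case in the table above, attribute parsing can start as a plain lexicon lookup. This is a minimal sketch under the assumption of a small hand-built lexicon; the attribute names and values are hypothetical, and real catalogs usually need multi-word values and numeric patterns as well.

```python
# Hypothetical attribute lexicon; a real one would be generated from catalog facets.
ATTRIBUTE_LEXICON = {
    "color": {"red", "blue", "black"},
    "material": {"cotton", "leather"},
    "neckline": {"v-neck", "crew-neck"},
    "category": {"t-shirt", "shoes"},
    "gender": {"men", "women"},
    "size": {"xs", "s", "m", "l", "xl"},
}

def parse_attributes(query: str):
    """Map each token to a structured attribute; keep unmatched tokens as free text."""
    attributes, leftover = {}, []
    for token in query.lower().split():
        for name, values in ATTRIBUTE_LEXICON.items():
            if token in values:
                attributes[name] = token
                break
        else:
            leftover.append(token)  # no attribute matched this token
    return attributes, leftover

attrs, rest = parse_attributes("blue cotton v-neck t-shirt men XL")
print(attrs)
print(rest)
```

The extracted attributes can then be applied as filters or boosts, while leftover tokens fall through to ordinary text matching.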
Gotchas
- Synonym maintenance: Synonym lists need ongoing curation. "AirPods" is a brand, not a synonym for "earbuds." Wrong synonyms hurt precision.
- Over-recall: Aggressive synonym expansion and fuzzy matching return too many irrelevant results. Balance recall (find everything) with precision (only relevant).
- Language-specific challenges: Chinese search needs word segmentation. "皮鞋" (leather shoes) should not match "拖鞋" (slippers) despite shared "鞋".
- Search analytics are essential: Without tracking query-level CTR, zero-result queries, and conversion rates, you're optimizing blind.
- A/B testing search is hard: Search changes affect all queries. Some improve, some regress. Measure aggregate metrics AND stratify by query type.
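The word-segmentation gotcha above can be demonstrated with a toy forward-maximum-matching segmenter. The dictionary and the two matching functions are assumptions made for this sketch; production systems generally rely on a dedicated segmentation library or the search engine's own analyzer.

```python
# Hypothetical segmentation dictionary for this example only.
DICTIONARY = {"皮鞋", "拖鞋", "男士", "女士"}
MAX_WORD_LEN = max(len(w) for w in DICTIONARY)

def segment(text: str) -> list:
    """Forward maximum matching: take the longest dictionary word at each position."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in DICTIONARY:
                words.append(candidate)  # single characters are the fallback
                i += size
                break
    return words

def char_match(query: str, title: str) -> bool:
    """Naive matching on shared characters (the failure mode)."""
    return any(ch in title for ch in query)

def word_match(query: str, title: str) -> bool:
    """Matching on shared segmented words."""
    return bool(set(segment(query)) & set(segment(title)))

print(segment("男士皮鞋"))                # ['男士', '皮鞋']
print(char_match("皮鞋", "女士拖鞋"))      # True: both contain "鞋"
print(word_match("皮鞋", "女士拖鞋"))      # False: "皮鞋" and "拖鞋" are different words
```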
References
- For query understanding pipeline architecture, see references/query-pipeline.md
- For search relevance evaluation methodology, see references/relevance-evaluation.md