ai-sorting

Build an AI Content Sorter

Guide the user through building an AI that sorts, tags, or categorizes content. Powered by DSPy classification — works with any label set.

Step 1: Define the sorting task

Ask the user:
  1. What are you sorting? (tickets, emails, reviews, messages, etc.)
  2. What are the categories? (list all labels/buckets)
  3. One category per item, or multiple? (e.g., "priority" vs "all applicable tags")
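
The answers to these three questions can be written down as a small spec before any DSPy code exists. A minimal sketch, assuming a hypothetical `SortingTask` dataclass (it is not part of DSPy; the field names are illustrative):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SortingTask:
    """Hypothetical container for the Step 1 answers; not a DSPy class."""
    content_type: str            # what is being sorted, e.g. "tickets"
    categories: tuple[str, ...]  # every allowed label
    multi_label: bool = False    # True if one item may receive several tags

    def __post_init__(self):
        # Catch empty or duplicated label sets before building anything.
        if not self.categories:
            raise ValueError("at least one category is required")
        if len(set(self.categories)) != len(self.categories):
            raise ValueError("categories must be unique")


task = SortingTask(
    content_type="tickets",
    categories=("billing", "technical", "account", "feature_request", "general"),
)
```

If `multi_label` is false, the single-category sorter in Step 2 applies; otherwise the multi-tag variant does.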

Step 2: Build the sorter

Single category (most common)

python
import dspy
from typing import Literal

# Your categories
CATEGORIES = ["billing", "technical", "account", "feature_request", "general"]

class SortContent(dspy.Signature):
    """Sort the message into the correct category."""
    message: str = dspy.InputField(desc="The content to sort")
    category: Literal[tuple(CATEGORIES)] = dspy.OutputField(desc="The assigned category")

sorter = dspy.ChainOfThought(SortContent)

Using `Literal` restricts the output type to the listed categories, so a label outside the set should be rejected when the output is parsed rather than passing through silently. `ChainOfThought` adds a reasoning step, which often improves accuracy over bare `Predict`.
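
To see what `Literal[tuple(CATEGORIES)]` expands to, the type can be inspected with the standard library alone. A small sketch; `is_valid_label` is a hypothetical helper added here for illustration:

```python
from typing import Literal, get_args

CATEGORIES = ["billing", "technical", "account", "feature_request", "general"]

# Subscripting Literal with a tuple is equivalent to listing the values inline.
CategoryType = Literal[tuple(CATEGORIES)]

# The allowed values can be read back from the type at runtime.
allowed = get_args(CategoryType)


def is_valid_label(label: str) -> bool:
    """Hypothetical helper: True if `label` is one of the allowed categories."""
    return label in allowed
```

This form works at runtime, but static type checkers generally do not accept a `Literal` built from a variable, so expect a warning there.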

Multiple tags

python
class TagContent(dspy.Signature):
    """Assign all applicable tags to the content."""
    message: str = dspy.InputField(desc="The content to tag")
    tags: list[Literal[tuple(CATEGORIES)]] = dspy.OutputField(desc="All applicable tags")

tagger = dspy.ChainOfThought(TagContent)
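
A list output may contain the same tag more than once. A minimal sketch of a cleanup step, assuming a hypothetical `normalize_tags` helper applied to the predicted list (not something DSPy provides):

```python
CATEGORIES = ["billing", "technical", "account", "feature_request", "general"]


def normalize_tags(tags, allowed):
    """Hypothetical helper: keep allowed tags, drop repeats, preserve first-seen order."""
    seen = set()
    cleaned = []
    for tag in tags:
        if tag in allowed and tag not in seen:
            seen.add(tag)
            cleaned.append(tag)
    return cleaned


# Illustrative input with a repeated tag and an unknown one.
cleaned = normalize_tags(["billing", "billing", "refund", "account"], CATEGORIES)
```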

Step 3: Test the quality

python
from dspy.evaluate import Evaluate

def sorting_metric(example, prediction, trace=None):
    return prediction.category == example.category

evaluator = Evaluate(
    devset=devset,
    metric=sorting_metric,
    num_threads=4,
    display_progress=True,
    display_table=5,
)
score = evaluator(sorter)
For multi-tag, use F1 or Jaccard similarity instead of exact match.
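
One way to write the multi-tag metric is Jaccard similarity over tag sets. A sketch, assuming examples and predictions both expose a `tags` list as in `TagContent`; the `SimpleNamespace` objects stand in for DSPy examples so the block runs on its own:

```python
from types import SimpleNamespace


def tagging_metric(example, prediction, trace=None):
    """Jaccard similarity between gold and predicted tag sets (0.0 to 1.0)."""
    gold = set(example.tags)
    predicted = set(prediction.tags)
    if not gold and not predicted:
        return 1.0  # both empty: treat as a perfect match
    return len(gold & predicted) / len(gold | predicted)


# Stand-ins for a labeled example and a model prediction.
gold = SimpleNamespace(tags=["billing", "account"])
pred = SimpleNamespace(tags=["billing", "technical"])
score = tagging_metric(gold, pred)  # one shared tag out of three distinct tags
```

The function has the same signature as `sorting_metric`, so it can be passed as `metric` in its place.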

Step 4: Improve accuracy

Start with `BootstrapFewShot`, which is fast and usually gives a solid boost:
python
optimizer = dspy.BootstrapFewShot(
    metric=sorting_metric,
    max_bootstrapped_demos=4,
)
optimized_sorter = optimizer.compile(sorter, trainset=trainset)
If accuracy still isn't good enough, upgrade to `MIPROv2`:
python
optimizer = dspy.MIPROv2(
    metric=sorting_metric,
    auto="medium",
)
optimized_sorter = optimizer.compile(sorter, trainset=trainset)
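
The `trainset` and `devset` used in Steps 3 and 4 are not defined in this guide. A minimal sketch of preparing them from labeled records, assuming a hypothetical `split_examples` helper and made-up sample records:

```python
import random


def split_examples(records, dev_fraction=0.2, seed=0):
    """Hypothetical helper: shuffle labeled records and split into (train, dev)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_dev = max(1, int(len(shuffled) * dev_fraction))
    return shuffled[n_dev:], shuffled[:n_dev]


# Illustrative records only; real data would come from your own labeled items.
records = [
    {"message": "I was charged twice", "category": "billing"},
    {"message": "The app crashes on login", "category": "technical"},
    {"message": "How do I change my email?", "category": "account"},
    {"message": "Please add dark mode", "category": "feature_request"},
    {"message": "Thanks for the help", "category": "general"},
]
train_records, dev_records = split_examples(records)

# With DSPy installed, each record would then be wrapped roughly like this:
# trainset = [dspy.Example(**r).with_inputs("message") for r in train_records]
# devset = [dspy.Example(**r).with_inputs("message") for r in dev_records]
```

Keeping the dev records out of the optimizer's training data keeps the Step 3 score an honest estimate.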

Step 5: Use it

python
result = optimized_sorter(message="I was charged twice on my credit card last month")
print(f"Category: {result.category}")
print(f"Reasoning: {result.reasoning}")
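
For more than one message, the same call can be wrapped in a loop. A sketch, assuming a hypothetical `group_by_category` helper; `fake_sorter` is a stand-in for `optimized_sorter` so the block runs without a language model:

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_category(messages, sorter):
    """Call sorter(message=...) on each message and group messages by predicted category."""
    groups = defaultdict(list)
    for message in messages:
        result = sorter(message=message)
        groups[result.category].append(message)
    return dict(groups)


def fake_sorter(message):
    """Stand-in for the optimized sorter: a keyword rule, not a real classifier."""
    category = "billing" if "charged" in message else "general"
    return SimpleNamespace(category=category)


groups = group_by_category(["I was charged twice", "Hello there"], fake_sorter)
```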

Key patterns

  • Use `Literal` types to lock outputs to valid categories
  • Use `ChainOfThought` over `Predict`: reasoning improves sorting accuracy
  • Include a `hint` field during training for tricky examples:
    python
    class SortWithHint(dspy.Signature):
        message: str = dspy.InputField()
        hint: str = dspy.InputField(desc="Optional hint for ambiguous cases")
        category: Literal[tuple(CATEGORIES)] = dspy.OutputField()
    Set `hint` in training data, leave empty at inference time.
  • Confidence scores: Add a float output field if you need confidence
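
If a confidence field is added, its value is self-reported by the model and may not be well calibrated. A minimal sketch of one way to use it, assuming a hypothetical `needs_review` helper and an arbitrary threshold of 0.7:

```python
def needs_review(confidence, threshold=0.7):
    """Hypothetical helper: True if a prediction should go to manual review."""
    try:
        value = float(confidence)
    except (TypeError, ValueError):
        return True  # missing or unparseable confidence: be cautious
    if not 0.0 <= value <= 1.0:
        return True  # out-of-range (or NaN) values are not trustworthy
    return value < threshold
```

The threshold is a placeholder; a reasonable value depends on how the reported confidence tracks accuracy on your dev set.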

Additional resources

  • For worked examples (sentiment, intent, topics), see examples.md
  • Need scores instead of categories? Use /ai-scoring
  • Next: /ai-improving-accuracy to measure and improve your AI