together-embeddings

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

Together Embeddings & Reranking

Together Embeddings & Reranking 嵌入与重排序

Overview

概述

Use this skill for semantic retrieval components:
  • create embeddings
  • batch embeddings
  • build retrieval or RAG pipelines
  • rerank retrieved candidates
This skill is for retrieval plumbing, not for the final language-model response itself.
本skill适用于以下语义检索组件场景:
  • 生成embeddings
  • 批量生成embeddings
  • 构建检索或RAG管道
  • 对检索到的候选结果重排序
本skill用于检索链路底层能力,而非直接生成大语言模型最终响应。

When This Skill Wins

适用场景

  • Build vector search or semantic similarity features
  • Add embedding generation to a data pipeline
  • Improve retrieval quality with reranking
  • Assemble a retrieval stage before calling a chat model
  • 构建向量搜索或语义相似度功能
  • 在数据管道中加入embedding生成能力
  • 通过重排序提升检索质量
  • 在调用聊天模型前搭建检索阶段

Hand Off To Another Skill

可转接的其他skill

  • Use
    together-chat-completions
    for the final answer-generation step
  • Use
    together-batch-inference
    for very large offline embedding backfills
  • Use
    together-dedicated-endpoints
    when reranking requires a dedicated deployment
  • 最终答案生成步骤使用
    together-chat-completions
  • 超大规模离线embedding回填使用
    together-batch-inference
  • 当重排序需要专用部署时,使用
    together-dedicated-endpoints

Quick Routing

快速上手指引

  • Embeddings API usage
    • Read references/api-reference.md
    • Start with scripts/embed_and_rerank.py or scripts/embed_and_rerank.ts
  • Semantic search (embed, store, query)
    • Start with scripts/semantic_search.py -- includes an in-memory vector store, cosine-similarity retrieval, and optional rerank
  • RAG pipeline composition
    • Start with scripts/rag_pipeline.py
  • Model selection and rerank constraints
    • Read references/models.md
  • Embeddings API 使用
    • 阅读 references/api-reference.md
    • scripts/embed_and_rerank.pyscripts/embed_and_rerank.ts 开始上手
  • 语义搜索(嵌入、存储、查询)
    • scripts/semantic_search.py 开始上手 -- 包含内存向量存储、余弦相似度检索和可选重排序能力
  • RAG管道搭建
    • scripts/rag_pipeline.py 开始上手
  • 模型选择与重排序约束
    • 阅读 references/models.md

Workflow

工作流程

  1. Confirm that the user needs vectors or retrieval, not direct generation.
  2. Choose the embedding model and batch shape.
  3. Generate embeddings for corpus and query paths consistently.
  4. Retrieve candidates. An in-memory cosine-similarity store works for prototyping and small corpora (see
    semantic_search.py
    ). Use a dedicated vector database for production scale.
  5. Rerank only when the extra latency and endpoint requirement are justified. When no dedicated rerank endpoint is available, cosine-similarity ranking is a reasonable fallback.
  1. 确认用户需要向量或检索能力,而非直接生成内容。
  2. 选择embedding模型和批量大小。
  3. 为语料库和查询链路生成一致的embeddings。
  4. 检索候选结果。内存余弦相似度存储适用于原型开发和小型语料库(参见
    semantic_search.py
    )。生产级规模请使用专用向量数据库。
  5. 仅在额外延迟和端点需求合理时使用重排序。若无可用的专用重排序端点,余弦相似度排序是合理的降级方案。

High-Signal Rules

重要注意事项

  • Python scripts require the Together v2 SDK (
    together>=2.0.0
    ). If the user is on an older version, they must upgrade first:
    uv pip install --upgrade "together>=2.0.0"
    .
  • Keep embeddings and reranking conceptually separate; rerank is a second-stage precision step.
  • Reranking in this repo assumes a dedicated endpoint. Do not promise serverless rerank unless the product changes. When no endpoint is available, fall back to cosine-similarity ranking.
  • The embedding model has a 514-token context limit. Chunk longer documents before embedding.
  • The
    rag_pipeline.py
    example demonstrates retrieval plus generation; treat generation as a hand-off to chat completions.
  • Preserve model consistency across indexing and querying.
  • Python脚本依赖Together v2 SDK(
    together>=2.0.0
    )。如果用户使用的是旧版本,必须先升级:
    uv pip install --upgrade "together>=2.0.0"
  • 从概念上区分embeddings和重排序:重排序是第二阶段的精准度优化步骤。
  • 本仓库中的重排序功能默认需要专用端点。除非产品迭代支持,否则不要承诺提供serverless重排序能力。无可用端点时,请降级使用余弦相似度排序。
  • embedding模型的上下文限制为514-token。生成embedding前请对长文档进行分块。
  • rag_pipeline.py
    示例展示了检索加生成的流程:请将生成部分转接给chat completions能力。
  • 索引和查询阶段需保持使用的模型一致。

Resource Map

资源索引

  • API details: references/api-reference.md
  • Model guide: references/models.md
  • Python embeddings example: scripts/embed_and_rerank.py
  • TypeScript embeddings example: scripts/embed_and_rerank.ts
  • Python semantic search: scripts/semantic_search.py
  • Python RAG pipeline: scripts/rag_pipeline.py
  • API详情: references/api-reference.md
  • 模型指南: references/models.md
  • Python embeddings示例: scripts/embed_and_rerank.py
  • TypeScript embeddings示例: scripts/embed_and_rerank.ts
  • Python语义搜索示例: scripts/semantic_search.py
  • Python RAG管道示例: scripts/rag_pipeline.py

Official Docs

官方文档