# Senior Prompt Engineer

Prompt engineering patterns, LLM evaluation frameworks, and agentic system design.

## Table of Contents

- [Quick Start](#quick-start)
- [Tools Overview](#tools-overview)
- [Prompt Engineering Workflows](#prompt-engineering-workflows)
- [Reference Documentation](#reference-documentation)
- [Common Patterns Quick Reference](#common-patterns-quick-reference)
- [Common Commands](#common-commands)

## Quick Start

```bash
# Analyze and optimize a prompt file
python scripts/prompt_optimizer.py prompts/my_prompt.txt --analyze

# Evaluate RAG retrieval quality
python scripts/rag_evaluator.py --contexts contexts.json --questions questions.json

# Visualize agent workflow from definition
python scripts/agent_orchestrator.py agent_config.yaml --visualize
```

---

## Tools Overview

### 1. Prompt Optimizer

Analyzes prompts for token efficiency, clarity, and structure, and generates optimized versions.

**Input:** Prompt text file or string
**Output:** Analysis report with optimization suggestions

**Usage:**

```bash
# Analyze a prompt file
python scripts/prompt_optimizer.py prompt.txt --analyze
```

Output:

```
Token count: 847
Estimated cost: $0.0025 (GPT-4)
Clarity score: 72/100

Issues found:
- Ambiguous instruction at line 3
- Missing output format specification
- Redundant context (lines 12-15 repeat lines 5-8)

Suggestions:
1. Add explicit output format: "Respond in JSON with keys: ..."
2. Remove redundant context to save 89 tokens
3. Clarify "analyze" -> "list the top 3 issues with severity ratings"
```

```bash
# Generate optimized version
python scripts/prompt_optimizer.py prompt.txt --optimize --output optimized.txt

# Count tokens for cost estimation
python scripts/prompt_optimizer.py prompt.txt --tokens --model gpt-4

# Extract and manage few-shot examples
python scripts/prompt_optimizer.py prompt.txt --extract-examples --output examples.json
```

---
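The kind of check behind `--analyze` can be illustrated with a toy version. This is a sketch, not the script's actual implementation: the vague-verb list, the format hints, and the ~0.75 words-per-token heuristic are all assumptions.

```python
import re

# Illustrative heuristics -- the real optimizer's checks are richer than this.
VAGUE_VERBS = {"analyze", "handle", "process", "improve", "consider"}
FORMAT_HINTS = ("json", "xml", "markdown", "respond in", "output format")

def analyze_prompt(text: str) -> dict:
    """Toy --analyze: estimate tokens, flag vague verbs and a missing format spec."""
    issues = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for verb in VAGUE_VERBS:
            if re.search(rf"\b{verb}\b", line, re.IGNORECASE):
                issues.append(f"Ambiguous instruction at line {line_no}: '{verb}'")
    if not any(hint in text.lower() for hint in FORMAT_HINTS):
        issues.append("Missing output format specification")
    words = re.findall(r"[A-Za-z']+", text)
    # Common rule of thumb: English runs at roughly 0.75 words per token.
    return {"approx_tokens": int(len(words) / 0.75), "issues": issues}
```

For example, `analyze_prompt("Analyze the report.")` flags both an ambiguous verb and a missing output-format specification.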

### 2. RAG Evaluator

Evaluates Retrieval-Augmented Generation quality by measuring context relevance and answer faithfulness.

**Input:** Retrieved contexts (JSON) and questions/answers
**Output:** Evaluation metrics and quality report

**Usage:**

```bash
# Evaluate retrieval quality
python scripts/rag_evaluator.py --contexts retrieved.json --questions eval_set.json
```

Output:

```
=== RAG Evaluation Report ===
Questions evaluated: 50

Retrieval Metrics:
  Context Relevance: 0.78 (target: >0.80)
  Retrieval Precision@5: 0.72
  Coverage: 0.85

Generation Metrics:
  Answer Faithfulness: 0.91
  Groundedness: 0.88

Issues Found:
- 8 questions had no relevant context in top-5
- 3 answers contained information not in context

Recommendations:
1. Improve chunking strategy for technical documents
2. Add metadata filtering for date-sensitive queries
```

```bash
# Evaluate with custom metrics
python scripts/rag_evaluator.py --contexts retrieved.json --questions eval_set.json \
    --metrics relevance,faithfulness,coverage

# Export detailed results
python scripts/rag_evaluator.py --contexts retrieved.json --questions eval_set.json \
    --output report.json --verbose
```

---
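Two of the retrieval metrics in the report have standard definitions that are easy to state directly. A minimal sketch (the evaluator may weight or aggregate these differently):

```python
def precision_at_k(retrieved_ids, relevant_ids, k=5):
    """Precision@k: fraction of the top-k retrieved chunks that are relevant."""
    hits = sum(1 for doc_id in retrieved_ids[:k] if doc_id in relevant_ids)
    return hits / k

def coverage(retrieved_ids, relevant_ids):
    """Coverage: fraction of all relevant chunks found anywhere in the retrieval."""
    if not relevant_ids:
        return 1.0  # nothing to find
    found = sum(1 for doc_id in relevant_ids if doc_id in retrieved_ids)
    return found / len(relevant_ids)
```

Averaging the per-question scores over the eval set yields report figures like `Retrieval Precision@5: 0.72`.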

### 3. Agent Orchestrator

Parses agent definitions, visualizes execution flows, and validates tool configurations.

**Input:** Agent configuration (YAML/JSON)
**Output:** Workflow visualization and validation report

**Usage:**

```bash
# Validate agent configuration
python scripts/agent_orchestrator.py agent.yaml --validate
```

Output:

```
=== Agent Validation Report ===
Agent: research_assistant
Pattern: ReAct

Tools (4 registered):
  [OK]   web_search  - API key configured
  [OK]   calculator  - No config needed
  [WARN] file_reader - Missing allowed_paths
  [OK]   summarizer  - Prompt template valid

Flow Analysis:
  Max depth: 5 iterations
  Estimated tokens/run: 2,400-4,800
  Potential infinite loop: No

Recommendations:
1. Add allowed_paths to file_reader for security
2. Consider adding early exit condition for simple queries
```
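The configuration the validator consumes might look like the following sketch. The field names (`pattern`, `max_iterations`, `api_key_env`, and so on) are assumptions for illustration, not the script's actual schema:

```yaml
# agent.yaml -- hypothetical shape; the real schema may differ
name: research_assistant
pattern: react
max_iterations: 5            # bounds the Think -> Act -> Observe loop
tools:
  - name: web_search
    api_key_env: SEARCH_API_KEY
  - name: calculator
  - name: file_reader
    allowed_paths: ["./data"]   # the validator warns when this is missing
  - name: summarizer
    prompt_template: prompts/summarize.txt
```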

```bash
# Visualize agent workflow (ASCII)
python scripts/agent_orchestrator.py agent.yaml --visualize
```

Output:

```
┌─────────────────────────────────────────┐
│           research_assistant            │
│             (ReAct Pattern)             │
└─────────────────┬───────────────────────┘
         ┌────────▼────────┐
         │   User Query    │
         └────────┬────────┘
         ┌────────▼────────┐
         │      Think      │◄───────────┐
         └────────┬────────┘            │
         ┌────────▼────────┐            │
         │   Select Tool   │            │
         └────────┬────────┘            │
    ┌─────────────┼─────────────┐       │
    ▼             ▼             ▼       │
[web_search] [calculator] [file_reader] │
    │             │             │       │
    └─────────────┼─────────────┘       │
         ┌────────▼────────┐            │
         │     Observe     │────────────┘
         └────────┬────────┘
         ┌────────▼────────┐
         │  Final Answer   │
         └─────────────────┘
```

```bash
# Export workflow as Mermaid diagram
python scripts/agent_orchestrator.py agent.yaml --visualize --format mermaid
```

---
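The validation report's `Potential infinite loop` check can be approximated as cycle detection over the workflow graph. In a ReAct agent the Think → Observe cycle is intentional and bounded by the iteration cap, so a real validator would exempt bounded loops; the sketch below shows only the underlying detection, using stdlib Python:

```python
def has_cycle(edges):
    """DFS with three colors: a back edge to a GRAY node means a cycle exists."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}

    def dfs(node):
        color[node] = GRAY               # on the current DFS path
        for nxt in graph.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GRAY:            # back edge -> cycle
                return True
            if state == WHITE and dfs(nxt):
                return True
        color[node] = BLACK              # fully explored
        return False

    return any(dfs(n) for n in list(graph) if color.get(n, WHITE) == WHITE)
```

For the ReAct flow above, `has_cycle` reports the Think → Select → Observe → Think loop; the validator's job is then to confirm `max_iterations` bounds it.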

## Prompt Engineering Workflows

### Prompt Optimization Workflow

Use when improving an existing prompt's performance or reducing token costs.

**Step 1: Baseline the current prompt**

```bash
python scripts/prompt_optimizer.py current_prompt.txt --analyze --output baseline.json
```

**Step 2: Identify issues**

Review the analysis report for:
- Token waste (redundant instructions, verbose examples)
- Ambiguous instructions (unclear output format, vague verbs)
- Missing constraints (no length limits, no format specification)

**Step 3: Apply optimization patterns**

| Issue | Pattern to Apply |
|---|---|
| Ambiguous output | Add explicit format specification |
| Too verbose | Extract to few-shot examples |
| Inconsistent results | Add role/persona framing |
| Missing edge cases | Add constraint boundaries |

**Step 4: Generate optimized version**

```bash
python scripts/prompt_optimizer.py current_prompt.txt --optimize --output optimized.txt
```

**Step 5: Compare results**

```bash
python scripts/prompt_optimizer.py optimized.txt --analyze --compare baseline.json
```

Shows: token reduction, clarity improvement, issues resolved.
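Step 5's comparison boils down to diffing two analysis reports. A sketch, assuming `--analyze --output` writes JSON with `token_count`, `clarity_score`, and `issues` fields (the script's actual field names may differ):

```python
import json

def compare_reports(baseline_path, optimized_path):
    """Report deltas between two analysis JSON files (field names assumed)."""
    with open(baseline_path) as f:
        base = json.load(f)
    with open(optimized_path) as f:
        opt = json.load(f)
    return {
        "token_reduction": base["token_count"] - opt["token_count"],
        "clarity_delta": opt["clarity_score"] - base["clarity_score"],
        "issues_resolved": len(base["issues"]) - len(opt["issues"]),
    }
```

A positive `token_reduction` and `clarity_delta` together indicate the optimization paid off on both axes.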


**Step 6: Validate with test cases**

Run both prompts against your evaluation set and compare outputs.

---

### Few-Shot Example Design Workflow

Use when creating examples for in-context learning.

**Step 1: Define the task clearly**

```
Task: Extract product entities from customer reviews
Input: Review text
Output: JSON with {product_name, sentiment, features_mentioned}
```

**Step 2: Select diverse examples (3-5 recommended)**

| Example Type | Purpose |
|---|---|
| Simple case | Shows the basic pattern |
| Edge case | Handles ambiguity |
| Complex case | Multiple entities |
| Negative case | What NOT to extract |

**Step 3: Format consistently**

```
Example 1:
Input: "Love my new iPhone 15, the camera is amazing!"
Output: {"product_name": "iPhone 15", "sentiment": "positive", "features_mentioned": ["camera"]}

Example 2:
Input: "The laptop was okay but battery life is terrible."
Output: {"product_name": "laptop", "sentiment": "mixed", "features_mentioned": ["battery life"]}
```

**Step 4: Validate example quality**

```bash
python scripts/prompt_optimizer.py prompt_with_examples.txt --validate-examples
```

Checks: consistency, coverage, format alignment.
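One consistency check of the kind `--validate-examples` might run: confirm every example output parses as JSON and carries the same keys as the task definition. A minimal sketch, with the key set taken from the entity-extraction task above:

```python
import json

# Keys from the task definition above; adjust for your own schema.
REQUIRED_KEYS = {"product_name", "sentiment", "features_mentioned"}

def validate_examples(outputs):
    """Return a list of problems: unparseable outputs or missing keys."""
    problems = []
    for i, raw in enumerate(outputs, start=1):
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            problems.append(f"Example {i}: output is not valid JSON")
            continue
        missing = REQUIRED_KEYS - parsed.keys()
        if missing:
            problems.append(f"Example {i}: missing keys {sorted(missing)}")
    return problems
```

An empty return value means every example is format-aligned; anything else points at the example to fix before the prompt ships.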


**Step 5: Test with held-out cases**

Ensure the model generalizes beyond your examples.

---

### Structured Output Design Workflow

Use when you need reliable JSON/XML/structured responses.

**Step 1: Define the schema**

```json
{
  "type": "object",
  "properties": {
    "summary": {"type": "string", "maxLength": 200},
    "sentiment": {"enum": ["positive", "negative", "neutral"]},
    "confidence": {"type": "number", "minimum": 0, "maximum": 1}
  },
  "required": ["summary", "sentiment"]
}
```

**Step 2: Include the schema in the prompt**

```
Respond with JSON matching this schema:
- summary (string, max 200 chars): Brief summary of the content
- sentiment (enum): One of "positive", "negative", "neutral"
- confidence (number 0-1): Your confidence in the sentiment
```

**Step 3: Add format enforcement**

```
IMPORTANT: Respond ONLY with valid JSON. No markdown, no explanation.
Start your response with { and end with }
```

**Step 4: Validate outputs**

```bash
python scripts/prompt_optimizer.py structured_prompt.txt --validate-schema schema.json
```
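Step 4's validation can be approximated in stdlib Python by checking each model response against the Step 1 schema. This is a hand-rolled sketch for that one schema; a production pipeline would more likely use the `jsonschema` package:

```python
import json

def check_output(raw: str) -> list:
    """Validate one model response against the Step 1 schema; return error strings."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as e:
        return [f"invalid JSON: {e}"]
    errors = []
    for key in ("summary", "sentiment"):       # "required" in the schema
        if key not in obj:
            errors.append(f"missing required key: {key}")
    if obj.get("sentiment") not in (None, "positive", "negative", "neutral"):
        errors.append("sentiment not in enum")
    if "summary" in obj and len(obj["summary"]) > 200:
        errors.append("summary exceeds maxLength 200")
    conf = obj.get("confidence")
    if conf is not None and not (0 <= conf <= 1):
        errors.append("confidence out of range [0, 1]")
    return errors
```

Running this over a batch of responses gives a quick pass rate for the format-enforcement wording in Step 3.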

## Reference Documentation

| File | Contains | Load when user asks about |
|---|---|---|
| references/prompt_engineering_patterns.md | 10 prompt patterns with input/output examples | "which pattern?", "few-shot", "chain-of-thought", "role prompting" |
| references/llm_evaluation_frameworks.md | Evaluation metrics, scoring methods, A/B testing | "how to evaluate?", "measure quality", "compare prompts" |
| references/agentic_system_design.md | Agent architectures (ReAct, Plan-Execute, Tool Use) | "build agent", "tool calling", "multi-agent" |

## Common Patterns Quick Reference

| Pattern | When to Use | Example |
|---|---|---|
| Zero-shot | Simple, well-defined tasks | "Classify this email as spam or not spam" |
| Few-shot | Complex tasks, consistent format needed | Provide 3-5 examples before the task |
| Chain-of-Thought | Reasoning, math, multi-step logic | "Think step by step..." |
| Role Prompting | Expertise needed, specific perspective | "You are an expert tax accountant..." |
| Structured Output | Need parseable JSON/XML | Include schema + format enforcement |

## Common Commands

```bash
# Prompt analysis
python scripts/prompt_optimizer.py prompt.txt --analyze     # Full analysis
python scripts/prompt_optimizer.py prompt.txt --tokens      # Token count only
python scripts/prompt_optimizer.py prompt.txt --optimize    # Generate optimized version

# RAG evaluation
python scripts/rag_evaluator.py --contexts ctx.json --questions q.json   # Evaluate
python scripts/rag_evaluator.py --contexts ctx.json --compare baseline   # Compare to baseline

# Agent development
python scripts/agent_orchestrator.py agent.yaml --validate        # Validate config
python scripts/agent_orchestrator.py agent.yaml --visualize       # Show workflow
python scripts/agent_orchestrator.py agent.yaml --estimate-cost   # Token estimation
```