# Senior Prompt Engineer

Prompt engineering patterns, LLM evaluation frameworks, and agentic system design.

## Table of Contents

- [Quick Start](#quick-start)
- [Tools Overview](#tools-overview)
- [Prompt Engineering Workflows](#prompt-engineering-workflows)
- [Reference Documentation](#reference-documentation)
- [Common Patterns Quick Reference](#common-patterns-quick-reference)
- [Common Commands](#common-commands)

## Quick Start

```bash
# Analyze and optimize a prompt file
python scripts/prompt_optimizer.py prompts/my_prompt.txt --analyze

# Evaluate RAG retrieval quality
python scripts/rag_evaluator.py --contexts contexts.json --questions questions.json

# Visualize agent workflow from definition
python scripts/agent_orchestrator.py agent_config.yaml --visualize
```

---

## Tools Overview

### 1. Prompt Optimizer

Analyzes prompts for token efficiency, clarity, and structure, and generates optimized versions.

**Input:** Prompt text file or string
**Output:** Analysis report with optimization suggestions

**Usage:**

```bash
# Analyze a prompt file
python scripts/prompt_optimizer.py prompt.txt --analyze
```

Output:

```
Token count: 847
Estimated cost: $0.0025 (GPT-4)
Clarity score: 72/100

Issues found:
- Ambiguous instruction at line 3
- Missing output format specification
- Redundant context (lines 12-15 repeat lines 5-8)

Suggestions:
1. Add explicit output format: "Respond in JSON with keys: ..."
2. Remove redundant context to save 89 tokens
3. Clarify "analyze" -> "list the top 3 issues with severity ratings"
```

```bash
# Generate optimized version
python scripts/prompt_optimizer.py prompt.txt --optimize --output optimized.txt

# Count tokens for cost estimation
python scripts/prompt_optimizer.py prompt.txt --tokens --model gpt-4

# Extract and manage few-shot examples
python scripts/prompt_optimizer.py prompt.txt --extract-examples --output examples.json
```

---
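The kind of check behind `--analyze` can be illustrated with a toy version. This is a sketch, not the script's actual implementation: the vague-verb list, the format hints, and the ~0.75 words-per-token heuristic are all assumptions.

```python
import re

# Illustrative heuristics -- the real optimizer's checks are richer than this.
VAGUE_VERBS = {"analyze", "handle", "process", "improve", "consider"}
FORMAT_HINTS = ("json", "xml", "markdown", "respond in", "output format")

def analyze_prompt(text: str) -> dict:
    """Toy --analyze: estimate tokens, flag vague verbs and a missing format spec."""
    issues = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for verb in VAGUE_VERBS:
            if re.search(rf"\b{verb}\b", line, re.IGNORECASE):
                issues.append(f"Ambiguous instruction at line {line_no}: '{verb}'")
    if not any(hint in text.lower() for hint in FORMAT_HINTS):
        issues.append("Missing output format specification")
    words = re.findall(r"[A-Za-z']+", text)
    # Common rule of thumb: English runs at roughly 0.75 words per token.
    return {"approx_tokens": int(len(words) / 0.75), "issues": issues}
```

For example, `analyze_prompt("Analyze the report.")` flags both an ambiguous verb and a missing output-format specification.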

### 2. RAG Evaluator

Evaluates Retrieval-Augmented Generation quality by measuring context relevance and answer faithfulness.

**Input:** Retrieved contexts (JSON) and questions/answers
**Output:** Evaluation metrics and quality report

**Usage:**

```bash
# Evaluate retrieval quality
python scripts/rag_evaluator.py --contexts retrieved.json --questions eval_set.json
```

Output:

```
=== RAG Evaluation Report ===
Questions evaluated: 50

Retrieval Metrics:
  Context Relevance: 0.78 (target: >0.80)
  Retrieval Precision@5: 0.72
  Coverage: 0.85

Generation Metrics:
  Answer Faithfulness: 0.91
  Groundedness: 0.88

Issues Found:
- 8 questions had no relevant context in top-5
- 3 answers contained information not in context

Recommendations:
1. Improve chunking strategy for technical documents
2. Add metadata filtering for date-sensitive queries
```

```bash
# Evaluate with custom metrics
python scripts/rag_evaluator.py --contexts retrieved.json --questions eval_set.json \
    --metrics relevance,faithfulness,coverage

# Export detailed results
python scripts/rag_evaluator.py --contexts retrieved.json --questions eval_set.json \
    --output report.json --verbose
```

---
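Two of the retrieval metrics in the report have standard definitions that are easy to state directly. A minimal sketch (the evaluator may weight or aggregate these differently):

```python
def precision_at_k(retrieved_ids, relevant_ids, k=5):
    """Precision@k: fraction of the top-k retrieved chunks that are relevant."""
    hits = sum(1 for doc_id in retrieved_ids[:k] if doc_id in relevant_ids)
    return hits / k

def coverage(retrieved_ids, relevant_ids):
    """Coverage: fraction of all relevant chunks found anywhere in the retrieval."""
    if not relevant_ids:
        return 1.0  # nothing to find
    found = sum(1 for doc_id in relevant_ids if doc_id in retrieved_ids)
    return found / len(relevant_ids)
```

Averaging the per-question scores over the eval set yields report figures like `Retrieval Precision@5: 0.72`.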

### 3. Agent Orchestrator

Parses agent definitions, visualizes execution flows, and validates tool configurations.

**Input:** Agent configuration (YAML/JSON)
**Output:** Workflow visualization and validation report

**Usage:**

```bash
# Validate agent configuration
python scripts/agent_orchestrator.py agent.yaml --validate
```

Output:

```
=== Agent Validation Report ===
Agent: research_assistant
Pattern: ReAct

Tools (4 registered):
  [OK]   web_search  - API key configured
  [OK]   calculator  - No config needed
  [WARN] file_reader - Missing allowed_paths
  [OK]   summarizer  - Prompt template valid

Flow Analysis:
  Max depth: 5 iterations
  Estimated tokens/run: 2,400-4,800
  Potential infinite loop: No

Recommendations:
1. Add allowed_paths to file_reader for security
2. Consider adding early exit condition for simple queries
```
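The configuration the validator consumes might look like the following sketch. The field names (`pattern`, `max_iterations`, `api_key_env`, and so on) are assumptions for illustration, not the script's actual schema:

```yaml
# agent.yaml -- hypothetical shape; the real schema may differ
name: research_assistant
pattern: react
max_iterations: 5            # bounds the Think -> Act -> Observe loop
tools:
  - name: web_search
    api_key_env: SEARCH_API_KEY
  - name: calculator
  - name: file_reader
    allowed_paths: ["./data"]   # the validator warns when this is missing
  - name: summarizer
    prompt_template: prompts/summarize.txt
```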

```bash
# Visualize agent workflow (ASCII)
python scripts/agent_orchestrator.py agent.yaml --visualize
```

Output:

```
┌─────────────────────────────────────────┐
│           research_assistant            │
│             (ReAct Pattern)             │
└─────────────────┬───────────────────────┘
         ┌────────▼────────┐
         │   User Query    │
         └────────┬────────┘
         ┌────────▼────────┐
         │      Think      │◄───────────┐
         └────────┬────────┘            │
         ┌────────▼────────┐            │
         │   Select Tool   │            │
         └────────┬────────┘            │
    ┌─────────────┼─────────────┐       │
    ▼             ▼             ▼       │
[web_search] [calculator] [file_reader] │
    │             │             │       │
    └─────────────┼─────────────┘       │
         ┌────────▼────────┐            │
         │     Observe     │────────────┘
         └────────┬────────┘
         ┌────────▼────────┐
         │  Final Answer   │
         └─────────────────┘
```

```bash
# Export workflow as Mermaid diagram
python scripts/agent_orchestrator.py agent.yaml --visualize --format mermaid
```

---
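The validation report's `Potential infinite loop` check can be approximated as cycle detection over the workflow graph. In a ReAct agent the Think → Observe cycle is intentional and bounded by the iteration cap, so a real validator would exempt bounded loops; the sketch below shows only the underlying detection, using stdlib Python:

```python
def has_cycle(edges):
    """DFS with three colors: a back edge to a GRAY node means a cycle exists."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}

    def dfs(node):
        color[node] = GRAY               # on the current DFS path
        for nxt in graph.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GRAY:            # back edge -> cycle
                return True
            if state == WHITE and dfs(nxt):
                return True
        color[node] = BLACK              # fully explored
        return False

    return any(dfs(n) for n in list(graph) if color.get(n, WHITE) == WHITE)
```

For the ReAct flow above, `has_cycle` reports the Think → Select → Observe → Think loop; the validator's job is then to confirm `max_iterations` bounds it.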

## Prompt Engineering Workflows

### Prompt Optimization Workflow

Use when improving an existing prompt's performance or reducing token costs.

**Step 1: Baseline the current prompt**

```bash
python scripts/prompt_optimizer.py current_prompt.txt --analyze --output baseline.json
```

**Step 2: Identify issues**

Review the analysis report for:
- Token waste (redundant instructions, verbose examples)
- Ambiguous instructions (unclear output format, vague verbs)
- Missing constraints (no length limits, no format specification)

**Step 3: Apply optimization patterns**

| Issue | Pattern to Apply |
|---|---|
| Ambiguous output | Add explicit format specification |
| Too verbose | Extract to few-shot examples |
| Inconsistent results | Add role/persona framing |
| Missing edge cases | Add constraint boundaries |

**Step 4: Generate optimized version**

```bash
python scripts/prompt_optimizer.py current_prompt.txt --optimize --output optimized.txt
```

**Step 5: Compare results**

```bash
python scripts/prompt_optimizer.py optimized.txt --analyze --compare baseline.json
```

Shows: token reduction, clarity improvement, issues resolved.
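Step 5's comparison boils down to diffing two analysis reports. A sketch, assuming `--analyze --output` writes JSON with `token_count`, `clarity_score`, and `issues` fields (the script's actual field names may differ):

```python
import json

def compare_reports(baseline_path, optimized_path):
    """Report deltas between two analysis JSON files (field names assumed)."""
    with open(baseline_path) as f:
        base = json.load(f)
    with open(optimized_path) as f:
        opt = json.load(f)
    return {
        "token_reduction": base["token_count"] - opt["token_count"],
        "clarity_delta": opt["clarity_score"] - base["clarity_score"],
        "issues_resolved": len(base["issues"]) - len(opt["issues"]),
    }
```

A positive `token_reduction` and `clarity_delta` together indicate the optimization paid off on both axes.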


**Step 6: Validate with test cases**

Run both prompts against your evaluation set and compare outputs.

---

### Few-Shot Example Design Workflow

Use when creating examples for in-context learning.

**Step 1: Define the task clearly**

```
Task: Extract product entities from customer reviews
Input: Review text
Output: JSON with {product_name, sentiment, features_mentioned}
```

**Step 2: Select diverse examples (3-5 recommended)**

| Example Type | Purpose |
|---|---|
| Simple case | Shows the basic pattern |
| Edge case | Handles ambiguity |
| Complex case | Multiple entities |
| Negative case | What NOT to extract |

**Step 3: Format consistently**

```
Example 1:
Input: "Love my new iPhone 15, the camera is amazing!"
Output: {"product_name": "iPhone 15", "sentiment": "positive", "features_mentioned": ["camera"]}

Example 2:
Input: "The laptop was okay but battery life is terrible."
Output: {"product_name": "laptop", "sentiment": "mixed", "features_mentioned": ["battery life"]}
```

**Step 4: Validate example quality**

```bash
python scripts/prompt_optimizer.py prompt_with_examples.txt --validate-examples
```

Checks: consistency, coverage, format alignment.
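One consistency check of the kind `--validate-examples` might run: confirm every example output parses as JSON and carries the same keys as the task definition. A minimal sketch, with the key set taken from the entity-extraction task above:

```python
import json

# Keys from the task definition above; adjust for your own schema.
REQUIRED_KEYS = {"product_name", "sentiment", "features_mentioned"}

def validate_examples(outputs):
    """Return a list of problems: unparseable outputs or missing keys."""
    problems = []
    for i, raw in enumerate(outputs, start=1):
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            problems.append(f"Example {i}: output is not valid JSON")
            continue
        missing = REQUIRED_KEYS - parsed.keys()
        if missing:
            problems.append(f"Example {i}: missing keys {sorted(missing)}")
    return problems
```

An empty return value means every example is format-aligned; anything else points at the example to fix before the prompt ships.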


**Step 5: Test with held-out cases**

Ensure the model generalizes beyond your examples.

---

### Structured Output Design Workflow

Use when you need reliable JSON/XML/structured responses.

**Step 1: Define the schema**

```json
{
  "type": "object",
  "properties": {
    "summary": {"type": "string", "maxLength": 200},
    "sentiment": {"enum": ["positive", "negative", "neutral"]},
    "confidence": {"type": "number", "minimum": 0, "maximum": 1}
  },
  "required": ["summary", "sentiment"]
}
```

**Step 2: Include the schema in the prompt**

```
Respond with JSON matching this schema:
- summary (string, max 200 chars): Brief summary of the content
- sentiment (enum): One of "positive", "negative", "neutral"
- confidence (number 0-1): Your confidence in the sentiment
```

**Step 3: Add format enforcement**

```
IMPORTANT: Respond ONLY with valid JSON. No markdown, no explanation.
Start your response with { and end with }
```

**Step 4: Validate outputs**

```bash
python scripts/prompt_optimizer.py structured_prompt.txt --validate-schema schema.json
```
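Step 4's validation can be approximated in stdlib Python by checking each model response against the Step 1 schema. This is a hand-rolled sketch for that one schema; a production pipeline would more likely use the `jsonschema` package:

```python
import json

def check_output(raw: str) -> list:
    """Validate one model response against the Step 1 schema; return error strings."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as e:
        return [f"invalid JSON: {e}"]
    errors = []
    for key in ("summary", "sentiment"):       # "required" in the schema
        if key not in obj:
            errors.append(f"missing required key: {key}")
    if obj.get("sentiment") not in (None, "positive", "negative", "neutral"):
        errors.append("sentiment not in enum")
    if "summary" in obj and len(obj["summary"]) > 200:
        errors.append("summary exceeds maxLength 200")
    conf = obj.get("confidence")
    if conf is not None and not (0 <= conf <= 1):
        errors.append("confidence out of range [0, 1]")
    return errors
```

Running this over a batch of responses gives a quick pass rate for the format-enforcement wording in Step 3.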

## Reference Documentation

| File | Contains | Load when user asks about |
|---|---|---|
| references/prompt_engineering_patterns.md | 10 prompt patterns with input/output examples | "which pattern?", "few-shot", "chain-of-thought", "role prompting" |
| references/llm_evaluation_frameworks.md | Evaluation metrics, scoring methods, A/B testing | "how to evaluate?", "measure quality", "compare prompts" |
| references/agentic_system_design.md | Agent architectures (ReAct, Plan-Execute, Tool Use) | "build agent", "tool calling", "multi-agent" |

## Common Patterns Quick Reference

| Pattern | When to Use | Example |
|---|---|---|
| Zero-shot | Simple, well-defined tasks | "Classify this email as spam or not spam" |
| Few-shot | Complex tasks, consistent format needed | Provide 3-5 examples before the task |
| Chain-of-Thought | Reasoning, math, multi-step logic | "Think step by step..." |
| Role Prompting | Expertise needed, specific perspective | "You are an expert tax accountant..." |
| Structured Output | Need parseable JSON/XML | Include schema + format enforcement |

## Common Commands

```bash
# Prompt analysis
python scripts/prompt_optimizer.py prompt.txt --analyze     # Full analysis
python scripts/prompt_optimizer.py prompt.txt --tokens      # Token count only
python scripts/prompt_optimizer.py prompt.txt --optimize    # Generate optimized version

# RAG evaluation
python scripts/rag_evaluator.py --contexts ctx.json --questions q.json   # Evaluate
python scripts/rag_evaluator.py --contexts ctx.json --compare baseline   # Compare to baseline

# Agent development
python scripts/agent_orchestrator.py agent.yaml --validate        # Validate config
python scripts/agent_orchestrator.py agent.yaml --visualize       # Show workflow
python scripts/agent_orchestrator.py agent.yaml --estimate-cost   # Token estimation
```