accelerate
Use when the workflow is too slow, too expensive, or both and needs latency, cost, or token usage optimization.
Source: sharpdeveye/maestro
NPX Install
npx skill4agent add sharpdeveye/maestro accelerate
SKILL.md Content
MANDATORY PREPARATION
Invoke {{command_prefix}}agent-workflow — it contains workflow principles, anti-patterns, and the Context Gathering Protocol. Follow the protocol before proceeding — if no workflow context exists yet, you MUST run {{command_prefix}}teach-maestro first.
Consult the context-management reference in the agent-workflow skill for window optimization and budget strategies.
Make the workflow faster and cheaper without sacrificing quality. Measure before and after.
Performance Audit
Measure current performance:
Current metrics:
Latency (p50): ___ms
Latency (p95): ___ms
Cost per request: $___
Token usage (avg): ___ input / ___ output
Error rate: ___%
Acceleration Strategies
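Before applying any strategy, the baseline from the Performance Audit can be computed from logged requests. The following is a minimal sketch, assuming each request has already been recorded with its latency, cost, token counts, and success flag. `RequestSample` and `summarize` are hypothetical names, not part of Maestro, and the percentile uses the simple nearest-rank method.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestSample:
    """One logged request (hypothetical record shape)."""
    latency_ms: float
    cost_usd: float
    input_tokens: int
    output_tokens: int
    failed: bool


def percentile(values, pct):
    """Nearest-rank percentile; pct is between 0 and 100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples):
    """Fill in the audit template from a non-empty list of samples."""
    n = len(samples)
    latencies = [s.latency_ms for s in samples]
    return {
        "latency_p50_ms": percentile(latencies, 50),
        "latency_p95_ms": percentile(latencies, 95),
        "cost_per_request_usd": sum(s.cost_usd for s in samples) / n,
        "avg_input_tokens": sum(s.input_tokens for s in samples) / n,
        "avg_output_tokens": sum(s.output_tokens for s in samples) / n,
        "error_rate_pct": 100 * sum(s.failed for s in samples) / n,
    }
```

Running `summarize` on the same traffic sample before and after a change keeps the two sets of numbers comparable.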
Reduce Token Usage
- Shorten system prompts (remove redundant instructions)
- Compress few-shot examples to minimum viable length
- Use structured output schemas instead of verbose text
- Summarize context instead of passing raw documents
- Reduce output length requirements
Model Cascading
- Route simple tasks to cheaper/faster models
- Escalate only complex tasks to capable models
- Use classification to determine complexity
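The cascading idea above might look roughly like the sketch below. It assumes a toy keyword and length heuristic as the complexity classifier; a real workflow would more likely use a small model or a trained classifier. The model names and the `call_model` callable are placeholders, not real APIs.

```python
from typing import Callable

CHEAP_MODEL = "small-model"    # hypothetical model name
CAPABLE_MODEL = "large-model"  # hypothetical model name

# Illustrative markers only; tune or replace with a real classifier.
HARD_MARKERS = ("prove", "refactor", "multi-step", "analyze")


def classify_complexity(task: str) -> str:
    """Return "simple" or "complex" using a crude heuristic."""
    lowered = task.lower()
    if len(task.split()) > 50 or any(m in lowered for m in HARD_MARKERS):
        return "complex"
    return "simple"


def route(task: str, call_model: Callable[[str, str], str]) -> str:
    """Send simple tasks to the cheap model and escalate the rest."""
    if classify_complexity(task) == "simple":
        model = CHEAP_MODEL
    else:
        model = CAPABLE_MODEL
    return call_model(model, task)
```

Misrouted tasks are the main risk here, so the classifier itself is worth covering with golden tests.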
Caching
- Cache responses for identical or near-identical inputs
- Cache tool results with appropriate TTL
- Cache embeddings for frequently-queried documents
- Use semantic caching for similar (not identical) queries
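As an illustration of caching tool results with a TTL, here is a small in-memory sketch. It covers exact-match inputs only; semantic caching would additionally need an embedding similarity lookup, which is not shown. `TTLCache` and its methods are hypothetical names, and it assumes tool arguments are JSON-serializable.

```python
import hashlib
import json
import time


class TTLCache:
    """In-memory cache for tool results that expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested
        self._store = {}    # key -> (stored_at, value)

    @staticmethod
    def make_key(tool_name, args):
        # Sort keys so that argument order does not change the cache key.
        payload = json.dumps({"tool": tool_name, "args": args}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_call(self, tool_name, args, fn):
        """Return a fresh cached value, or call fn(**args) and store it."""
        key = self.make_key(tool_name, args)
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]
        value = fn(**args)
        self._store[key] = (now, value)
        return value
```

A short TTL, or no caching at all, is the safer default for tools that return real-time data.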
Parallelization
- Run independent tool calls in parallel
- Run independent agent steps in parallel
- Use streaming to start processing before full response
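Running independent tool calls in parallel can be sketched with Python's `asyncio.gather`. The example assumes the calls are I/O-bound and do not depend on each other's results. `fake_tool` is a stand-in that only sleeps, not a real tool.

```python
import asyncio
import time


async def fake_tool(name: str, delay_s: float) -> str:
    """Stand-in for an I/O-bound tool call (hypothetical)."""
    await asyncio.sleep(delay_s)
    return f"{name}:done"


async def run_parallel():
    # The three calls are independent, so their waits can overlap.
    return await asyncio.gather(
        fake_tool("search", 0.1),
        fake_tool("weather", 0.1),
        fake_tool("calendar", 0.1),
    )


def timed_run():
    """Run the parallel calls and return (results, elapsed_seconds)."""
    start = time.perf_counter()
    results = asyncio.run(run_parallel())
    return results, time.perf_counter() - start
```

With these delays, total time should be close to the slowest single call rather than the sum of all three. Results come back in the order the calls were passed to `gather`.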
Context Optimization
- Retrieve less, retrieve better (improve retrieval precision)
- Use context compression techniques
- Implement sliding window for long conversations
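A sliding window for long conversations could be sketched as follows. It assumes messages are dicts with `role` and `content` keys, and it always keeps system messages. `approx_tokens` is a crude whitespace word count used only for illustration; a real workflow would substitute the model's tokenizer.

```python
def approx_tokens(text):
    # Crude stand-in: whitespace word count, not a real tokenizer.
    return len(text.split())


def sliding_window(messages, budget, count_tokens=approx_tokens):
    """Keep system messages plus the newest messages that fit the budget."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    used = sum(count_tokens(m["content"]) for m in system)

    kept = []
    # Walk backwards from the newest message and stop at the first
    # message that would exceed the budget.
    for message in reversed(rest):
        cost = count_tokens(message["content"])
        if used + cost > budget:
            break
        kept.append(message)
        used += cost

    return system + kept[::-1]  # restore chronological order
```

Dropped messages are simply lost in this sketch. Summarizing them into one short message is a common variation when older context still matters.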
Acceleration Report
For each optimization:
- What changed: Specific modification
- Before: Latency/cost/tokens before
- After: Latency/cost/tokens after
- Quality impact: Any quality change (verify with golden tests)
- Trade-off: What was sacrificed for the improvement
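One way to keep report entries consistent is to build them from the measured numbers. The sketch below assumes the before and after metrics are plain dicts sharing the same keys, with non-zero baseline values. The function names and field names are hypothetical.

```python
def percent_change(before, after):
    """Relative change in percent; negative means the metric went down."""
    return (after - before) / before * 100


def build_report_entry(what_changed, before, after, quality_impact, trade_off):
    """Assemble one Acceleration Report entry from measured metrics."""
    delta_pct = {
        name: round(percent_change(before[name], after[name]), 1)
        for name in before
    }
    return {
        "what_changed": what_changed,
        "before": before,
        "after": after,
        "delta_pct": delta_pct,
        "quality_impact": quality_impact,
        "trade_off": trade_off,
    }
```

For example, a p95 latency that drops from 1200 ms to 900 ms is reported as a change of -25.0 percent.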
Acceleration Checklist
- Baseline metrics recorded before any changes
- Each optimization measured with before/after comparison
- Quality impact verified (golden tests still pass)
- Trade-offs documented for each change
- Cost/latency improvements quantified
Recommended Next Step
After optimization, run {{command_prefix}}evaluate to verify quality didn't degrade, or {{command_prefix}}iterate to set up continuous monitoring.
NEVER:
- Optimize without measuring first (you need a baseline)
- Sacrifice quality for speed without explicit user approval
- Cache outputs that depend on real-time data
- Skip the quality check after optimization
- Optimize prematurely (make it correct first, then make it fast)