Use when the user needs to build AI agents — tool use patterns, memory management, planning strategies, multi-agent coordination, evaluation, and safety guardrails. Triggers: user says "agent", "build an agent", "tool use", "agent loop", "multi-agent", "memory management", "guardrails", "agent evaluation".
```shell
npx skill4agent add pixel-process-ug/superkit-agents agent-development
```

| Agent Type | When to Use | Loop Pattern | Complexity |
|---|---|---|---|
| Single-turn tool user | Simple queries with tool calls | Request -> Tool -> Response | Low |
| ReAct agent | Multi-step reasoning tasks | Thought -> Action -> Observation -> loop | Medium |
| Plan-and-execute | Complex tasks with dependencies | Plan -> Execute steps -> Validate | Medium-High |
| Multi-agent orchestrator | Parallel/specialized sub-tasks | Dispatch -> Collect -> Synthesize | High |
| Autonomous loop (Ralph-style) | Long-running iterative development | Plan -> Build -> Verify -> Exit gate | High |

| Principle | Rule | Example |
|---|---|---|
| Clear naming | verb-noun format | `search_orders`, not `orders` |
| Detailed descriptions | Include when to use AND when NOT to use | "Use for keyword search. Do NOT use for semantic similarity." |
| Well-typed parameters | Descriptions and examples on every param | `limit` (integer): "max results to return, e.g. 5" |
| Predictable returns | Consistent format across tools | Always return the same shape, e.g. `{status, data, error}` |
| Self-correcting errors | Help agent recover | "Invalid date format. Expected ISO 8601: YYYY-MM-DD" |
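Taken together, these principles can be sketched as a single tool definition. The schema shape and the `validate_args` helper below are illustrative assumptions, not a specific framework's API; `search_orders` reuses the example tool name used elsewhere in this skill.

```python
# Hypothetical tool definition applying the principles above; the schema
# shape is an assumption, not a specific framework's API.
search_orders_tool = {
    "name": "search_orders",  # clear verb-noun name
    "description": (
        "Search a user's orders by keyword or date. "
        "Use for keyword search. Do NOT use for semantic similarity."
    ),
    "parameters": {
        "user_id": {"type": "string", "required": True,
                    "description": "ID of the user whose orders to search",
                    "example": "123"},
        "limit": {"type": "integer", "required": False,
                  "description": "Maximum number of orders to return",
                  "example": 5},
    },
    # Predictable return shape, consistent across tools.
    "returns": {"status": "ok|error", "data": "list", "error": "string|null"},
}

def validate_args(tool, args):
    """Return a self-correcting error message, or None if args are valid."""
    for name, spec in tool["parameters"].items():
        if spec["required"] and name not in args:
            return (f"Missing required parameter '{name}' "
                    f"({spec['type']}). Example: {spec['example']!r}")
    return None
```

The error message names the missing parameter, its type, and a concrete example, so the agent can correct its own call instead of retrying blindly.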
Given a task:
1. Identify required information and actions
2. Map to available tools
3. Determine tool call order (dependencies)
4. Execute with result validation
5. Retry or try alternative tool on failure

| Type | Duration | Storage | Use Case |
|---|---|---|---|
| Working Memory | Current turn | Context window | Active reasoning |
| Short-term Memory | Current session | In-context or buffer | Recent conversation |
| Long-term Memory | Across sessions | Database/file | Learned patterns, user prefs |
| Episodic Memory | Specific events | Indexed store | Past task outcomes |
| Semantic Memory | Knowledge | Vector DB | Domain knowledge retrieval |
Strategy: Sliding window with importance-based retention
1. Always retain: system prompt, tool definitions, current task
2. Summarize: older conversation turns into compressed summaries
3. Evict: least relevant context when approaching limit
4. Retrieve: pull relevant long-term memory on demand
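The retention strategy above can be sketched as follows, assuming whitespace token counts and a caller-supplied `summarize` function; a real implementation would use the model's tokenizer.

```python
def fit_context(messages, limit, summarize):
    """Importance-based sliding window.

    messages: dicts with 'role', 'text', optional 'pinned' and 'importance'.
    Pinned items (system prompt, tools, current task) are always retained;
    the rest are kept by importance, summarized when too large, and
    evicted only as a last resort.
    """
    def tokens(m):
        return len(m["text"].split())  # crude stand-in for a tokenizer

    pinned = [m for m in messages if m.get("pinned")]
    rest = sorted((m for m in messages if not m.get("pinned")),
                  key=lambda m: m.get("importance", 0), reverse=True)
    budget = limit - sum(tokens(m) for m in pinned)
    kept, used = [], 0
    for m in rest:
        if used + tokens(m) <= budget:
            kept.append(m)
            used += tokens(m)
        else:  # summarize before evicting outright
            s = {"role": m["role"], "text": summarize(m["text"])}
            if used + tokens(s) <= budget:
                kept.append(s)
                used += tokens(s)
    return pinned + kept
```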
Budget allocation:
System prompt + tools: ~20%
Current task context: ~40%
Conversation history: ~25%
Retrieved memory: ~15%

| Trigger | Action |
|---|---|
| User correction | Update learned patterns |
| Task completion | Store outcome and approach |
| Error recovery | Record what failed and what worked |
| New domain knowledge | Index for future retrieval |
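These triggers can be wired as a small dispatch table. The event names and the in-memory store below are illustrative assumptions; a real store would be a database, file, or vector index.

```python
class MemoryStore:
    """Stand-in for a persistent store (database, file, vector index)."""
    def __init__(self):
        self.records = []

    def write(self, kind, payload):
        self.records.append({"kind": kind, "payload": payload})

def on_agent_event(store, event):
    """Route lifecycle events to the memory updates in the table above."""
    handlers = {
        "user_correction": lambda e: store.write("pattern", e["corrected"]),
        "task_completed": lambda e: store.write(
            "episode", {"task": e["task"], "approach": e["approach"]}),
        "error_recovered": lambda e: store.write(
            "episode", {"failed": e["failed"], "worked": e["worked"]}),
        "new_knowledge": lambda e: store.write("semantic", e["fact"]),
    }
    handler = handlers.get(event["type"])
    if handler is not None:  # unknown events are ignored, not errors
        handler(event)
```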
1. Break high-level goal into sub-goals
2. For each sub-goal, identify required actions
3. Order actions by dependencies
4. Execute with checkpoints between phases
5. Re-plan if intermediate results change the approach

```
Thought: I need to find the user's recent orders to answer their question.
Action: search_orders(user_id="123", limit=5)
Observation: Found 5 orders, most recent is #456 from yesterday.
Thought: The user asked about order #456. I have the details now.
Action: respond with order details
```

1. Create a complete plan before any action
2. Execute each step, checking preconditions
3. After each step, validate the result
4. If a step fails, re-plan from current state
5. Never modify the plan mid-step (finish or abort first)

After completing a task:
1. Was the result correct?
2. Was the approach efficient?
3. What could be improved?
4. Should any memory be updated?

| Pattern | Description | Use When |
|---|---|---|
| Orchestrator | Central agent delegates to specialists | Clear task hierarchy |
| Pipeline | Agents process in sequence | Linear workflows |
| Debate | Agents propose and critique | Need diverse perspectives |
| Voting | Multiple agents, majority wins | Uncertainty in approach |
| Supervisor | One agent monitors others | Safety-critical tasks |
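The orchestrator pattern, for instance, reduces to dispatch, collect, synthesize. A minimal sketch with specialists as plain callables; in practice each would wrap an LLM call.

```python
from concurrent.futures import ThreadPoolExecutor

def orchestrate(task, plan, specialists, synthesize):
    """plan(task) -> [(specialist_name, sub_task), ...]."""
    assignments = plan(task)
    with ThreadPoolExecutor() as pool:  # dispatch sub-tasks in parallel
        futures = [pool.submit(specialists[name], sub)
                   for name, sub in assignments]
        results = [f.result() for f in futures]  # collect
    return synthesize(results)  # synthesize into one answer
```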
Agent-to-Agent message:

```json
{
  "from": "planner",
  "to": "executor",
  "type": "task_assignment",
  "content": { "task": "...", "context": "...", "constraints": "..." },
  "priority": "high",
  "deadline": "2025-01-15T10:00:00Z"
}
```

| Metric | What It Measures | How to Measure | Target |
|---|---|---|---|
| Task Success Rate | Correct completions / total | Automated + human eval | > 90% |
| Efficiency | Steps vs optimal path | Step count comparison | < 2x optimal |
| Tool Accuracy | Correct tool calls / total | Log analysis | > 95% |
| Safety | Violations / total interactions | Guardrail checks | 0 violations |
| Latency | Time to complete task | Wall clock | < SLA |
| Cost | Token usage per task | API usage tracking | Within budget |
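A minimal harness over test cases shaped like the `test_cases` example in this section; `run_agent` is a hypothetical hook returning the agent's output text and the tools it called.

```python
def evaluate(test_cases, run_agent):
    """Compute task success rate and tool accuracy over a test set."""
    successes = correct_tools = 0
    for case in test_cases:
        # run_agent is assumed to return {"output": str, "tools_called": [...]}
        result = run_agent(case["input"])
        if all(s in result["output"] for s in case["expected_output_contains"]):
            successes += 1
        if set(case["expected_tools"]) <= set(result["tools_called"]):
            correct_tools += 1
    n = len(test_cases)
    return {"task_success_rate": successes / n,
            "tool_accuracy": correct_tools / n}
```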
```json
{
  "test_cases": [
    {
      "id": "tc_001",
      "input": "Find all orders over $100 from last week",
      "expected_tools": ["search_orders"],
      "expected_output_contains": ["order_id", "amount"],
      "category": "retrieval",
      "difficulty": "easy"
    }
  ]
}
```

| Condition | Threshold | Action |
|---|---|---|
| Max tool calls per task | 20 | Stop execution, return error |
| Max consecutive errors | 3 | Stop, log, return graceful error |
| Max task duration | 5 minutes | Timeout, return partial result |
| Max tokens generated | 10,000 | Stop generation |
| Pattern repeats | 5 identical errors | Open circuit, alert operator |
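A sketch of these thresholds as a circuit breaker checked before every tool call. The defaults mirror the table; the token and repeated-pattern checks are omitted for brevity.

```python
import time

class CircuitBreaker:
    """Stop conditions for an agent loop (defaults from the table above)."""
    def __init__(self, max_calls=20, max_errors=3, max_seconds=300):
        self.max_calls = max_calls
        self.max_errors = max_errors
        self.max_seconds = max_seconds
        self.calls = 0
        self.consecutive_errors = 0
        self.start = time.monotonic()

    def check(self):
        """Return None if execution may continue, else a stop reason."""
        if self.calls >= self.max_calls:
            return "max tool calls exceeded"
        if self.consecutive_errors >= self.max_errors:
            return "too many consecutive errors"
        if time.monotonic() - self.start > self.max_seconds:
            return "task timed out"
        return None

    def record(self, ok):
        """Record a tool call; errors reset only on a subsequent success."""
        self.calls += 1
        self.consecutive_errors = 0 if ok else self.consecutive_errors + 1
```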
1. Identity and purpose (who the agent is)
2. Available tools (what it can do)
3. Constraints (what it must not do)
4. Output format (how to respond)
5. Examples (few-shot demonstrations)
6. Error handling (what to do when stuck)

| Anti-Pattern | Why It Is Wrong | What to Do Instead |
|---|---|---|
| Calling tools without reasoning | Wastes calls, misses context | Use ReAct pattern (think first) |
| No max iteration limit | Infinite loops, runaway costs | Set circuit breaker thresholds |
| Trusting all tool outputs | Corrupted data propagates | Validate tool results |
| Hardcoded tool sequences | No adaptability to failures | Dynamic tool selection based on state |
| No error recovery strategy | Agent gets stuck on first failure | Implement retry with alternatives |
| Apologizing instead of acting | Wastes user time | Take corrective action, then report |
| Over-reliance on single tool | Fragile if that tool fails | Provide fallback tools |
| No evaluation framework | Shipping blind, no quality signal | Build eval harness before deployment |
| Unlimited context growth | Context overflow, degraded quality | Implement memory management |
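Several of these anti-patterns disappear with a think-first loop and a hard step cap. A minimal ReAct-style sketch; `model` and `tools` are hypothetical hooks, where the model returns either an action step or a final answer.

```python
def react_loop(task, model, tools, max_steps=10):
    """Reason before acting, cap iterations, validate results, recover."""
    history = [f"Task: {task}"]
    for _ in range(max_steps):  # hard iteration limit, never loop forever
        # model is assumed to return {"thought", "action", "args"} or {"final"}
        step = model("\n".join(history))
        if "final" in step:
            return step["final"]
        history.append(f"Thought: {step['thought']}")  # think first
        try:
            result = tools[step["action"]](**step["args"])
            if result is None:  # validate; don't trust tool output blindly
                raise ValueError("tool returned no data")
            history.append(f"Observation: {result}")
        except Exception as e:
            # Surface the error so the model can re-plan instead of stalling.
            history.append(f"Observation: error: {e}")
    return "Stopped: max steps reached"
```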
| Skill | Integration |
|---|---|
| | MCP servers provide tools for agents |
| | Agent planning uses structured plan generation |
| | Ralph-style loops are a specialized agent pattern |
| | Multi-agent coordination pattern |
| | Operational safety for agent loops |
| | Agent output validation |
| | TDD for agent tool implementations |