Build and run evaluators for AI/LLM applications using Phoenix.
```
npx skill4agent add arize-ai/phoenix phoenix-evals
```

| Task | Files |
|---|---|
| Setup | observe-tracing-setup |
| Build code evaluator | evaluators-code-* |
| Build LLM evaluator | evaluators-llm-* |
| Run experiment | |
| Create dataset | |
| Validate evaluator | validation-calibration-{python\|typescript} |
| Analyze errors | error-analysis, axial-coding |
| RAG evals | evaluators-rag |
| Production | production-overview, production-guardrails, production-continuous |

| Prefix | Description |
|---|---|
| fundamentals | Types, scores, anti-patterns |
| observe- | Tracing, sampling |
| error-analysis | Finding failures |
| axial-coding | Categorizing failures |
| evaluators- | Code, LLM, RAG evaluators |
| | Datasets, running experiments |
| validation- | Calibrating judges |
| production- | CI/CD, monitoring |
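A code evaluator is a deterministic check that runs before any LLM judge. A minimal plain-Python sketch (the specific checks are hypothetical examples, not part of the Phoenix API):

```python
# Deterministic "code evaluator" sketch: binary pass/fail, no LLM call.
# These checks are illustrative, not part of the Phoenix API.
import json

def valid_json(output: str) -> bool:
    """Pass if the model output parses as JSON."""
    try:
        json.loads(output)
        return True
    except json.JSONDecodeError:
        return False

def within_length(output: str, max_chars: int = 500) -> bool:
    """Pass if the output respects a length budget."""
    return len(output) <= max_chars

# Binary scores, not 1-5 ratings
print(valid_json('{"answer": 42}'))  # True
print(valid_json("not json"))        # False
```

Checks like these are cheap, reproducible, and cover many failures without a judge, which is why code evaluators come before LLM evaluators.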
| Principle | Action |
|---|---|
| Error analysis first | Can't automate what you haven't observed |
| Custom > generic | Build from your failures |
| Code first | Deterministic before LLM |
| Validate judges | >80% TPR/TNR |
| Binary > Likert | Pass/fail, not 1-5 |
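The >80% TPR/TNR bar means comparing the judge's verdicts to human labels on both pass and fail cases. A plain-Python sketch with made-up labels (not the Phoenix API):

```python
# Sketch: check an LLM judge against human labels using TPR/TNR.
# TPR = fraction of human "pass" cases the judge also passes;
# TNR = fraction of human "fail" cases the judge also fails.
def tpr_tnr(human: list[bool], judge: list[bool]) -> tuple[float, float]:
    tp = sum(h and j for h, j in zip(human, judge))
    tn = sum((not h) and (not j) for h, j in zip(human, judge))
    pos = sum(human)
    neg = len(human) - pos
    return tp / pos, tn / neg

# Hypothetical labels on 8 examples
human = [True, True, True, True, False, False, False, False]
judge = [True, True, True, False, False, False, False, True]
tpr, tnr = tpr_tnr(human, judge)
print(tpr, tnr)  # 0.75 0.75 -> below 0.8 on both, so recalibrate the judge
```

Tracking both rates matters: a judge that passes everything gets a perfect TPR while catching no failures, which only the TNR exposes.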