Use this addon when you want rubric-based LLM quality scoring of generated outputs; pair it with `addon-deterministic-eval-suite`.
Install:

```sh
npx skill4agent add ajrlewis/ai-skills addon-llm-judge-evals
```

Requires `addon-deterministic-eval-suite`.

Configuration:

| Variable | Options | Default |
| --- | --- | --- |
| `JUDGE_BACKEND` | `auto`, `langchain`, `google-adk` | `auto` |
| `JUDGE_MODEL` | | |
| `JUDGE_TIMEOUT_SECONDS` | | `60` |
| `JUDGE_MAX_RETRIES` | | `2` |
| `JUDGE_TEMPERATURE` | | `0` |
| `JUDGE_FAIL_ON_BACKEND_MISMATCH` | `yes`, `no` | `yes` |
| `JUDGE_RUBRIC_MODE` | `product`, `security`, `developer-experience`, `custom` | |
| `PASS_THRESHOLD` | | `0.75` |
| `BLOCK_ON_JUDGE_FAIL` | `yes`, `no` | `no` |

Files added:

- `config/skill_manifest.json`
- `evals/judge/rubric.md`
- `evals/judge/cases/`
- `scripts/evals/run_llm_judge.py`
- `.github/workflows/evals-judge.yml`
- `REVIEW_BUNDLE/JUDGE_REPORT.md`

The runner, `scripts/evals/run_llm_judge.py`, reads `config/skill_manifest.json` to decide which backend to use. With `BLOCK_ON_JUDGE_FAIL=no`, a failing judge run is reported but does not block.

Example `config/skill_manifest.json`:

```json
{
  "base_skill": "architect-python-uv-fastapi-sqlalchemy",
  "addons": [
    "addon-deterministic-eval-suite",
    "addon-llm-judge-evals",
    "addon-langchain-llm"
  ],
  "capabilities": {
    "judge_backends": ["langchain"]
  }
}
```

Backend resolution: if `JUDGE_BACKEND != auto`, the configured backend is used as-is. With `JUDGE_BACKEND=auto`, an installed `addon-langchain-llm` selects `langchain`, and an installed `addon-google-agent-dev-kit` selects `google-adk`. If `JUDGE_MODEL` is unset, the `langchain` backend falls back to `DEFAULT_MODEL` and `google-adk` falls back to `ADK_DEFAULT_MODEL`. Every backend implements `JudgeBackend.score(prompt)`.

Default `evals/judge/rubric.md`:

# Judge Rubric
- Technical coherence (0-1)
- Requirement coverage (0-1)
- Domain language alignment (0-1)
- UX quality and states (0-1)
- Documentation clarity (0-1)
Pass threshold: 0.75

With `JUDGE_BACKEND` configured, verify the generated files (`|| true` keeps the last check non-fatal, since the report only exists after a run):

```sh
test -f evals/judge/rubric.md
test -f scripts/evals/run_llm_judge.py
test -f .github/workflows/evals-judge.yml
test -f REVIEW_BUNDLE/JUDGE_REPORT.md || true