Referenced names and paths: /skill-creator, packages/skill-tools, $ARGUMENTS, <skill-name>, --all, SKILL.md, evals.json, skills/<name>/, .agents/skills/<name>/

Workspace: skills/<skill-name>/.workspace/ or .agents/skills/<skill-name>/.workspace/
Iteration directory: skills/<skill-name>/.workspace/iteration-<N>/, where N is 1 for the first run and max(N) + 1 after that

subagent_type: general-purpose

Execute this task exactly:
[eval.prompt]
No skill is loaded for this task. After attempting it, report what you did,
what decisions you made and why, and anything you found tricky. Report
verbatim — do not polish, do not summarize. Include any code you wrote
inline so it can be analyzed.

Transcript path: skills/<skill-name>/.workspace/iteration-<N>/eval-<id>-<eval_name>/without_skill/transcript.md

Execute this task exactly:
[eval.prompt]
The skill `<skill-name>` is available — apply its rules and patterns.
After attempting it, report what you did, what decisions you made and why,
and anything you found tricky. Report verbatim — do not polish, do not
summarize. Include any code you wrote inline.
If you considered skipping any rule from the skill, capture the exact
reasoning verbatim — that's the kind of failure mode the skill needs to
catch.

Transcript path: skills/<skill-name>/.workspace/iteration-<N>/eval-<id>-<eval_name>/with_skill/transcript.md

skill-tools eval:

node packages/skill-tools/dist/index.mjs eval <skill-name> <eval.id> \
  --variant <with_skill|without_skill> \
  --iteration <N> \
  --transcript <path-to-transcript.md>

Files: grading.json

Benchmark:

node packages/skill-tools/dist/index.mjs benchmark <skill-name>

Files: grading.json, benchmark.json, benchmark.md

View:

pnpm skill-tools view <skill-name>

References: evals.json, skills/skill-creator/references/evals-json.md, packages/skill-tools/
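
The path and command conventions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of skill-tools: the helper names (`next_iteration`, `transcript_path`, `eval_command`) are hypothetical, and the "1 for an empty workspace, otherwise max(N) + 1" rule is read off the iteration-N naming in the text rather than confirmed against the tool.

```python
import re
import tempfile
from pathlib import Path

# Matches directory names such as "iteration-3" (naming taken from the text).
ITERATION_DIR = re.compile(r"^iteration-(\d+)$")
VARIANTS = ("with_skill", "without_skill")


def next_iteration(workspace: Path) -> int:
    """Assumed rule: 1 if no iteration-N directory exists, else max(N) + 1."""
    if not workspace.is_dir():
        return 1
    numbers = []
    for child in workspace.iterdir():
        match = ITERATION_DIR.match(child.name)
        if child.is_dir() and match:
            numbers.append(int(match.group(1)))
    return max(numbers, default=0) + 1


def transcript_path(skill_root: Path, iteration: int, eval_id: str,
                    eval_name: str, variant: str) -> Path:
    """Build <skill_root>/.workspace/iteration-<N>/eval-<id>-<eval_name>/<variant>/transcript.md."""
    if variant not in VARIANTS:
        raise ValueError(f"variant must be one of {VARIANTS}, got {variant!r}")
    return (skill_root / ".workspace" / f"iteration-{iteration}"
            / f"eval-{eval_id}-{eval_name}" / variant / "transcript.md")


def eval_command(skill_name: str, eval_id: str, variant: str,
                 iteration: int, transcript: Path) -> list[str]:
    """Assemble the argv for the eval invocation shown in the text (does not run it)."""
    return [
        "node", "packages/skill-tools/dist/index.mjs", "eval",
        skill_name, eval_id,
        "--variant", variant,
        "--iteration", str(iteration),
        "--transcript", str(transcript),
    ]
```

A caller would typically compute `next_iteration` once per run, build both variant paths with `transcript_path`, save each subagent report there, and then pass the result of `eval_command` to `subprocess.run`. Returning an argv list rather than a shell string avoids quoting problems when a path contains spaces.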