Organize ML Workspace
Where things live, when to create a new file, what each file is
allowed to contain.
Next-step pointers — where you go after this skill
| You came here for… | → next |
|---|---|
| Bootstrap a fresh workspace | → § Bootstrap; then § 0 |
| First experiment script | → § 0 for the design note |
| Add a new experiment iteration | → § 1 (new vs edit decision) |
| Pipeline / evaluate / smoke-test content | → / / |
Always re-emit the Pre-flight checklist with evidence before
declaring the turn done.
Sibling skills — open just-in-time
Don't pre-read all nine at session start (paralysis). Open each
sibling SKILL.md when a step calls for it (e.g. open
before G-ENV-MGR; open
before handing off the design-note write). Emit this tracker once
per turn:
Sibling skills (just-in-time):
- data-science-python-stack, python-env-manager, python-api,
python-code-style, iterate-ml-experiment, build-ml-pipeline,
evaluate-ml-pipeline, test-ml-pipeline, smoke-test-ml-pipeline
Stop conditions — read before anything else
- Missing dependency. If raises, STOP. Invoke
for the install command. Do not drop
in favor of / pickles / "print metrics"
— the workspace contract assumes a Project on disk.
- Symbol from memory is forbidden. Any /
/ signature must come from a
call this turn.
- Existing layout wins — detect first. Run the Detection table
before scaffolding. Don't rename, relocate, or "tidy up"
existing folders.
- Notebooks are not silent. Existing files in the
experiment folder → surface the convention shift and ask. Don't
auto-convert.
- Scratch is read-only against the skore Project. Probes under
may call ,
, walk an existing report. They MUST NOT
call or . When
raises , the fix is the lookup
shape: is by id, not by . Use →
→ . Never substitute by re-running
+ .
- Tabular library is asked, not assumed (G-TABULAR). Pandas
being importable via skore is not a pick. Invoke
data-science-python-stack
for the structured ask. Free-text
("quick", "you pick") does NOT resolve. Persisted in JOURNAL.md
Status .
- Package name is asked, not inferred (G-PKG-NAME). Before any
/ manifest creation (including /
/ ), fire an for the
import name. Folder name in snake_case is the
default. Manifest creation before G-PKG-NAME passes is
forbidden — running first creates a
entry, and reading "name is in the manifest" back is circular.
If a manifest exists, confirm via —
continuity from a prior session is not continuity from a user
decision.
- Skore Project mode is asked, not assumed (G-SKORE-MODE).
Before any template instantiation containing
, fire an for |
. Default proposal: . Hub triggers a follow-up for
the workspace name (org/team on the hub — distinct from
local-mode ). Persists as (+
when hub). Without it the
substitution has no shape to fill.
Details: → references/g_skore_mode.md.
- Switching skore mode mid-project is forbidden by default.
Once recorded, do not silently change. A switch orphans every
existing report in the prior store — skore has no built-in
migration. Requires explicit confirmation
surfacing the migration burden, then rewrite all
blocks. Procedure: → references/g_skore_mode.md § "Switching mid-project".
- Env manager is asked, not assumed (G-ENV-MGR). Hand off to
. Pixi on PATH is detection, not permission.
Don't run / / until
G-ENV-MGR has passed in .
- Harness "no clarifying questions" hints do NOT waive these
gates. G-TABULAR, G-PKG-NAME, G-ENV-MGR, G-SKORE-MODE,
python-api consultation, new-vs-edit decision are
operating-contract gates. "Quick" / "go fast" never waives them.
- Post-hoc audit — required before ending the turn. Walk every
pre-flight row; if any Evidence cell is unfilled, surface the
non-compliance explicitly. Most common failure: "I scaffolded
successfully so everything must be fine".
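The post-hoc audit can be partly mechanized. The following is a minimal sketch, assuming the checklist is available as plain text in the shape used by the Pre-flight block (a `- [ ]` or `- [x]` row followed by an indented `Evidence:` line). The function name `unfilled_evidence` and the rule that a leftover `<...>` placeholder counts as unfilled are assumptions for illustration, not part of any skill contract.

```python
import re

ROW = re.compile(r"\s*- \[.\]\s+(.*)")

def unfilled_evidence(checklist: str) -> list:
    """Return labels of rows whose Evidence is absent, empty, or still a template."""
    problems = []
    lines = checklist.splitlines()
    for i, line in enumerate(lines):
        row = ROW.match(line)
        if not row:
            continue
        # Gather the continuation lines that belong to this row.
        block = []
        for nxt in lines[i + 1:]:
            if ROW.match(nxt):
                break
            block.append(nxt.strip())
        joined = " ".join(block)
        if "Evidence:" not in joined:
            problems.append(row.group(1).strip())
            continue
        text = joined.split("Evidence:", 1)[1].strip()
        # Assumption: an angle-bracket placeholder means the template was never filled in.
        if not text or re.search(r"<[^>]+>", text):
            problems.append(row.group(1).strip())
    return problems
```

Any label returned by such a check would be surfaced explicitly in the end-of-turn summary rather than silently passed over.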
Forbidden shortcuts
| Shortcut | Why it's wrong |
|---|---|
| on PATH → run to get a manifest, then read the name back | Violates G-ENV-MGR (silent manager pick) AND G-PKG-NAME (name from folder via init side-effect). Circular: the agent created the manifest it now claims to read |
| Folder name = good name → skip the ask | Default value is fine; silent pick is not. G-PKG-NAME requires the structured ask even with folder as default |
| already importable via skore → write in | Transitive presence is not a pick. Violates G-TABULAR |
| Scaffold every skeleton in one turn, incl. experiments/01_baseline.py body | Scaffold stops at empty placeholder. Experiment script content lands after design-note approval ( § 3) |
| Scaffold drops at workspace creation | Audit files placed by at § 4 record-outcome. Empty at scaffold is correct |
| Forget in the scaffold layout | Four-way stem pairing breaks |
| exists with → reuse without confirming | Always re-confirm via G-PKG-NAME |
| Batch G-TABULAR + G-PKG-NAME + G-ENV-MGR + G-SKORE-MODE into prose recommendations | The gates take structured . Prose followed by "let me know" does NOT resolve them |
| Skip G-SKORE-MODE because templates use | Templates carry the marker, not a literal. The gate must fire |
| Pick without checking the workspace exists / user has access | Project init fails at first with an authorization error. Confirm during G-SKORE-MODE, not at execution time |
| Substitute based on agent guess | Install variant comes from G-SKORE-MODE's recorded answer. reads that row, not agent intuition |
| Silently change mid-project to "fix" a broken init | Switching orphans existing reports. Always explicit first |
| Hub substitution but leaving kwarg | is local-only; hub raises . Substitute the whole block, not just the mode literal |
| Local (relative) instead of str(PROJECT_ROOT / "reports") (absolute) | Relative resolves against CWD; runs from other dirs write the store somewhere unexpected. Always absolute via |
| Putting after | requires authenticated session in hub mode. first |
| Substituting in independently of | Audit must open the same Project. Byte-identical copy from the experiment file is the rule |
| Hub workspace name contains (e.g. ) | The is reserved as separator. Reject at G-SKORE-MODE follow-up |
| raised → re-run + to "recover" | Lookup shape wrong ( is by id). Use → |
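The relative-versus-absolute row can be demonstrated without skore at all. This is a minimal sketch using only the standard library; `PROJECT_ROOT` here is a stand-in temporary directory, not the constant exported by the package, and `resolve_from` is a hypothetical helper.

```python
import os
import tempfile
from pathlib import Path

def resolve_from(cwd, target) -> Path:
    """Resolve `target` as a process started in `cwd` would see it."""
    previous = Path.cwd()
    os.chdir(cwd)
    try:
        return Path(target).resolve()
    finally:
        os.chdir(previous)

with tempfile.TemporaryDirectory() as tmp:
    PROJECT_ROOT = Path(tmp).resolve()  # stand-in for the package constant
    other_dir = PROJECT_ROOT / "experiments"
    other_dir.mkdir()

    relative = "reports"
    absolute = str(PROJECT_ROOT / "reports")

    # The relative form follows the working directory: two different targets.
    relative_targets = {
        resolve_from(PROJECT_ROOT, relative),
        resolve_from(other_dir, relative),
    }
    # The absolute form ignores the working directory: one target.
    absolute_targets = {
        resolve_from(PROJECT_ROOT, absolute),
        resolve_from(other_dir, absolute),
    }
```

Running the same script from the project root and from `experiments/` yields two distinct stores in the relative case and a single one in the absolute case.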
Pre-flight — emit before any code
Each ticked box needs an Evidence line (format spec in § "Pre-flight evidence requirements"; see also python-env-manager/references/preflight_evidence.md).
Pre-flight (organize-ml-workspace):
- [ ] `Workspace decisions` in `journal/JOURNAL.md` Status checked
for pre-recorded gates (tabular, env_manager, package, skore mode)
Evidence: lists each <gate>: <value | not recorded>
| "n/a — JOURNAL.md does not exist yet (truly fresh)"
- [ ] Tier 1 mandatory libs importable: sklearn, skrub, skore
Evidence: Write scratch/<ts>_check_tier1.py + `pixi run python …` output.
**Inline `python -c` is NOT evidence**.
- [ ] Layout detection done: <existing | fresh>
Evidence: ls/Glob on project root + matched signal from Detection
- [ ] G-TABULAR resolved: pandas | polars
Evidence: AskUserQuestion id=<id> via data-science-python-stack |
JOURNAL.md Status (Workspace decisions) | user quote turn N
- [ ] G-ENV-MGR resolved
Evidence: AskUserQuestion id=<id> via python-env-manager |
JOURNAL.md Status (Workspace decisions)
- [ ] G-PKG-NAME resolved: <name>
Evidence: AskUserQuestion id=<id>, answer=<name> |
JOURNAL.md Status (Workspace decisions) |
existing manifest's [project].name **confirmed via AskUserQuestion**
(reading the manifest alone is NOT sufficient)
- [ ] G-SKORE-MODE resolved: local | hub
Evidence: AskUserQuestion id=<id>, answer=<local|hub> |
JOURNAL.md Status (Workspace decisions) `skore mode:` row
If hub: also captures `skore hub workspace:` row.
- [ ] `pyproject.toml` present at root declaring `src/<pkg>/`;
editable install wired via `python-env-manager` § Editable workspace
Evidence: Read pyproject.toml (this turn) + manager's editable-install call
- [ ] python-api consulted for: Project, put, evaluate
Evidence: Read scratch/api/skore/<v>/{project_local,evaluate}.md
| Write of the same files (this turn)
| "n/a — symbols already in workspace cache"
- [ ] Decision: new experiment file vs edit existing
Evidence: AskUserQuestion id=<id> | user quote turn N |
"n/a — first experiment in a fresh workspace"
- [ ] `journal/` scaffolded with empty placeholder JOURNAL.md
Evidence: Write journal/JOURNAL.md (this turn) | "already exists"
- [ ] Pre-flight re-emitted with evidence before final message.
Evidence: this checklist appears in the end-of-turn summary.
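The Tier 1 row asks for a script on disk rather than an inline one-liner. A minimal sketch of what such a probe might contain follows, assuming only the standard library; the helper names `missing_modules` and `report` are hypothetical. Note that `importlib.util.find_spec` only locates a package, so a real probe may still want to perform the import itself to catch broken installs.

```python
import importlib.util

# Tier 1 libraries named in the Pre-flight checklist.
TIER1 = ("sklearn", "skrub", "skore")

def missing_modules(names):
    """Return the names the import system cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]

def report(names=TIER1) -> str:
    """One-line summary suitable for pasting as checklist evidence."""
    missing = missing_modules(names)
    if missing:
        # Stop condition: do not work around the gap; hand off for the install command.
        return "MISSING: " + ", ".join(missing)
    return "OK: " + ", ".join(names)

print(report())
```

The printed line, together with the path of the scratch file that produced it, is the kind of output the Evidence cell expects.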
Detection — existing workspace first
| Signal | Meaning |
|---|---|
| with + [tool.setuptools.packages.find] (or poetry / hatch equivalents) | Package declared installable: use this name; verify editable install via |
| / / with name but no table | Manager knows the project but package isn't installable — flag, offer to add |
| or at root | Package dir already chosen — keep it |
| at root or under | Stale out-of-band — flag drift, offer to wire via manager |
| , , , | Experiment location chosen — keep it |
| with files | Audit location chosen — keep it; body owned by |
| , , | Journal location chosen — keep it |
| , , | Report location chosen — keep it |
| Test location chosen — keep it |
| / at root | Prior tracker artifacts — leave alone; skore is canonical |
| files in experiment folder | User is on notebooks — surface the shift and ask; don't auto-switch |
Any signal present → glue to existing convention. No renames,
no relocates. None present → fresh scaffold (below).
→ next: G-PKG-NAME, then
for G-ENV-MGR.
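The existing-versus-fresh decision can be sketched as a simple directory probe. This is only an illustration: it checks the directory names from the Default layout, whereas the Detection table lists further signals and alternative names that an existing workspace may use. The function name `detect_layout` and the returned dictionary shape are hypothetical.

```python
from pathlib import Path

# Assumption: directory names taken from the Default layout only.
DEFAULT_DIRS = ("src", "journal", "experiments", "audit", "tests", "reports")

def detect_layout(root) -> dict:
    """Report which default-layout directories already exist under `root`."""
    root = Path(root)
    present = [name for name in DEFAULT_DIRS if (root / name).is_dir()]
    # Any hit means: glue to the existing convention, no renames or relocations.
    return {"mode": "existing" if present else "fresh", "present": present}
```

A result of `"fresh"` would only mean none of these particular names matched; the remaining rows of the table still need to be checked before scaffolding.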
Default layout (fresh workspace)
project/
├── pyproject.toml # declares src/<pkg>/ as installable
├── <manager manifest> # pixi.toml / poetry / uv / hatch / environment.yml
├── src/<pkg>/
│ ├── __init__.py # exposes PROJECT_ROOT
│ ├── data.py # data loading, splits, split_kwargs
│ ├── features.py # transformers, encoders, feature fns
│ ├── pipeline.py # the learner declaration (skrub DataOps)
│ └── evaluate.py # ONLY: CV strategy + optional metric overrides
├── journal/
│ ├── JOURNAL.md # session-start log; index of experiments
│ └── 01_baseline.md # one `.md` per planned experiment
├── experiments/
│ └── 01_baseline.py # one `# %%` script per experiment
├── audit/
│ └── 01_baseline.py # body owned by audit-ml-pipeline (read-only)
├── tests/
│ └── smoke/ # body owned by smoke-test-ml-pipeline
├── overview/
│ └── summary.md # agent-authored narrative (iterate-ml-experiment § 4)
├── scratch/ # agent-only (gitignored entirely)
└── reports/ # skore Project lives here
The package is installable. declares
; the manager installs in
editable mode so
from <pkg>.pipeline import build_learner
works from any CWD.
Wiring per-manager:
§ Editable workspace.
Runtime deps (sklearn, skrub, skore, tabular) live in the
manager's manifest, not in
.
Deliberately absent: no
(user-owned), no
(out of scope). Add later only on user request — don't pre-empt.
File-creation rules
Design note first, then code
Before creating experiments/NN_<short_name>.py, the matching journal/NN_<short_name>.md must exist and have been validated by the user. Design-note content is owned by ; this skill only enforces the pairing.
Four-way stem pairing
Every experiment is identified by
in four places:
journal/NN_<short_name>.md (design note)
experiments/NN_<short_name>.py (script)
tests/smoke/test_NN_<short_name>.py (smoke test)
audit/NN_<short_name>.py (audit file — read-only)
By the time the experiment flips to
in JOURNAL.md AND its
summary is refreshed in
, all four exist.
The design note is written first; the script lands on approval;
the smoke test body is filled by
; the
audit file is placed and executed by
at § 4
record-outcome.
The audit file is read-only against the workspace's skore Project and data; see § Read-only contract.
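The four-way pairing lends itself to a mechanical check. A minimal sketch, assuming the four path patterns listed above; the function names `paired_paths` and `missing_pairs` and the role labels are hypothetical.

```python
from pathlib import Path

def paired_paths(stem: str) -> dict:
    """Map each role to its relative path for one experiment stem (NN_<short_name>)."""
    return {
        "design_note": Path("journal") / f"{stem}.md",
        "script": Path("experiments") / f"{stem}.py",
        "smoke_test": Path("tests") / "smoke" / f"test_{stem}.py",
        "audit": Path("audit") / f"{stem}.py",
    }

def missing_pairs(root, stem: str) -> list:
    """Return the roles whose file does not exist yet under `root`."""
    root = Path(root)
    return [
        role
        for role, relative in paired_paths(stem).items()
        if not (root / relative).exists()
    ]
```

Right after the design note is written, `missing_pairs` would be expected to report the other three roles; an empty list is only expected once the experiment's outcome has been recorded.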
New experiment → new file. Iterating → ask first.
Default: new file.
,
. The
numeric prefix preserves iteration order under
.
When the user says "let's tweak experiment 02",
do not assume.
Fire
:
Should this be a new experiment file (e.g.
) or an in-place edit of
?
In-place edits overwrite the prior result in the skore Project if the same key is reused; flag this. In-place also requires revisiting the matching smoke test (→ ).
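For the default path (a new file), the next stem can be proposed from what is already on disk. A minimal sketch, assuming two-digit zero-padded prefixes as in `01_baseline`; the function name `next_stem` is hypothetical, and the proposal would still go through the structured ask rather than being applied silently.

```python
import re
from pathlib import Path

def next_stem(experiments_dir, short_name: str) -> str:
    """Propose NN_<short_name>, one above the highest numeric prefix found."""
    numbers = [
        int(match.group(1))
        for path in Path(experiments_dir).glob("*.py")
        if (match := re.match(r"(\d+)_", path.name))
    ]
    # Zero-padding keeps lexical order aligned with numeric order up to 99.
    return f"{max(numbers, default=0) + 1:02d}_{short_name}"
```

Files without a numeric prefix are ignored, so stray helper scripts in the folder do not shift the numbering.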
Decision flow (13 steps; full version in references/scaffold_steps.md)
| # | Step | Owner |
|---|---|---|
| 1 | Read project root; Detection table matches → glue (stop). No match → continue | this skill |
| 2 | G-PKG-NAME structured ask. Record in . No manager until this passes | this skill |
| 2a | G-SKORE-MODE ask: local \| hub (+ hub workspace name if hub). Determines form + skore install variant. → references/g_skore_mode.md | |
| 3 | Drop from (substitute ). Hand off to for editable install | this skill → env-manager |
| 4 | Create with skeletons from | this skill |
| 5 | Create experiments/01_baseline.py from (substitute , per G-SKORE-MODE, ) | this skill |
| 6 | Create empty . Verify pytest on manifest | this skill |
| 6a | Create empty | this skill |
| 7 | Create one-line placeholder; rewrites it | this skill |
| 8 | Create from | this skill |
| 9 | Create empty (no README — owned by ) | this skill |
| 10 | Create empty | this skill |
| 11 | Touch — drop template if none; else suggest patch (always ask about ) | this skill |
| 12 | Hand off to § Initial setup for + first pass — invoking the skill teaches NumPyDoc | this skill → python-code-style |
| 13 | Hand back to the relevant sibling ( for design note, etc.) | this skill → next caller |
→ next: § 0 (bootstrap) for the first design note.
Files in src/<pkg>/
Each has a narrow contract:
- — exposes (absolute, derived
from , not CWD). Modules needing project-relative
paths import this constant. Requires editable install.
- — loaders, materialization of , , any
(groups, time, …) at the X marker. Pipeline
mechanics in .
- — feature functions and transformers.
- — the learner declaration (a ).
exposes so the
experiment script can pass an absolute path from .
- — only the inputs to :
the cross-validator (), optional metric
overrides. Does NOT call , does NOT open a
Project, does NOT persist.
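How a package init might derive an absolute root can be sketched as follows. This assumes the constant is computed from the module's own `__file__` and that the package sits at `src/<pkg>/` as in the Default layout; the shipped template remains the authority, and `project_root_from` is a hypothetical helper used here so the logic can be exercised outside a package.

```python
from pathlib import Path

def project_root_from(init_file) -> Path:
    """Given the path of src/<pkg>/__init__.py, return the project root."""
    # parents[0] is src/<pkg>, parents[1] is src, parents[2] is the project root.
    return Path(init_file).resolve().parents[2]

# Inside the real package this would read (assumption, see lead-in):
# PROJECT_ROOT = project_root_from(__file__)
```

Because the result depends only on where the file lives, it is the same from any CWD. With a non-editable install, `__file__` would point into the environment's site-packages instead of the source tree, which is presumably why the editable install is required.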
Experiment scripts —
cell markers, not
. Template:
. What the script does:
- Open / attach to the at (or hub).
- Import the learner from and CV from
.
- Call .
- Call project.put("<experiment-key>", report).
Confirm signatures via
. Cross-validator choice is
.
Project init substitution — the
marker
in
is replaced at scaffold time per the
recorded
decision. Two forms (local vs hub),
side-by-side anatomy, audit-file copy rule:
→
references/g_skore_mode.md
.
Experiment scripts stay clean of agent-only .
Inspection lives in
. One exception: a bare
expression — that's a notebook-display side effect.
Experiment key convention — the file's stem (e.g.
→
). One file → one key → one
report.
Companion skills
| Skill | Relationship |
|---|---|
| Owns and per-experiment design notes. This skill places empty ; that skill fills it |
| Body of , , |
| Body of ; CV strategy |
| Layout of + stem-pairing rule |
| Body of the smoke test once design note is approved |
| Body of . Read-only against the workspace |
| skore / skrub / sklearn signatures |
| Detection + install commands + bootstrap |
| data-science-python-stack | What to install (Tier 1/2/3) |
| drop + NumPyDoc convention (step 12) |
Templates
- — copied per new experiment
- — placeholder at scaffold; rewritten by
§ 4
- — declares as installable
- templates/src___init__.py: package init with
- / / /
— one-time skeletons
- — dropped at scaffold if none exists
Copy, don't rewrite. Section names encode contracts.
References (load on demand)
- references/scaffold_steps.md: full prose elaboration of the 13-step Decision flow with examples and rationale.
- references/g_skore_mode.md: the G-SKORE-MODE gate in detail: project init forms side-by-side, anatomy of the substitution, switching mid-project, out-of-scope notes (MLflow mode, Skore Hub account creation).