tinkering

Safe experimentation framework for AI agents. Creates isolated sandbox environments for trying new features, testing approaches, and exploring solutions without polluting the main codebase.

USE WHEN: The agent needs to try something uncertain, explore multiple approaches, test a new library, prototype a feature, or run a technical spike before committing to an implementation.

PRIMARY TRIGGERS:
"experiment with" = Set up sandbox + run experiment
"try this approach" = Quick experiment in sandbox
"spike" / "POC" / "prototype" = Time-boxed technical investigation
"tinker" / "tinkering mode" = Enter experimentation workflow
"explore options" = Multi-approach comparison in sandbox

NOT FOR: Debugging (use a debugger), testing (use the test runner), or committed feature work (use git branches).

DIFFERENTIATOR: Unlike git branches (for a committed direction), tinkering is for "I don't know if this will work" exploration. Try five things in the sandbox before committing to a branch: faster feedback, zero codebase pollution.

2 installs

Source: rfxlamia/claude-skillkit

Added on 2026-02-19

NPX Install

npx skill4agent add rfxlamia/claude-skillkit tinkering

SKILL.md Content

Tinkering

Overview

Structured experimentation framework. When uncertain about an approach, don't hack at production code - create an isolated sandbox, try freely, then graduate successful experiments or discard failed ones cleanly.

Core principle: The output of tinkering is knowledge, not production code. A successful experiment teaches you how to solve the problem. The actual implementation happens after, informed by what you learned.

When to Use

| Situation | Tinkering? | Why |
|---|---|---|
| "Will this library work for our use case?" | Yes | Unknown outcome, need to explore |
| "Which of these 3 approaches is fastest?" | Yes | Comparing multiple options |
| "How do I integrate this API?" | Yes | Technical spike, learning-focused |
| "Add a login button to the header" | No | Clear requirement, use git branch |
| "Fix the null pointer on line 42" | No | Debugging, not experimenting |
| "Refactor auth module to use JWT" | Maybe | If approach uncertain, spike first |

Workflow

Phase 1: Setup Sandbox

Create isolated experiment environment:

```bash
# 1. Create experiment directory
mkdir -p _experiments/{experiment-name}

# 2. Add to .gitignore (if not already present)
grep -qxF '_experiments/' .gitignore 2>/dev/null || echo '_experiments/' >> .gitignore

# 3. Create manifest (first time only)
# See MANIFEST.md template below
```

MANIFEST.md template (create at `_experiments/MANIFEST.md`):

```markdown
# Experiment Log

## Active

### {experiment-name}
- **Date**: YYYY-MM-DD
- **Hypothesis**: What we're trying to learn
- **Status**: active
- **Result**: (pending)

## Completed
<!-- Move finished experiments here -->
```

Rules:

NEVER modify production files during tinkering
ALL experiment code goes inside `_experiments/{name}/`
Copy source files into the sandbox if you need to modify them
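The Phase 1 steps can be folded into one shell function. This is a minimal sketch, not part of the skill: `start_experiment` is a hypothetical name, and it assumes a POSIX shell with `awk`, run from the repository root. It creates the sandbox, adds the `.gitignore` line only once, creates `MANIFEST.md` from the template on first use, and adds an Active entry for the experiment.

```bash
# Hypothetical helper (not part of the skill): start_experiment <name> [hypothesis]
start_experiment() {
  name=$1
  hypothesis=${2:-"(fill in)"}
  # Reject empty names, dot-names and slashes so nothing lands outside the sandbox
  case $name in
    ''|.*|*/*) echo "invalid experiment name: '$name'" >&2; return 1 ;;
  esac

  mkdir -p "_experiments/$name" || return 1
  # -qxF matches the whole line literally, so repeated calls add the line only once
  grep -qxF '_experiments/' .gitignore 2>/dev/null || echo '_experiments/' >> .gitignore

  manifest=_experiments/MANIFEST.md
  if [ ! -f "$manifest" ]; then
    # First experiment: create the skeleton of the MANIFEST.md template
    printf '# Experiment Log\n\n## Active\n\n## Completed\n<!-- Move finished experiments here -->\n' > "$manifest"
  fi
  # Don't add a second entry if this experiment is already logged
  grep -qxF "### $name" "$manifest" && return 0

  # Insert the new entry directly under the "## Active" heading
  awk -v name="$name" -v d="$(date +%Y-%m-%d)" -v h="$hypothesis" '
    { print }
    $0 == "## Active" {
      print ""
      print "### " name
      print "- **Date**: " d
      print "- **Hypothesis**: " h
      print "- **Status**: active"
      print "- **Result**: (pending)"
    }' "$manifest" > "$manifest.tmp" && mv "$manifest.tmp" "$manifest"
}
```

For example, `start_experiment date-fns-migration "Can date-fns replace moment.js?"`. Rejecting names that are empty, start with a dot, or contain `/` keeps the helper from ever writing outside `_experiments/`.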

Phase 2: Hypothesize

Before writing any code, state clearly:

Question : What specific question are we answering?
Success  : How will we know it works?
Time box : Maximum time to spend (default: 30 min)
Scope    : Which files/areas are involved?

Write this in `_experiments/{name}/HYPOTHESIS.md` or as a comment at the top of the experiment's main file.

Example:

Question : Can we replace moment.js with date-fns and reduce bundle size?
Success  : Bundle decreases >20%, all date formatting still works
Time box : 20 minutes
Scope    : src/utils/date.ts, package.json
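A hypothetical `write_hypothesis` helper can write these four fields to `HYPOTHESIS.md` in the format above. It is only a sketch: the argument order and the 30-minute default are assumptions taken from this section, and it refuses to overwrite an existing file so the hypothesis stays fixed once experimenting starts.

```bash
# Hypothetical helper: write_hypothesis <name> <question> <success> [time box] [scope]
write_hypothesis() {
  file="_experiments/$1/HYPOTHESIS.md"
  # Don't overwrite: the hypothesis is fixed before experimenting starts
  if [ -e "$file" ]; then
    echo "$file already exists" >&2
    return 1
  fi
  mkdir -p "_experiments/$1" || return 1
  # Same "Field : value" layout as the example above; time box defaults to 30 min
  cat > "$file" <<EOF
Question : $2
Success  : $3
Time box : ${4:-30 min}
Scope    : ${5:-unspecified}
EOF
}
```

Usage with the example above: `write_hypothesis date-fns-migration "Can we replace moment.js with date-fns and reduce bundle size?" "Bundle decreases >20%, all date formatting still works" "20 minutes" "src/utils/date.ts, package.json"`.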

Phase 3: Experiment

Build freely in the sandbox.

Modifying existing code:

```bash
# Copy the file(s) you need to change
cp src/utils/date.ts _experiments/date-fns-migration/date.ts
# Edit the copy freely - zero risk to production
```
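When copying files in, one option is to keep each file's relative path inside the sandbox, so the experiment's changes can later be reviewed with `diff -u` against the untouched original. The `sandbox_copy` and `sandbox_diff` names are hypothetical, and this sketch assumes paths relative to the repository root (unlike the flat `date.ts` copy above).

```bash
# Hypothetical helpers: sandbox_copy <name> <file>...  /  sandbox_diff <name> <file>
sandbox_copy() {
  exp="_experiments/$1"
  shift
  for src in "$@"; do
    # Mirror the file's directory inside the sandbox, e.g. src/utils/
    mkdir -p "$exp/$(dirname "$src")" || return 1
    cp "$src" "$exp/$src" || return 1
  done
}

# Unified diff of the original file against its sandbox copy
sandbox_diff() {
  diff -u "$2" "_experiments/$1/$2"
}
```

`diff` exits with status 0 when the files are identical, 1 when they differ, and 2 on error, so a status of 1 from `sandbox_diff` just means the experiment changed something.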

New feature exploration:

```bash
# Create new files directly in sandbox
touch _experiments/websocket-poc/server.ts
touch _experiments/websocket-poc/client.ts
```

Library evaluation:

```bash
# Minimal test script in sandbox
touch _experiments/redis-eval/test_redis.py
# Use isolated dependencies (venv, local node_modules)
```

Multi-approach comparison:

```
_experiments/caching-spike/
  approach-a-redis/
  approach-b-memory/
  approach-c-sqlite/
  COMPARISON.md       # Side-by-side evaluation
```
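To seed COMPARISON.md, a small runner can execute each approach and record the outcome. This assumes a hypothetical convention that every `approach-*/` directory contains an executable `run.sh` that exits 0 when the approach works; `compare_approaches` is an illustrative name. Timing uses `date +%s`, so it has one-second resolution and only shows coarse differences.

```bash
# Hypothetical helper: compare_approaches <spike-name>
# Assumes each approach-*/ directory has an executable run.sh (exit 0 = works).
compare_approaches() {
  spike="_experiments/$1"
  out="$spike/COMPARISON.md"
  # Append a dated table so hand-written notes already in the file are kept
  {
    echo
    echo "## Automated run ($(date '+%Y-%m-%d %H:%M'))"
    echo
    echo "| Approach | Exit code | Seconds |"
    echo "|----------|-----------|---------|"
  } >> "$out"
  for dir in "$spike"/approach-*/; do
    # Skip directories without a runner (this also skips an unmatched glob)
    [ -x "${dir}run.sh" ] || continue
    start=$(date +%s)
    # Run inside the approach directory; only exit status and duration are kept
    (cd "$dir" && ./run.sh) >/dev/null 2>&1
    status=$?
    end=$(date +%s)
    echo "| $(basename "$dir") | $status | $((end - start)) |" >> "$out"
  done
}
```

Appending instead of overwriting is a deliberate choice: the side-by-side evaluation written by hand survives repeated runs.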

Rules during experimentation:

Stay in the sandbox - never touch production files
Quick and dirty is fine - this is throwaway code
Document learnings as you go
Stop at the time box, even if incomplete - partial answers are still answers
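Sticking to the time box is easier with a recorded start time. A minimal sketch, assuming a hypothetical `.started` marker file inside the experiment directory and the 30-minute default from Phase 2 (`timebox_start` and `timebox_left` are illustrative names):

```bash
# Hypothetical helpers: timebox_start <name> / timebox_left <name> [minutes]
timebox_start() {
  # Store the start time as Unix seconds in a marker file inside the sandbox
  date +%s > "_experiments/$1/.started"
}

timebox_left() {
  limit=${2:-30}                                     # minutes; Phase 2 default
  started=$(cat "_experiments/$1/.started") || return 2
  left=$(( limit * 60 - ($(date +%s) - started) ))  # seconds remaining
  if [ "$left" -le 0 ]; then
    echo "time box expired $(( -left / 60 )) min ago: stop and evaluate"
    return 1
  fi
  echo "$(( left / 60 )) min left"
}
```

The non-zero exit status on expiry lets an agent loop check `timebox_left` and move to Phase 4 instead of continuing.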

Phase 4: Evaluate

Assess results against the hypothesis.

Checklist:

Did the experiment answer the original question?
Does it meet the success criteria from Phase 2?
Any unexpected side effects or constraints discovered?
Is the approach feasible for production implementation?
What's the estimated effort to implement properly?

Update MANIFEST.md:

```markdown
- **Result**: SUCCESS - date-fns reduced bundle by 34%, all tests pass
- **Status**: graduated
- **Notes**: Need to handle timezone edge case in formatRelative()
```
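The MANIFEST.md update can also be scripted. This sketch (`set_result` is a hypothetical name) rewrites only the **Status** and **Result** lines of the entry whose heading is exactly `### <name>`; a **Notes** line is still added by hand. It uses `awk` with a temporary file because the in-place flag `sed -i` takes different arguments in GNU and BSD sed.

```bash
# Hypothetical helper: set_result <name> <status> <result>
set_result() {
  manifest=_experiments/MANIFEST.md
  awk -v name="$1" -v status="$2" -v result="$3" '
    # Any heading ends the current entry; only an exact "### <name>" starts editing
    /^##/ { in_entry = ($0 == "### " name) }
    in_entry && /^- \*\*Status\*\*:/ { print "- **Status**: " status; next }
    in_entry && /^- \*\*Result\*\*:/ { print "- **Result**: " result; next }
    { print }
  ' "$manifest" > "$manifest.tmp" && mv "$manifest.tmp" "$manifest"
}
```

For example: `set_result date-fns-migration graduated "SUCCESS - date-fns reduced bundle by 34%, all tests pass"`.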

Decision:

Positive result -> Phase 5, Path A (Graduate)
Negative result -> Phase 5, Path B (Discard)
Inconclusive -> Extend time box OR try different approach

Phase 5: Graduate or Discard

Path A: Graduate (success)

Load reference: `references/graduation-checklist.md`

Quick summary:

Do NOT copy-paste experiment code directly into production
Re-implement properly using what you learned
Write proper tests for the production implementation
Apply code standards (experiment was quick & dirty, production shouldn't be)
Reference experiment in commit message for context

Path B: Discard (failed)

Failed experiments are valuable - they tell you what NOT to do.

Update MANIFEST.md with the failure reason and learnings
Delete the experiment files: `rm -rf _experiments/{name}/`
Or keep them briefly if the learnings are worth referencing

Phase 6: Cleanup

```bash
# Remove completed experiment
rm -rf _experiments/{experiment-name}/

# Update MANIFEST.md - move entry to "Completed" section
```

MANIFEST.md after cleanup:

```markdown
## Completed

### date-fns-migration (2025-01-15)
- GRADUATED - Implemented in commit abc123
- Learnings: date-fns 3x smaller, timezone handling needs explicit config

### graphql-evaluation (2025-01-10)
- DISCARDED - Too much overhead for our simple REST API
- Learnings: REST + OpenAPI better fit for <20 endpoints
```
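Cleanup can be combined into one hypothetical `finish_experiment` helper that deletes the sandbox, removes the entry from the Active section, and appends a summary in the format shown above. It assumes the template layout, where `## Completed` is the last section, so appending to the end of the file puts the entry in the right place. The name check matters: with an empty name, the delete would become `rm -rf _experiments/` and wipe every experiment along with the manifest.

```bash
# Hypothetical helper: finish_experiment <name> <outcome> <learnings>
finish_experiment() {
  name=$1
  # Guard: an empty name would turn the rm below into "rm -rf _experiments/"
  case $name in
    ''|.*|*/*) echo "refusing to remove '_experiments/$name'" >&2; return 1 ;;
  esac
  manifest=_experiments/MANIFEST.md

  rm -rf "_experiments/$name"

  # Drop the "### <name>" block (up to the next heading) from the Active section
  awk -v name="$name" '
    /^##/ { skip = ($0 == "### " name) }
    !skip { print }
  ' "$manifest" > "$manifest.tmp" && mv "$manifest.tmp" "$manifest"

  # "## Completed" is the last section in the template, so appending lands there
  printf '\n### %s (%s)\n- %s\n- Learnings: %s\n' \
    "$name" "$(date +%Y-%m-%d)" "$2" "$3" >> "$manifest"
}
```

For example: `finish_experiment graphql-evaluation "DISCARDED - Too much overhead for our simple REST API" "REST + OpenAPI better fit for <20 endpoints"`.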

Quick Reference

```
Setup       ->  mkdir _experiments/{name}, add to .gitignore
Hypothesize ->  Question + success criteria + time box
Experiment  ->  Build in sandbox (never touch production)
Evaluate    ->  Check against success criteria
Graduate    ->  Re-implement properly in production
Cleanup     ->  Remove files, update manifest
```

Edge Cases

Needs database changes: Use a separate test database or a schema prefix, and document it in the hypothesis.

Needs running server: Run it from the sandbox on a different port to avoid conflicts.

Multiple concurrent experiments: Each gets its own subdirectory; MANIFEST.md tracks them all.

Experiment grows into real feature: Graduate it. Don't let experiments become shadow production code.

Team member needs to see experiment: Push to a feature branch (temporarily tracking `_experiments/`, which is gitignored and therefore has to be added with `git add -f`) or share it via a patch.
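For the patch route, the ignored sandbox has to be staged before git can diff it. A sketch, assuming the experiment directory is not already tracked (`export_experiment_patch` is a hypothetical name): `git add -f` stages ignored files, `git diff --cached` writes them out as a patch, and `git rm -r --cached` unstages them again without touching the working tree.

```bash
# Hypothetical helper: export_experiment_patch <name>  (writes <name>.patch)
# Assumes _experiments/<name> is gitignored and not already tracked.
export_experiment_patch() {
  dir="_experiments/$1"
  # Stage the ignored files, write the staged diff, then unstage them again
  git add -f -- "$dir" &&
    git diff --cached -- "$dir" > "$1.patch" &&
    git rm -r -q --cached -- "$dir"
}
```

The teammate can then recreate the sandbox with `git apply websocket-poc.patch`.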
