Universal archivist for personal file archives (Dropbox/B2/Gmail-takeout/local-mount/hard-drive-dump). Filters for high-value content (the user's own writing, ideas, relationships) and surfaces it interactively. REFUSES TO RUN without an explicit gbrain.yml `archive-crawler.scan_paths:` allow-list.
npx skill4agent add garrytan/gbrain archive-crawler

Convention: see `conventions/quality.md` for citation rules, exact-phrasing requirements when capturing the user's reactions, and back-link enforcement.

Convention: see `_brain-filing-rules.md`. This skill is schema-generic: it reads the user's filing rules from the rules JSON instead of hardcoding any specific era or archive layout.
The allow-list lives under `archive-crawler.scan_paths:` in `gbrain.yml`:

# gbrain.yml — the allow-list is mandatory
archive-crawler:
  scan_paths:
    - ~/Documents/writing/
    - ~/Dropbox/Archive/
    - /mnt/backup/old-letters/
  # Optional deny-list inside the allow-list:
  # deny_paths:
  #   - ~/Documents/finances/
  #   - ~/Documents/medical/

Without `scan_paths`, the skill refuses to run:

archive-crawler: refusing to run. No `archive-crawler.scan_paths:` allow-list
in gbrain.yml. Add explicit paths the agent is permitted to scan, then re-run.
This is a safety fence — the agent will not infer what's safe to read.

`src/core/storage-config.ts`, `db_tracked`, `db_only`, `.mbox`

Source types: `local`, `dropbox`, `backblaze`, `gmail-takeout`, `mbox`, `pst`

Inventory markers in `projects/<archive-slug>/STATUS.md`: ⬜ unseen, 👀 reviewed, ✅ ingested, ⏭️ skip, 🔥 high-signal

| Keep (show) | Skip (note existence, don't show) |
|---|---|
| Personal writing (journals, letters, reflections, essays) | System files, configs, package.json, node_modules |
| Conversations (IM logs, email threads with substance) | Binary blobs (images / video) |
| Ideas, theses, frameworks | Receipts, invoices, tax docs |
| Relationship material (letters to / from people who matter) | Spam, newsletters, mailing-list bulk |
| Creative work (poetry, stories, code with soul) | Corrupted / null files |
| Origin stories (first versions of things that became important) | |
| Emotional content (anger, love, grief, discovery) | |
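
The mechanical half of the Skip column can be sketched in a few lines of Python. This is only an illustration: the function name `skip_reason`, the suffix list, and the "first bytes are all null" test are assumptions, not the skill's actual filter, and deciding what belongs in the Keep column still requires reading the content.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical rule sets derived from the Skip column; extend as needed.
SKIP_DIR_NAMES = {"node_modules"}
SKIP_FILE_NAMES = {"package.json"}
BINARY_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".mp4", ".mov"}  # assumed list


def skip_reason(path: str, head: bytes) -> Optional[str]:
    """Return why a file should be skipped, or None if it deserves a look.

    `head` is the first few bytes of the file, read by the caller.
    """
    p = PurePosixPath(path)
    if SKIP_DIR_NAMES.intersection(p.parts) or p.name in SKIP_FILE_NAMES:
        return "system file"
    if p.suffix.lower() in BINARY_SUFFIXES:
        return "binary blob"
    # Empty files and files that start with only null bytes are treated as corrupt.
    if not head.strip(b"\x00"):
        return "corrupted / null file"
    return None
```

A skipped file would still be noted in the inventory with the returned reason, in line with "note existence, don't show".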
`projects/<archive-slug>/STATUS.md`, `_brain-filing-rules.md`, `originals/<slug>.md`, `personal/<slug>.md`, `ideas/<slug>.md`, `people/<person>/timeline`, `personal/<slug>.md`, `originals/<slug>.md`, `originals/archive/`, `originals/yc-era/`, `_brain-filing-rules.json`

---
title: "[Title or first line]"
type: original
source_type: "[local|dropbox|backblaze|gmail-takeout|mbox|pst]"
source_path: "[path within the allow-listed scan_paths]"
date: "YYYY-MM-DD" # date from the file metadata or content
people: ["person-1", "person-2"]
tags: ["tag-1", "tag-2"]
---
# [Title]
[Summary: what it is, when it's from, why it matters]
**User's reaction:** [exact quote, no paraphrasing]
## Context
[Cross-links to people, concepts, projects.]
---
[Raw source material below the line — full text]

`.mbox`

import mailbox

mbox = mailbox.mbox('/path/to/file.mbox')
for msg in mbox:
    body = ''
    if msg.is_multipart():
        # Use the first text/plain part of a multipart message.
        for part in msg.walk():
            if part.get_content_type() == 'text/plain':
                body = part.get_payload(decode=True).decode('utf-8', errors='replace')
                break
    else:
        body = msg.get_payload(decode=True).decode('utf-8', errors='replace')
    # Apply gold filter

`.doc` `.docx`

# .docx (modern)
python3 -c "
import zipfile, xml.etree.ElementTree as ET
with zipfile.ZipFile('/path/to/file.docx') as z:
    tree = ET.parse(z.open('word/document.xml'))
    print(''.join(t.text or '' for t in tree.iter('{http://schemas.openxmlformats.org/wordprocessingml/2006/main}t')))
"
# .doc (legacy, requires antiword or catdoc)
antiword /path/to/file.doc 2>/dev/null || catdoc /path/to/file.doc 2>/dev/null

`.pst`

# Validate first; many PSTs are null bytes
python3 -c "
with open('/path/to/file.pst', 'rb') as f:
    print('Valid PST' if f.read(4) == b'!BDN' else 'CORRUPT/NULL')
"
# If valid:
readpst -o /tmp/pst-output /path/to/file.pst

`.zip` `.tar` `.tar.gz`

---
title: "[Archive Name] — Ingestion Status"
type: project
created: YYYY-MM-DD
updated: YYYY-MM-DD
source_type: "[local|dropbox|...]"
scan_paths: ["paths from gbrain.yml"]
---
# [Archive Name] — Ingestion Status
## Source
- **Type:** [local|dropbox|...]
- **Allow-listed paths:** [from gbrain.yml]
- **Total files:** [N]
- **Total size:** [X GB]
- **Date range:** [earliest] — [latest]
## Inventory
### [Folder 1]
| Item | Type | Size | Status | Reaction |
|------|------|------|--------|----------|
| file1.txt | text | 2KB | ✅ ingested | 🔥 "exact quote" |
| file2.doc | doc | 15KB | ⏭️ skip | — |
| file3.html | html | 4KB | ⬜ unseen | — |
### [Folder 2]
...
## Priority Queue
1. [Highest priority — why]
2. [Next — why]
...
## Session Log
### YYYY-MM-DD — [Session topic]
- Reviewed: [list]
- Reactions: [exact quotes]
- Ingested: [brain pages created]
- Next: [what's queued]

`archive-crawler.scan_paths:`, `originals/archive/`, `originals/yc-era/`, `skills/voice-note-ingest/SKILL.md`, `skills/idea-ingest/SKILL.md`, `skills/conventions/quality.md`, `writes_to:`, `quality.md`, `brain-first.md`, `_brain-filing-rules.md`, `test/skills-conformance.test.ts`
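
A minimal sketch of how the `archive-crawler.scan_paths:` fence might be checked, assuming the allow-list and the optional `deny_paths` list have already been parsed out of `gbrain.yml` into plain Python lists. The names `is_scannable` and `ScanRefused` are hypothetical and not taken from the skill's source; the real check may differ.

```python
from pathlib import Path
from typing import Iterable, Sequence


class ScanRefused(Exception):
    """Raised when no allow-list is configured (hypothetical name)."""


def _resolve(raw: str) -> Path:
    # Expand "~" and normalise; the path does not need to exist.
    return Path(raw).expanduser().resolve()


def is_scannable(path: str, scan_paths: Sequence[str],
                 deny_paths: Iterable[str] = ()) -> bool:
    """True only if `path` is inside an allow-listed root and no denied root."""
    if not scan_paths:
        # Never infer what is safe to read: no allow-list means no scan.
        raise ScanRefused(
            "archive-crawler: refusing to run. "
            "No `archive-crawler.scan_paths:` allow-list in gbrain.yml."
        )
    target = _resolve(path)
    # Path.is_relative_to (Python 3.9+) compares whole path components.
    if any(target.is_relative_to(_resolve(d)) for d in deny_paths):
        return False
    return any(target.is_relative_to(_resolve(a)) for a in scan_paths)
```

Because the comparison is component-wise rather than a string prefix match, a sibling directory such as `/data/archive-old` does not slip through an allow-list entry of `/data/archive`. The deny-list is checked first so that it always wins inside an allowed root.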