Deep Research (Multi-agent Orchestration Workflow for Deep Research)

Treat "deep research" as a reusable, parallelizable production process: The main controller is responsible for clarifying goals, splitting sub-goals, scheduling sub-processes, aggregating and refining; sub-processes are responsible for collecting/extracting/local analyzing and outputting structured Markdown materials; the final deliverable must be an independent finished file rather than a chat post.

Key Constraints (Must Follow)

Keep default model and configuration unchanged: Do not explicitly override the model or use additional parameters to overwrite default model/inference settings; adjust relevant configurations only when explicitly authorized by the user.
Default minimum permissions: Sub-processes control available tools via
```
--allowedTools
```
; enable permissions such as network access only when necessary.
Prioritize skills for networking, then MCP: Prioritize using installed skills; if MCP must be used, prioritize
```
firecrawl
```
, followed by
```
exa
```
; consider WebFetch/WebSearch only when it is truly impossible to meet requirements.
Non-interaction friendly: Sub-processes do not use plan tools, and do not interact with users in a "wait for confirmation/wait for feedback" manner; focus on file delivery and traceable logs.
Prioritize file delivery: The final deliverable must be saved as an independent file; it is prohibited to post the complete draft in chat.
Output decision and progress logs at each step: Especially before splitting, scheduling, aggregating, refining, and delivering.
Task scale judgment threshold: Must start
```
claude -p
```
sub-processes when the number of sub-goals ≥3; when <3 sub-goals, the main process can execute directly, but still needs to record the complete directory structure and raw data.
Must wait for user confirmation: After completing the preliminary investigation, clearly ask the user "Do you want to start execution?" and do not proceed to the next step until the user replies with affirmative words such as "execute/start/go/yes".

Task Objectives

Derive a set of parallel sub-goals from the user's high-level objectives (such as link lists, dataset shards, module lists, time slices, etc.).
Start independent
```
claude -p
```
sub-processes for each sub-goal and assign appropriate permissions (via the
```
--allowedTools
```
parameter).
Execute in parallel and produce sub-reports (natural language Markdown, which can include sections/tables/lists); output error descriptions in Markdown format with follow-up suggestions in case of failure.
Aggregate sub-outputs in order using scripts to generate a unified draft.
Conduct sanity checks and minimal fixes on the draft, then provide the final artefact path and summary of key findings.

Delivery Standards

The deliverable must be a structured, insight-driven complete product; it is prohibited to directly splice sub-task Markdown as the final draft.
If it is necessary to retain the original sub-task content, save it as an internal file (e.g.,
```
.research/<name>/aggregated_raw.md
```
), and only absorb key insights/evidence in the finished product.
Refinement and revision must be iterated chapter by chapter, and the entire text cannot be deleted and rewritten at once; check references, data and context after each modification to ensure traceability.
Deliver detailed, in-depth analytical reports by default.
Conduct "double inspection" before delivery:
1. Check whether it is truly produced through "chapter-based, multi-round integration"; if it is generated in one go, return it to be rewritten by chapters.
2. Evaluate whether it is detailed enough; if it is thin, first judge whether it is "insufficient sub-task materials" or "over-compressed during finalization": the former drives supplementary/ additional research, while the latter continues to expand and refine based on existing materials until it meets the detailed standards.

Task Scale Classification and Execution Path

Select the execution path based on the number of sub-goals:

Scale	Number of Sub-goals	Execution Method	Directory Requirements
Micro	1-2	Executed directly by main process	Still requires `raw/` , `logs/` , `final_report.md`
Small	3-5	Start sub-processes, serial or limited parallel	Complete directory structure
Medium	6-15	Parallel sub-processes (default 8 concurrency)	Complete directory structure + scheduling script
Large	>15	GNU Parallel + batch scheduling	Complete directory structure + multi-stage scheduling

Note: Even for micro tasks, you must:

Save raw search results to the
```
raw/
```
directory
Record execution logs to
```
logs/dispatcher.log
```
Wait for user confirmation before execution (unless the user explicitly says "execute directly")

End-to-end Process (Strictly Follow the Order)

Pre-execution Planning and Preliminary Investigation (Mandatory; Completed by Main Controller)
- First clarify the goals, risks, resource/permission constraints, and identify the core dimensions of subsequent diffusion dependencies (topic clusters, people/organizations, regions, time slices, etc.).
- If there are public directories/indexes (tab pages, API lists, etc.), crawl and cache them in a minimal way and count the entries; if not, conduct "desk research" to obtain real samples (news, materials, datasets, etc.), record the source/time/key points as evidence.
- Show at least one representative sample of real retrieval or browsing before forming the list; relying only on experience speculation does not count as completing the preliminary investigation.
- During the preliminary investigation, you must obtain real samples through a "traceable toolchain" at least once and record the references: prioritize using installed skills; if MCP is needed, prioritize
```
firecrawl
```
  , followed by
```
exa
```
  ; if neither is available, record the reason and choose an alternative solution (downgrade to WebFetch/WebSearch only when necessary).
- Output a preliminary (or draft) list: list the discovered dimensions, options and samples mastered in each dimension, scale estimation, and mark uncertainties/gaps. If real samples have not been obtained yet, complete the research first and prohibit proceeding to the next step.
- Complement the executable plan (splitting, scripts/tools, output format, permissions, timeout strategy, etc.) based on the above structure, report the dimension statistics and plan content in the user's language; wait until a clear "execute/start" response is received.
Initialization and Overall Planning
- Clarify the goals, expected output format and evaluation criteria.
- Generate a semantic and unique name
```
name
```
  for the current task (recommended:
```
<YYYYMMDD>-<short-title>-<random-suffix>
```
  , all lowercase, separated by hyphens, no spaces).
- Create the running directory
```
.research/<name>/
```
  , and save all products to this directory (subdirectories such as
```
prompts/
```
  ,
```
logs/
```
  ,
```
child_outputs/
```
  ,
```
raw/
```
  ,
```
cache/
```
  ,
```
tmp/
```
  ).
- Keep the default model and configuration unchanged; obtain user consent first when adjusting any model/inference/permission-related settings, and note the change reason and scope of impact in the log.
Sub-goal Identification
- Extract or construct a list of sub-goals through scripts/commands.
- When source data is insufficient (e.g., the page only provides two main links), record the reason truthfully, and then the main process directly takes over to complete the remaining work.
Generate Scheduling Script
- Create a scheduling script (e.g.,
```
.research/<name>/run_children.sh
```
  ), which requires:
  - Receive the list of sub-goals (can be stored in JSON/CSV) and schedule them one by one.
  - Construct
    claude -p
    calls for each sub-goal, recommended key points:
    - Recommended form:
      claude -p "prompt" --allowedTools "Read,Write,Edit,Bash,WebFetch,WebSearch,mcp__firecrawl__*"
      (refer to
      claude --help
      ).
    - State in the prompt: Prioritize installed skills for all networking needs (skill priority); if MCP must be used, prioritize
      firecrawl
      , followed by
      exa
      ; use WebFetch/WebSearch only when it is truly impossible; do not use plan tools and "wait for manual interaction".
    - Do not pass model parameters unless required by the user.
    - Specify the output path for sub-results (e.g.,
      .research/<name>/child_outputs/<id>.md
      ).
    - Can reference the following call template (only demonstrates parameters, does not involve parallelism):
      bash
      timeout 600 claude -p "$(cat "$prompt_file")" \ --allowedTools "Read,Write,Edit,Bash,Glob,Grep,WebFetch,WebSearch,mcp__firecrawl__firecrawl_scrape,mcp__firecrawl__firecrawl_search" \ --output-format json \ > "$output_file" 2>&1
    - If sub-processes need to execute more tools, append the corresponding tool names in
      --allowedTools
      .
    - Set timeout based on task scale: assign 5 minutes (
      timeout 300
      ) for small tasks, and relax to a maximum of 15 minutes (
      timeout 900
      ) for larger tasks, with external
      timeout
      command as fallback. When the 5-minute timeout is hit for the first time, judge whether to split/modify parameters and retry based on the actual task; if it still cannot be completed within 15 minutes, record that the prompt/process needs to be troubleshooted.
    - For small-scale tasks (<8), use loops + background tasks (or queue control) to achieve parallelism to avoid failures caused by command line length limits; for large-scale tasks, use
      xargs
      /GNU Parallel, but first verify parameter expansion with a small scale. Default 8 concurrent tasks, which can be adjusted according to hardware or quotas.
    - Do not replace parallelism with "running one by one serially"; do not bypass the established process by means such as "the main process searches casually".
    - Capture the exit code of each sub-process and write it to the log in the running directory; use methods like
      stdbuf -oL -eL claude -p … 2>&1 | tee .research/<name>/logs/<id>.log
      to ensure real-time refresh, which is convenient for
      tail -f
      to observe progress.
- When data volume is sufficient, the main controller should try not to undertake heavy work such as downloading/parsing personally; assign these tasks to sub-processes, and the main controller focuses on prompt, template and environment preparation.
Design Sub-process Prompt
- Dynamically generate a prompt template, which should at least include:
  - Sub-goal description, input data, constraint boundaries.
  - Limit the total number of rounds of network retrieval/extraction during the planning stage to no more than X (selected according to complexity; usually recommended 10), and converge when information is sufficient; tool priority: skills → MCP (
    firecrawl
    →
    exa
    ) → WebFetch/WebSearch.
  - Output results in natural language Markdown: including conclusions, list of key evidence, reference links; provide error descriptions in Markdown format and follow-up suggestions when errors occur.
  - When generating actual prompt files, prioritize using
    printf
    /line-by-line writing to inject variables to avoid the known issue of Bash 3.2 truncating variables in
    cat <<EOF
    scenarios with multi-byte characters.
- Write the template to a file (e.g.,
```
.research/<name>/child_prompt_template.md
```
  ) for auditing and reuse.
- Before starting the scheduling script, quickly review the generated prompt files one by one (e.g.,
```
cat .research/<name>/prompts/<id>.md
```
  ), confirm that variable substitution is correct and instructions are complete before dispatching tasks.
Parallel Execution and Monitoring
- Run the scheduling script.
- Record the start/end time, duration and status of each sub-process.
- Make clear decisions on failed/timeout sub-processes: mark, retry, or explain in the final report; record that the prompt/process needs to be troubleshooted when the 15-minute timeout limit is reached. During the execution of long tasks, you can prompt the user to track real-time output with
```
tail -f .research/<name>/logs/<id>.log
```
  .
Programmatic Aggregation (Generate Draft)
- Use a script (e.g.,
```
.research/<name>/aggregate.py
```
  ) to read all Markdown files under
```
.research/<name>/child_outputs/
```
  , and aggregate them into an initial main document (e.g.,
```
.research/<name>/final_report.md
```
  ) in the preset order.
Interpret Aggregation Results and Design Structure
- Read
```
.research/<name>/final_report.md
```
  and key sub-outputs thoroughly.
- Design the chapter outline and "material mapping" for the refined report (e.g.,
```
.research/<name>/polish_outline.md
```
  ), clarify the target audience, chapter order and core arguments of each chapter.
Chapter-by-chapter Refinement and Finalization
- Create a refined draft (e.g.,
```
.research/<name>/polished_report.md
```
  ), write chapter by chapter according to the outline; self-check facts, references and language requirements immediately after completing each chapter, and backtrack to sub-drafts for verification if necessary.
- Avoid rewriting the entire text at once; adhere to "chapter-based iteration" to maintain consistency and reduce the risk of omissions, while recording the highlights, problems and handling methods of each chapter.
- Uniformly organize duplicate information, reference formats, and items to be confirmed, while retaining core facts and quantitative data.
Delivery
- Confirm that the refined draft meets the delivery standards (complete structure, unified tone, accurate references), and use this finished product as the external report.
- The final deliverable must be saved as an independent file (located in
```
.research/<name>/
```
  ); report to the user by providing the file path and necessary summary, and it is prohibited to post the complete draft in chat.
- Outline core conclusions and actionable recommendations in the final reply; supplement follow-up methods for items to be confirmed if necessary.
- Do not attach intermediate drafts or internal notes externally, ensuring that users see high-quality finished products.

Notes

Keep the process idempotent: generate a new
```
.research/<name>/
```
for each run to avoid overwriting old files.
All structured outputs must be valid UTF-8 text.
Elevate permissions only when authorized or truly necessary; avoid abusing permissions.
Be cautious when cleaning up temporary resources to ensure logs and outputs are traceable.
Provide downgradable explanations for failed processes: attempt at least twice for crawling tasks; if still failed, add a section "Failure Reasons/Follow-up Suggestions" in Markdown to avoid blanks during aggregation.
Cache first: Raw materials obtained through skills/MCP are first written to cache directories such as
```
.research/<name>/raw/
```
, and subsequent processing prioritizes reading local cache to reduce repeated requests.
Understand completely before summarizing: Process the complete original text before summarizing/refining; do not mechanically truncate to a fixed length (e.g., first 500 characters). You can write scripts for full-text parsing, key sentence extraction or key point generation, but do not rely on "hard truncation".
Temporary directory isolation: Intermediate products (script logs, parsing results, cache, debugging outputs, etc.) are placed in subdirectories such as
```
.research/<name>/tmp/
```
,
```
.research/<name>/raw/
```
,
```
.research/<name>/cache/
```
, and can be cleaned up as needed after the process ends.
Search service priority: Prioritize using installed skills for network operations; if MCP is needed, first check available MCP tools, and prioritize
```
firecrawl
```
, followed by
```
exa
```
; fallback to WebFetch/WebSearch when MCP is unavailable.
MCP parameter and output control: For tools that may return large results, avoid requesting fields such as "raw full text" to prevent response expansion; extract in segments if necessary, list the directory first and then go deep as needed.
Image retrieval: If MCP supports image search/description, enable it unless the user explicitly requires "text only", and present image clues together with text evidence.

Claude Code Non-interactive Mode Reference

Basic Usage

bash

# Basic non-interactive call
claude -p "Your prompt here"

# Specify allowed tools (no manual confirmation required)
claude -p "Your prompt" --allowedTools "Read,Write,Edit,Bash"

# JSON format output (easy for script parsing)
claude -p "Your prompt" --output-format json

# Streaming JSON output
claude -p "Your prompt" --output-format stream-json

# Continue the previous conversation
claude -p "Follow up question" --continue

# Continue a specific conversation
claude -p "Follow up" --resume <session_id>

Sub-process Scheduling Template

bash

#!/bin/bash
# Example of sub-process scheduling

prompt_file="$1"
output_file="$2"
log_file="$3"

# Read prompt and execute
timeout 600 claude -p "$(cat "$prompt_file")" \
    --allowedTools "Read,Write,Edit,Bash,Glob,Grep,WebFetch,WebSearch,mcp__firecrawl__firecrawl_scrape,mcp__firecrawl__firecrawl_search,mcp__firecrawl__firecrawl_map" \
    --output-format json \
    2>&1 | tee "$log_file" > "$output_file"

exit_code=${PIPESTATUS[0]}
echo "Exit code: $exit_code" >> "$log_file"

Parallel Execution Example

bash

#!/bin/bash
# Execute multiple sub-tasks in parallel

max_parallel=8
research_dir=".research/$name"

# Use GNU Parallel (recommended)
cat "$research_dir/tasks.txt" | parallel -j $max_parallel \
    "timeout 600 claude -p \"\$(cat $research_dir/prompts/{}.md)\" \
    --allowedTools 'Read,Write,Edit,Bash,WebFetch,WebSearch' \
    --output-format json > $research_dir/child_outputs/{}.json 2>&1"

# Or use background tasks
for task_id in $(cat "$research_dir/task_ids.txt"); do
    (
        timeout 600 claude -p "$(cat "$research_dir/prompts/$task_id.md")" \
            --allowedTools "Read,Write,Edit,Bash,WebFetch,WebSearch" \
            --output-format json \
            > "$research_dir/child_outputs/$task_id.json" 2>&1
    ) &

    # Control the number of parallel tasks
    while [ $(jobs -r | wc -l) -ge $max_parallel ]; do
        sleep 1
    done
done

wait  # Wait for all background tasks to complete

General Experience and Best Practices

Verify environment assumptions first: Before writing the scheduling script, use
```
realpath
```
/
```
test -d
```
to confirm that key paths (such as
```
venv
```
, resource directories) exist; if necessary, derive the warehouse root path with
```
dirname "$0"
```
and pass it in via parameters to avoid hardcoding.
Make extraction logic configurable: Do not assume that web pages share the same DOM; provide configurable selectors/boundary conditions/readability parsers for parsing scripts, and only modify the configuration when reusing across sites.
Run through small scale first before parallelism: Before full parallelism, run 1–2 sub-goals serially to verify agent configuration, skills/MCP toolchain and output path; increase concurrency only after confirming the link is stable to avoid "unable to see errors after launch".
Hierarchical logs for easy tracing: The scheduler writes to
```
.research/<name>/dispatcher.log
```
; each sub-task writes to
```
.research/<name>/logs/<id>.log
```
separately, and directly
```
tail
```
the corresponding log to locate MCP/call details when failed.
Failure isolation and retry: When parallel failure occurs, first record the failed ID and log, prioritize retrying a single failed task; maintain a
```
failed_ids
```
list and uniformly prompt follow-up suggestions during the final stage.
Avoid repeated crawling: Before retrying, check whether
```
.research/<name>/child_outputs/<id>.md
```
already exists legally; if it exists, skip it to reduce quota consumption and repeated access.
Final review and refinement: Before delivery, must review whether the aggregated and refined draft meets language requirements (e.g., use Chinese throughout if required), and check whether references and data points are consistent with the source files; do not lose key facts and quantitative information during refinement, so that the finished product has insights rather than just stacking facts.
Present references in place: Directly append Markdown links to sources after each key point (e.g.,
```
[Source](https://example.com)
```
), avoid concentrating links at the end of the paragraph for easy immediate verification.
Coverage verification script: Use a lightweight script to count missing entries, empty fields or tag quantities after batch generation to ensure problems are discovered and remedied before reporting.
Set boundary constraints for sub-processes: Clearly specify accessible ranges (only specified URLs/directories) and available tools in sub-prompts to reduce the risk of out-of-bounds and repeated crawling, making the process safe and controllable on any site.

Thinking and Writing Guidelines

Think before acting: Pursue in-depth, independent thinking, and insights that exceed expectations (but do not mention "surprise" in the answer); figure out why the user asks this question, what the underlying assumptions are, and whether there is a more essential way to ask it; at the same time, clarify the success criteria that your answer should meet, and then organize the content around the criteria.

Maintain collaboration: Your goal is not to mechanically execute instructions, nor to force a definite answer when information is insufficient; instead, advance together with the user to gradually approach better questions and more reliable conclusions.

Writing style requirements:

Do not overuse bullet points, limit them to the top level; use natural language paragraphs when possible.
Do not use quotation marks unless directly quoting.
Maintain a friendly, easy-to-understand, rational and restrained tone when writing.

When executing this skill, output clear decision and progress logs at each step.

Pre-delivery Self-check List

Before submitting the final report, you must check the following list:

Directory Structure Check

```
.research/<name>/
```
directory has been created
```
logs/dispatcher.log
```
contains complete execution records (not written after the fact)
```
raw/
```
directory contains raw search/crawling results
When the number of sub-goals ≥3:
```
prompts/
```
,
```
child_outputs/
```
directories exist and have content

Process Compliance Check

Real samples are shown during the preliminary investigation (not based on experience speculation)
Execution starts only after explicit user confirmation (unless the user says "execute directly")
```
claude -p
```
sub-processes are started when the number of sub-goals ≥3
Logs are recorded in real time, not written after the fact

Report Quality Check

The report is produced through "chapter-based, multi-round integration", not generated in one go
Each key conclusion has traceable reference sources
Reference links have actually been accessed (not speculated from search results)
The report has been saved as an independent file, and the complete draft has not been posted in chat

Quick Failure Check

If any of the following situations occur, clearly explain them in the report:

Some sub-tasks failed/timeout: record failed IDs and reasons
Data sources are restricted/unreachable: record alternative solutions that have been tried
Information is incomplete: mark items to be confirmed and follow-up suggestions

deep-research

NPX Install

Tags

SKILL.md Content (Chinese)

Deep Research (Multi-agent Orchestration Workflow for Deep Research)

Task Objectives

Delivery Standards

Task Scale Classification and Execution Path

End-to-end Process (Strictly Follow the Order)

Notes

Claude Code Non-interactive Mode Reference

Basic Usage

Sub-process Scheduling Template

Parallel Execution Example

General Experience and Best Practices

Thinking and Writing Guidelines

Pre-delivery Self-check List

Directory Structure Check

Process Compliance Check

Report Quality Check

Quick Failure Check