Deep Research (Deep Research Orchestration Workflow)
Treat "deep research" as a reusable, parallelizable production process. The main controller is responsible for clarifying goals, splitting sub-goals, scheduling child processes, and aggregating and refining; child processes are responsible for collecting, extracting, and locally analyzing material and outputting structured Markdown; the final deliverable must be a standalone finished file rather than a chat post.
Key Constraints (Must Be Followed)
- Keep default model and configuration unchanged: Do not explicitly override the default model/inference settings or pass extra overrides for them; adjust these configurations only when explicitly authorized by the user.
- Default minimum permissions: Child processes run in `--sandbox workspace-write` by default; enable permissions such as network access only when necessary. If a sub-task must execute commands that require shell networking (such as /), add `-c sandbox_workspace_write.network_access=true` to the invocation.
- Network access prioritizes skills, then MCP: Prefer installed skills first; if MCP must be used, prioritize , then ; consider / only when the above truly cannot meet the requirements.
- Non-interactive friendly: Child processes do not use the plan tool and do not wait for user confirmation or feedback; they focus on writing files to disk and keeping traceable logs.
- File delivery first: The final deliverable must be saved as an independent file; it is prohibited to post the complete draft in chat.
- Output decision and progress logs at each step: Especially during splitting, scheduling, aggregation, refinement, and before delivery.
Task Objectives
- Derive a set of parallel sub-goals from the user's high-level goals (such as link lists, data shards, module lists, time slices, etc.).
- Launch independent child processes for each sub-goal and assign appropriate permissions (default sandbox; enable network access if necessary).
- Execute in parallel and produce sub-reports (natural language Markdown, which can include sections/tables/lists); output error descriptions with reasons and follow-up suggestions if failures occur.
- Aggregate sub-outputs in order using scripts to generate a unified draft.
- Conduct sanity checks and minimal fixes on the draft, then provide the final artifact path and a summary of key findings.
Delivery Standards
- Deliverables must be structured, insight-driven overall finished products; it is prohibited to directly splice sub-task Markdown as the final draft.
- When it is necessary to retain the original text of sub-tasks, save it as an internal file (e.g., `.research/<name>/aggregated_raw.md`), and only absorb key insights/evidence into the finished product.
- Refinement and revision should be iterated chapter by chapter and paragraph by paragraph; do not delete the entire draft and rewrite it at once; check references, data and context after each modification to ensure traceability.
- Deliver detailed, in-depth analytical reports by default.
- Conduct "double inspection" before delivery:
- Check whether it is truly produced through "chapter-by-chapter, multi-round integration"; if it is generated in one go, return it to rewrite by chapters.
- Evaluate whether it is detailed enough; if it is too thin, first judge whether the cause is "insufficient material from sub-tasks" or "over-compression during finalization": the former calls for supplementary or additional research, while the latter means continuing to expand and refine the existing material until it meets the detail standard.
End-to-End Process (Strictly Follow the Order)
1. Pre-execution Planning and Assessment (Mandatory; Completed by the Main Controller)
- First clarify goals, risks, resource/permission constraints, and identify core dimensions of subsequent diffusion dependencies (theme clusters, people/organizations, regions, time slices, etc.).
- If public directories/indexes (tab pages, API lists, etc.) exist, crawl and cache them in a minimal way and count entries; if not, conduct "desk research" to obtain real samples (news, materials, datasets, etc.), record sources/time/key points as evidence.
- Show at least one representative sample from real retrieval or browsing before forming the list; relying solely on speculation from experience does not count as completing the assessment.
- During the assessment phase, must obtain real samples through a "traceable toolchain" at least once and record references: prioritize using installed skills; if MCP is needed, prioritize , then ; if neither is available, record the reason and choose an alternative solution (downgrade to minimal direct network crawling only when necessary).
- Output an initial (or draft) list: list the discovered dimensions, options and samples mastered in each dimension, scale estimation, and mark uncertainties/gaps. If no real samples have been obtained yet, complete the research first and prohibit proceeding to the next step.
- Based on the above structure, complete the executable plan (splitting, scripts/tools, output format, permissions, timeout strategy, etc.), report the dimension statistics and plan content in the user's language, and wait until a clear "execute/start" response is received before proceeding.
2. Initialization and Overall Planning
- Clarify goals, expected output format and evaluation criteria.
- Generate a semantic and non-repetitive name for the current task (recommended format: `<YYYYMMDD>-<short-title>-<random-suffix>`, all lowercase, hyphen-separated, no spaces).
- Create a running directory , and save all products in this directory (subdirectories such as , , , , , ).
- Keep default model and configuration unchanged; obtain user consent first when needing to adjust any model/inference/permission-related settings, and note the reason for the change and scope of impact in the logs.
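The naming and directory setup above can be sketched as follows; `ai-chips` is a hypothetical topic slug, and the random suffix is drawn from `/dev/urandom`:

```shell
# Build a run name of the form <YYYYMMDD>-<short-title>-<random-suffix>
short_title="ai-chips"                              # hypothetical topic slug
suffix="$(od -An -N3 -tx1 /dev/urandom | tr -d ' \n')"   # 6 random hex chars
run_name="$(date +%Y%m%d)-${short_title}-${suffix}"
mkdir -p ".research/${run_name}"                    # running directory for all products
echo "$run_name"
```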
3. Sub-goal Identification
- Extract or construct a list of sub-goals through scripts/commands.
- When source data is insufficient (e.g., the page only provides two main links), record the reason truthfully, and then the main process directly takes over to complete the remaining work.
4. Generate Scheduling Script
- Create a scheduling script (e.g., `.research/<name>/run_children.sh`) that must:
- Receive the list of sub-goals (can be stored in JSON/CSV) and schedule them one by one.
- Construct calls for each sub-goal, recommended key points:
- Recommended form: `codex exec --full-auto --sandbox workspace-write ...` (refer to for details).
- State in the prompt: all networking requirements prioritize installed skills (skill priority); if MCP must be used, prioritize , then ; use / only when truly unavoidable; do not use the plan tool or wait for manual interaction.
- Do not pass unless required by the user, and do not use additional overrides for the default model/inference settings; consider adjusting them only when explicitly authorized by the user and the result quality is genuinely insufficient.
- Specify the output path for sub-results (e.g., `.research/<name>/child_outputs/<id>.md`).
- Explicitly prohibit the use of deprecated parameters (such as , , ), and remind to run first to get the latest instructions. The following call template can be referenced (it only demonstrates parameters and does not involve parallelism):

```bash
timeout 600 codex exec --full-auto --sandbox workspace-write \
  --output-last-message "$output_file" \
  - <"$prompt_file"
```
- If child processes are allowed to execute commands that require shell networking (such as /), append `-c sandbox_workspace_write.network_access=true` to the call.
- Set timeouts by task scale: start with 5 minutes () for small tasks and relax to at most 15 minutes () for larger ones, with an external command as a fallback. On the first 5-minute timeout, decide based on the actual task whether to split it or adjust parameters and retry; if it still cannot finish within 15 minutes, treat the prompt/process as needing investigation.
- For small-scale tasks (<8), use loops plus background jobs (or queue control) to achieve parallelism, avoiding failures caused by command-line length limits; for large-scale tasks, use /GNU Parallel, but verify parameter expansion at a small scale first. The default parallelism is 8, adjustable according to hardware or quotas.
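A minimal sketch of the loop-plus-background-jobs pattern, assuming bash. Here `run_child` is a placeholder standing in for the real `timeout 600 codex exec --full-auto --sandbox workspace-write ...` call, and the goal ids and paths are hypothetical:

```shell
#!/usr/bin/env bash
out_dir=".research/demo-dispatch/child_outputs"
mkdir -p "$out_dir"
max_jobs=8

run_child() {
  # placeholder: a real child would read a prompt file and write a Markdown report
  printf '# report for %s\n' "$1" > "$out_dir/$1.md"
}

for id in goal-1 goal-2 goal-3 goal-4; do
  # throttle: sleep while the number of running background jobs is at the cap
  while [ "$(jobs -rp | wc -l)" -ge "$max_jobs" ]; do
    sleep 0.2    # simple throttle; xargs -P or GNU Parallel scale better
  done
  run_child "$id" &
done
wait             # block until all remaining children finish
ls "$out_dir"
```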
- Do not replace parallelism with serial one-by-one execution; do not bypass the established process with shortcuts such as having the main process search on its own.
- Capture the exit code of each child process and write logs to the running directory; use methods like `stdbuf -oL -eL codex exec … | tee .research/<name>/logs/<id>.log` to ensure real-time flushing, which makes progress easy to observe with .
- Note that does not provide parameters such as and ; files must be written through pipes, and with multiple pipe stages the exit code must be read at the correct index (e.g., bash's `PIPESTATUS`). Review available parameters with before running.
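A short sketch of reading the first pipe stage's exit code in bash; a subshell stands in for the real `codex exec` call, and the log path is illustrative:

```shell
# After `cmd | tee log`, $? reflects tee, not cmd; use PIPESTATUS instead.
mkdir -p .research/demo-logs
( echo "child says hello"; exit 3 ) | tee .research/demo-logs/demo.log
rc=${PIPESTATUS[0]}          # exit code of the first stage, not of tee
echo "child exit code: $rc"
```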
- When data volume is sufficient, the main controller should avoid taking on heavy tasks such as downloading/parsing itself; assign these to child processes, and let the main controller focus on prompt, template and environment preparation.
5. Design Child Process Prompt
- Dynamically generate a prompt template, which must include at least:
- Sub-goal description, input data, constraint boundaries.
- Limit the total number of rounds of network retrieval/extraction during the planning phase to no more than X (selected according to complexity; usually 10 is recommended), and converge when information is sufficient; tool priority: skills → MCP ( → ) → minimal direct crawling.
- Output results in natural language Markdown: including conclusions, key evidence lists, reference links; provide error descriptions and follow-up suggestions in Markdown format if errors occur.
- When generating actual prompt files, prefer /line-by-line writing to inject variables, avoiding the known Bash 3.2 issue of truncating variables in multi-byte character scenarios.
- Write the template to a file (e.g., `.research/<name>/child_prompt_template.md`) for auditing and reuse.
- Before starting the scheduling script, quickly review the generated prompt files one by one (e.g., `cat .research/<name>/prompts/<id>.md`), and dispatch tasks only after confirming that variable substitution is correct and the instructions are complete.
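One way to render a prompt file line by line, injecting each variable in its own `printf`; the sub-goal text and paths are hypothetical. This avoids expanding a large template in a single shell substitution, which is where old bash versions can truncate multi-byte content:

```shell
goal="summarize module A"                 # hypothetical sub-goal
out=".research/demo-prompts/prompts/module-a.md"
mkdir -p "$(dirname "$out")"
{
  printf '# Sub-goal: %s\n' "$goal"
  printf 'Input: %s\n' "data/module_a.json"
  printf 'Output: write a Markdown report to %s\n' ".research/demo-prompts/child_outputs/module-a.md"
} > "$out"
cat "$out"
```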
6. Parallel Execution and Monitoring
- Run the scheduling script.
- Record the start/end time, duration and status of each child process.
- Make clear decisions on failed/timed-out child processes: mark, retry, or explain them in the final report; when the 15-minute limit is hit, record that the prompt/process needs investigation. During long tasks, users can be pointed to `tail -f .research/<name>/logs/<id>.log` to track real-time output.
7. Programmatic Aggregation (Generate Draft)
- Use a script (e.g., `.research/<name>/aggregate.py`) to read all Markdown files under `.research/<name>/child_outputs/` and aggregate them in the preset order into an initial main document (e.g., `.research/<name>/final_report.md`).
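A shell sketch of the aggregation step, concatenating child outputs in a preset order (not glob order) and inserting a placeholder section for any missing file. The goal ids are hypothetical and the seeding lines exist only for this demo:

```shell
base=".research/demo-aggregate"
mkdir -p "$base/child_outputs"
printf '## goal-1\nfindings A\n' > "$base/child_outputs/goal-1.md"   # demo input
printf '## goal-2\nfindings B\n' > "$base/child_outputs/goal-2.md"   # demo input

: > "$base/final_report.md"               # start the draft empty
for cid in goal-1 goal-2 goal-3; do       # preset order; goal-3 is deliberately missing
  f="$base/child_outputs/$cid.md"
  if [ -s "$f" ]; then
    cat "$f" >> "$base/final_report.md"
  else
    printf '## %s\n(missing: see logs)\n' "$cid" >> "$base/final_report.md"
  fi
done
```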
8. Interpret Aggregation Results and Design Structure
- Read through `.research/<name>/final_report.md` and the key sub-outputs.
- Design the chapter outline of the refined report and a "material mapping" (e.g., `.research/<name>/polish_outline.md`), clarifying the target audience, chapter order, and each chapter's core arguments.
9. Chapter-by-Chapter Refinement and Finalization
- Create a refined draft (e.g., `.research/<name>/polished_report.md`) and write it chapter by chapter according to the outline; self-check facts, references and language requirements immediately after each chapter, tracing back to sub-drafts for verification when necessary.
- Avoid rewriting the entire draft at once; adhere to "chapter-by-chapter iteration" to maintain consistency and reduce the risk of omissions, while recording the highlights, problems and handling methods of each chapter.
- Consolidate duplicate information, unify citation formats, and collect items to be confirmed, while retaining core facts and quantitative data.
10. Delivery
- Confirm that the refined draft meets the delivery standards (complete structure, unified tone, accurate references), and use this finished product as the external report.
- The final deliverable must be saved as an independent file (located in ); report to the user by providing the file path and necessary summary, and it is prohibited to post the complete draft in chat.
- Outline core conclusions and actionable suggestions in the final reply; supplement follow-up methods for items to be confirmed if necessary.
- Do not attach intermediate drafts or internal notes externally to ensure that users see high-quality finished products.
Notes
- Keep the process idempotent: Generate a new name for each run to avoid overwriting old files.
- All structured outputs must be valid UTF-8 text.
- Elevate permissions only when authorized or truly necessary; avoid `--dangerously-bypass-approvals-and-sandbox`.
- Be cautious when cleaning up temporary resources to ensure that logs and outputs are traceable.
- Provide graceful degradation for failed processes: Attempt crawling tasks at least twice; if they still fail, add a "Failure Reasons/Follow-up Suggestions" section in Markdown so that aggregation has no blanks.
- Cache first: Raw materials obtained through skills/MCP should be written to cache directories such as first, and local cache should be prioritized for subsequent processing to reduce repeated requests.
- Understand completely before summarizing: Process the complete original text before summarizing/extracting; do not mechanically truncate to a fixed length (e.g., the first 500 characters). You can write scripts for full-text parsing, key sentence extraction or key point generation, but do not rely on "hard truncation".
- Temporary directory isolation: Intermediate products (script logs, parsing results, cache, debugging outputs, etc.) are placed in subdirectories such as , , , and can be cleaned up as needed after the process ends.
- Search service priority: Prioritize using installed skills for networking operations; if MCP is needed, first check available MCP servers (e.g., run ), and prioritize , then ; fall back to minimal direct crawling capability when MCP is unavailable.
- MCP parameter and output control: For tools that may return excessively large results, avoid requesting fields like "raw full text" to prevent response bloat; if necessary, extract in segments, list directories first and then delve into details as needed.
- Image retrieval: If MCP supports image search/description, enable it and present image clues together with text evidence unless the user explicitly requires "plain text only".
General Experience and Best Practices
- Verify environment assumptions first: Before writing the scheduling script, use / to confirm that key paths (such as , resource directories) exist; if necessary, derive the repository root path with and pass it in as a parameter to avoid hardcoding.
- Make extraction logic configurable: Do not assume that web pages share the same DOM; parsing scripts should provide configurable selectors/boundary conditions/readability parsers, and only need to modify configurations when reused across sites.
- Run through a small scale before parallelizing: Before full parallelism, run 1–2 sub-goals serially to verify the agent configuration, the skills/MCP toolchain, and the output paths; increase concurrency only after the pipeline is confirmed stable, so errors remain visible instead of surfacing mid-flight.
- Hierarchical logs for easy tracing: The scheduler writes to `.research/<name>/dispatcher.log`; each sub-task writes to its own `.research/<name>/logs/<id>.log`, and when a failure occurs, inspect the corresponding log directly to locate MCP/call details.
- Failure isolation and retry: When parallel failures occur, first record the failed ID and logs, and prioritize retrying individual failed tasks; maintain a list and uniformly prompt follow-up suggestions during the final stage.
- Avoid repeated crawling: Before retrying, check whether `.research/<name>/child_outputs/<id>.md` already exists and is valid; skip it if so, to reduce quota consumption and repeated access.
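A minimal retry-guard sketch: skip a sub-goal whose output file already exists and is non-empty. The path is illustrative, and the seeded file simulates a previous successful run:

```shell
out=".research/demo-retry/child_outputs/goal-1.md"
mkdir -p "$(dirname "$out")"
printf '# existing report\n' > "$out"   # pretend a previous run succeeded

if [ -s "$out" ]; then
  action="skip"                         # valid output present: do not re-dispatch
else
  action="retry"                        # a real retry would re-run the child call here
fi
echo "$action goal-1"
```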
- Final review and refinement: Before delivery, review whether the aggregated and refined draft meets language requirements (e.g., full Chinese if required), and check whether references and data points are consistent with the source files; do not lose key facts and quantitative information during refinement, so that the finished product has insights rather than just stacking facts.
- Present references in place: Add Markdown links to sources directly after each key point (e.g., `[Source](https://example.com)`) rather than concentrating links at the end of paragraphs, so claims can be verified immediately.
- Coverage check script: After batch generation, use a lightweight script to count missing entries, empty fields or tag quantities to ensure that problems are discovered and remedied before reporting.
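One possible shape for such a coverage check: list the expected sub-goal ids whose output file is missing or empty. The ids are hypothetical, and one output is seeded only for the demo:

```shell
out_dir=".research/demo-coverage/child_outputs"
mkdir -p "$out_dir"
printf '# ok\n' > "$out_dir/goal-1.md"  # seed a single completed output

missing=""
for cid in goal-1 goal-2 goal-3; do     # expected sub-goal ids
  [ -s "$out_dir/$cid.md" ] || missing="$missing $cid"
done
echo "missing or empty:$missing"
```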
- Set boundary constraints for child processes: Clearly specify accessible scopes (only specified URLs/directories) and available tools in child prompts, reducing the risk of out-of-bounds and repeated crawling, and making the process safe and controllable on any site.
Thinking and Writing Guidelines
Think first, then act: Pursue in-depth, independent thinking and insights that exceed expectations (but do not mention "surprise" in the answer). Figure out why the user asks this question, what the underlying assumptions are, and whether there is a more essential way to ask it; at the same time, clarify the success criteria your answer should meet, then organize the content around those criteria.
Maintain collaboration: Your goal is not to mechanically execute instructions, nor to force a definite answer when information is insufficient; but to advance together with the user, gradually approaching better questions and more reliable conclusions.
Writing style requirements:
- Do not overuse bullet points; limit them to the top level where possible, and prefer natural-language paragraphs.
- Do not use quotation marks unless directly quoting.
- Maintain a friendly, easy-to-understand, rational and restrained tone when writing.
When executing this skill, output clear decision and progress logs at each step.