ToolUniverse General Usage Strategies
Master strategies for using ToolUniverse's 10000+ scientific tools effectively. These principles apply regardless of how you access ToolUniverse (MCP server, SDK, or direct tool calls).
Core Philosophy
ToolUniverse has MANY tools - the challenge is discovering and using them effectively:
- Search widely - Don't assume you know all relevant tools; always search for more
- Query multiple databases - Cross-reference data across sources
- Multi-hop persistence - Many answers require 3-5 tool calls in sequence
- Never give up easily - If one tool fails, try alternatives
- Comprehensive reports - Use all available data; detail is valuable
- English-first queries - Always use English terms in tool calls, even if the user writes in another language
Step 0: Clarify the Request Before Researching
CRITICAL: Before starting any research, ensure you understand what the user actually needs. Wasted tool calls on the wrong entity or scope are expensive.
When to Ask Clarifying Questions
| Signal | Example | What to Clarify |
|---|---|---|
| Vague entity | "Research cancer" | Which cancer type? Which aspect (treatment, genetics, epidemiology)? |
| Ambiguous name | "Tell me about JAK" | JAK1/2/3? The gene family? A specific inhibitor? |
| Unclear scope | "Look into metformin" | Drug profile? Repurposing? Safety? Mechanism? |
| Missing context | "What targets this?" | Which compound/disease/pathway? |
| Multiple interpretations | "ACE" | ACE the gene? ACE inhibitors? ACE2? |
When NOT to Ask
Proceed directly when the request is specific enough:
- "What is the structure of EGFR kinase domain?" - Clear entity + clear data type
- "Find all drugs targeting BRAF V600E" - Specific variant + clear task
- "Research Alzheimer's disease comprehensively" - Broad but unambiguous
Clarification Checklist
Before starting research, confirm you know:
- Entity - Exactly which gene/protein/drug/disease?
- Species - Human unless stated otherwise
- Scope - Comprehensive profile or specific aspect?
- Output - Report, data table, quick answer, or comparison?
If any of these is unclear, ask the user one concise question that covers all ambiguities instead of going through multiple rounds of questions.
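The checklist can be sketched as a small helper that collects every missing item and folds them into a single question. This is only an illustration: `ResearchRequest`, `PROMPTS`, and `clarifying_question` are hypothetical names, not part of ToolUniverse.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResearchRequest:
    """Hypothetical container for what is known about the user's request."""
    entity: Optional[str] = None
    species: str = "Homo sapiens"  # checklist default: human unless stated otherwise
    scope: Optional[str] = None
    output: Optional[str] = None


# One prompt fragment per checklist item that can be missing.
PROMPTS = {
    "entity": "which exact gene/protein/drug/disease you mean",
    "scope": "whether you want a comprehensive profile or a specific aspect",
    "output": "which output format you prefer (report, data table, quick answer, or comparison)",
}


def clarifying_question(req: ResearchRequest) -> Optional[str]:
    """Return one question covering all missing items, or None if nothing is missing."""
    missing = [PROMPTS[name] for name in PROMPTS if getattr(req, name) is None]
    if not missing:
        return None
    return "Before I start, could you confirm " + "; ".join(missing) + "?"


question = clarifying_question(ResearchRequest(entity="metformin"))
```

Here `question` asks about scope and output in one sentence, since only the entity is known.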
Strategy 1: Exhaustive Tool Discovery
CRITICAL: ToolUniverse has 10000+ tools. Before any research task, search for ALL relevant tools.
Tool Discovery Methods
Use the tool finder tools to discover what's available:
| Method | Tool | Best For |
|---|---|---|
| Keyword | | Fast search by terms |
| LLM-based | | Intelligent matching by description |
| Embedding | | Semantic similarity search |
Discovery Best Practices
| Practice | Why | Example |
|---|---|---|
| Search with multiple terms | Same data from different angles | "protein expression", "gene expression", "tissue expression" |
| Search by database name | Find all tools for a source | "UniProt", "ChEMBL", "OpenTargets" |
| Search by data type | Comprehensive coverage | "variant", "mutation", "SNP", "polymorphism" |
| Search by use case | Task-oriented discovery | "druggability", "target validation" |
Minimum Discovery Queries
Before any research task, run at least these searches:
- Main topic query:
- Related terms: ,
- Database-specific: [relevant database names]
- Data type specific:
Example for target research:
- "protein information"
- "gene expression"
- "drug target"
- "UniProt", "OpenTargets"
- "protein interaction"
- "variant mutation"
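A minimal sketch of assembling the minimum discovery queries before searching, assuming the queries are plain strings that will later be passed to whichever tool finder is in use. `build_discovery_queries` is a hypothetical helper; it only deduplicates and orders the terms and does not call ToolUniverse.

```python
def build_discovery_queries(topic, related_terms, databases, data_types):
    """Combine the four query groups into one ordered, case-insensitively deduplicated list."""
    seen = set()
    queries = []
    for query in [topic, *related_terms, *databases, *data_types]:
        key = query.strip().lower()
        if key and key not in seen:
            seen.add(key)
            queries.append(query.strip())
    return queries


# Terms taken from the target-research example above; the repeated
# "Gene Expression" entry shows that duplicates are dropped.
queries = build_discovery_queries(
    topic="protein information",
    related_terms=["gene expression", "drug target"],
    databases=["UniProt", "OpenTargets"],
    data_types=["protein interaction", "variant mutation", "Gene Expression"],
)
```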
Strategy 2: Multi-Hop Tool Chains
CRITICAL: Most scientific questions require multiple tool calls. A single tool rarely gives the complete answer.
Why Multi-Hop Matters
| Question Type | Single Tool Answer | Multi-Hop Answer |
|---|---|---|
| "Tell me about EGFR" | Basic protein info | Full profile with structure, expression, drugs, variants, literature |
| "What drugs target TP53?" | List of drug names | Drug details, mechanisms, clinical trials, bioactivity data |
| "Research Alzheimer's" | Disease definition | Genes, pathways, drugs, trials, phenotypes, GWAS, literature |
Common Multi-Hop Patterns
Pattern A: ID Resolution Chain
Name → ID → Data → Related Data
Example: Gene name to complete profile
1. gene_name → Ensembl ID
2. Ensembl ID → UniProt accession
3. UniProt accession → Protein entry
4. UniProt accession → Domains
5. UniProt accession → Structure
Pattern B: Cross-Database Enrichment
Primary Data → Cross-reference → Enriched View
Example: Drug compound enrichment
1. drug_name → PubChem CID
2. drug_name → ChEMBL ID
3. CID → properties
4. ChEMBL ID → bioactivity
5. ChEMBL ID → targets
6. SMILES → ADMET predictions
Pattern C: Network Expansion
Seed Entity → Connected Entities → Entity Details
Example: Target interaction network
1. gene → protein interactions
2. For each interactor → gene info
3. Interactor → disease associations
Pattern D: Literature + Data Integration
Database Annotations → Literature Evidence → Synthesis
Example: Disease mechanism research
1. disease → associated genes
2. disease → phenotypes
3. disease → drugs
4. disease → literature
5. key papers → citations
Multi-Hop Persistence Rules
- Don't stop at first result - One tool gives partial data; keep going
- Follow cross-references - Use IDs from one tool to query others
- Chain until complete - 5-10 tool calls for comprehensive answers is normal
- Track all IDs - Save every identifier for potential future use
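The ID resolution pattern and the "track all IDs" rule can be sketched as a chain runner that records every identifier it resolves. The lookup tables below are stand-ins for real tool calls (an assumption for illustration), and `run_chain` is a hypothetical helper; the EGFR identifiers are the ones listed elsewhere in this guide.

```python
def run_chain(seed, hops):
    """Resolve a seed through successive hops, keeping every ID found along the way."""
    ids = {"gene_name": seed}
    current = seed
    for id_type, resolver in hops:
        current = resolver(current)
        if current is None:
            break  # the chain cannot continue, but earlier IDs are still kept
        ids[id_type] = current
    return ids


# Stand-in lookup tables; in practice each hop would be a tool call.
SYMBOL_TO_ENSEMBL = {"EGFR": "ENSG00000146648"}
ENSEMBL_TO_UNIPROT = {"ENSG00000146648": "P00533"}

hops = [
    ("ensembl_id", SYMBOL_TO_ENSEMBL.get),
    ("uniprot_accession", ENSEMBL_TO_UNIPROT.get),
]
ids = run_chain("EGFR", hops)
```

Returning the whole `ids` dictionary rather than only the final accession keeps intermediate identifiers available for later queries against other databases.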
Strategy 3: Query Multiple Databases for Same Data
CRITICAL: Different databases have different coverage. Query ALL relevant sources.
Database Redundancy Principle
For any data type, query multiple sources:
| Data Type | Primary | Secondary | Tertiary |
|---|---|---|---|
| Protein info | UniProt | Proteins API | NCBI Protein |
| Gene expression | GTEx | Human Protein Atlas | ArrayExpress |
| Drug targets | ChEMBL | DGIdb | OpenTargets |
| Variants | gnomAD | ClinVar | OpenTargets |
| Literature | PubMed | Europe PMC | OpenAlex |
| Pathways | Reactome | KEGG | WikiPathways |
| Structures | RCSB PDB | PDBe | AlphaFold |
| Disease associations | OpenTargets | ClinVar | GWAS Catalog |
Merge Results Strategy
When querying multiple databases:
- Collect all results - Don't stop at first success
- Note data source - Track where each datum came from
- Handle conflicts - Document when sources disagree
- Prefer curated - Weight curated records over automated ones (e.g., RefSeq over GenBank, UniProt annotations over computational predictions)
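One way to apply these rules is to merge per-source records into a structure that keeps provenance for every value and flags fields where sources disagree. This is a sketch under assumptions: `merge_results` is a hypothetical helper, and the source names and values are placeholders, not real database output.

```python
from collections import defaultdict


def merge_results(results_by_source):
    """Merge {source: {field: value}} into {field: {value: [sources]}} plus a conflict map."""
    merged = defaultdict(dict)
    for source, record in results_by_source.items():
        for field, value in record.items():
            merged[field].setdefault(value, []).append(source)
    # A field is in conflict when sources reported more than one distinct value.
    conflicts = {field: dict(values) for field, values in merged.items() if len(values) > 1}
    return dict(merged), conflicts


# Placeholder data: two sources agree on "name" and disagree on "length".
merged, conflicts = merge_results({
    "db_one": {"name": "X", "length": 100},
    "db_two": {"name": "X", "length": 101},
})
```

The conflict map can be copied into the report as-is, which satisfies "document when sources disagree" without silently picking a winner.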
Strategy 3.1: Abstract Search vs Full-Text Search (Literature)
CRITICAL: Many biomedical “needle” terms (rsIDs, reagent catalog numbers, supplementary-table IDs) rarely appear in titles or abstracts. If you only search abstracts, you will miss papers even when they are open access.
Quick rule
- If your keywords look like body-only terms (rsIDs, figure/table references, “Supplementary Table”), use full-text-aware tools first.
Tools that can match full text (indexed or retrieved)
| Goal | Tools | Notes |
|---|---|---|
| Indexed full-text search (biomed OA) | | NCBI “pmc” database indexes full text; good for rsIDs. |
| Indexed full-text search (Europe PMC subset) | EuropePMC_search_articles with + | Uses Europe PMC + fielded queries; works only when Europe PMC has indexed full text. |
| Best-effort full-text retrieval + keyword snippets | EuropePMC_get_fulltext_snippets | Fetches full text (XML → HTML fallbacks) and returns bounded snippets with . |
| OA aggregation + (sometimes) full-text search | | Coverage varies; a paper may not exist in CORE even if OA elsewhere. |
| Download-and-scan fallback | CORE_get_fulltext_snippets | Local PDF scan for body-only terms when index-based search misses; can fail if the “PDF” URL returns HTML/403 (check trace/content-type). |
| Partial full-text indexing (not guaranteed) | / openalex_literature_search with / | Only matches works where OpenAlex has indexed full text; can miss PMC-hosted full text. Use as a secondary signal. |
Recommended flow for body-only keywords
- Try and EuropePMC_search_articles (with + ).
- If you have a PMCID/PMID, use EuropePMC_get_fulltext_snippets to confirm the term is in the paper.
- If you only have a PDF URL, use CORE_get_fulltext_snippets as a last resort, and treat HTTP as “request succeeded”, not “PDF succeeded” (validate ).
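The quick rule and the PDF caveat can be approximated with two small checks. Both are heuristics written for illustration, not ToolUniverse behavior: `looks_body_only` guesses whether a query contains body-only terms, and `is_pdf_response` treats a download as a real PDF only when the status is successful, the content type says PDF, and the payload starts with the `%PDF` signature. The rsID in the usage line is a made-up placeholder.

```python
import re

# rsIDs have the form "rs" followed by digits.
RSID = re.compile(r"\brs\d+\b", re.IGNORECASE)
BODY_ONLY_PHRASES = ("supplementary table", "supplementary figure")


def looks_body_only(query: str) -> bool:
    """Heuristic: True if the query contains terms that usually appear only in full text."""
    lowered = query.lower()
    return bool(RSID.search(lowered)) or any(p in lowered for p in BODY_ONLY_PHRASES)


def is_pdf_response(status: int, content_type: str, body_start: bytes) -> bool:
    """A successful request is not proof of a PDF: also check content type and file signature."""
    return (
        200 <= status < 300
        and "application/pdf" in content_type.lower()
        and body_start.startswith(b"%PDF")
    )


use_fulltext_first = looks_body_only("rs123456 association")  # placeholder rsID
```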
Strategy 4: Disambiguation First
CRITICAL: Before any research, resolve entity identity to avoid wrong data and missed results.
Why Disambiguation Matters
| Problem | Example | Consequence |
|---|---|---|
| Naming collision | "JAK" = Janus kinase OR "just another kinase" | Wrong papers retrieved |
| Multiple IDs | Gene has symbol, Ensembl, Entrez, UniProt IDs | Miss data in some databases |
| Salt forms | Metformin vs metformin HCl (different CIDs) | Incomplete compound data |
| Species ambiguity | BRCA1 in human vs mouse | Wrong expression/function data |
Disambiguation Workflow
Step 1: Establish Canonical IDs
gene_name → UniProt, Ensembl, NCBI Gene, ChEMBL target
compound_name → PubChem CID, ChEMBL ID, SMILES
disease_name → EFO ID, ICD-10, UMLS CUI
Step 2: Gather Synonyms
All aliases, alternative names, historical names
Step 3: Detect Naming Collisions
Search "[TERM]"[Title] → check if results are on-topic
Build negative filters: NOT [collision_term]
Step 4: Species Confirmation
Verify organism is correct (default: Homo sapiens)
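Step 3 can be sketched as two helpers: one that builds the title query with negative filters, and one that estimates how many retrieved titles are on-topic. The query syntax mirrors the `"[TERM]"[Title]` and `NOT [collision_term]` forms shown above; the function names and the sample titles are invented for illustration.

```python
def build_query(term, collision_terms, field="Title"):
    """Build a fielded query and append one NOT clause per known collision term."""
    query = f'"{term}"[{field}]'
    for collision in collision_terms:
        query += f' NOT "{collision}"'
    return query


def on_topic_fraction(titles, context_terms):
    """Fraction of titles that mention at least one expected context term."""
    if not titles:
        return 0.0
    hits = sum(
        any(term.lower() in title.lower() for term in context_terms)
        for title in titles
    )
    return hits / len(titles)


query = build_query("JAK", ["just another kinase"])
# Invented titles, used only to exercise the check.
fraction = on_topic_fraction(
    ["Janus kinase signaling in T cells", "Just another kinase family member"],
    ["janus", "JAK-STAT"],
)
```

A low on-topic fraction is a signal to add more negative filters or to search with a more specific identifier.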
ID Types by Entity
Genes/Proteins:
- Gene Symbol (EGFR, TP53)
- UniProt accession (P00533)
- Ensembl ID (ENSG00000146648)
- NCBI Gene ID (1956)
- ChEMBL Target ID (CHEMBL203)
Compounds:
- PubChem CID (2244)
- ChEMBL ID (CHEMBL25)
- SMILES string
- InChI/InChIKey
Diseases:
- EFO ID (EFO_0000249)
- ICD-10 code (G30)
- UMLS CUI (C0002395)
- SNOMED CT
Strategy 5: Never Give Up on Search
CRITICAL: When a tool fails or returns empty, don't give up. Try alternatives.
Failure Handling Protocol
Attempt 1: Primary tool
↓ fails
Wait briefly, then retry
↓ fails
Try fallback tool #1
↓ fails
Try fallback tool #2
↓ fails
Document as "unavailable" with reason
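The protocol above can be sketched as a loop over an ordered list of tools. This version is a simplification under assumptions: each tool is a plain Python callable, every tool (not only the primary) is retried on an exception, and an empty result moves straight to the next fallback. `call_with_fallbacks` and the stub tools are hypothetical.

```python
import time


def call_with_fallbacks(tools, query, retries=1, delay=0.0):
    """Try each (name, callable) in order; record every failure with its reason."""
    failures = []
    for name, tool in tools:
        for attempt in range(retries + 1):
            try:
                result = tool(query)
            except Exception as exc:
                failures.append((name, f"{type(exc).__name__}: {exc}"))
                if attempt < retries:
                    time.sleep(delay)  # wait briefly, then retry the same tool
                continue
            if result:
                return {"status": "ok", "source": name, "data": result, "failures": failures}
            failures.append((name, "empty result"))
            break  # empty results are not retried; move to the next fallback
    return {"status": "unavailable", "source": None, "data": None, "failures": failures}


# Stub tools standing in for real tool calls.
def flaky_primary(query):
    raise TimeoutError("timed out")


def empty_fallback(query):
    return []


def working_fallback(query):
    return [f"record for {query}"]


outcome = call_with_fallbacks(
    [("primary", flaky_primary), ("fallback_1", empty_fallback), ("fallback_2", working_fallback)],
    "EGFR",
)
```

The `failures` list supplies the "reason" needed when a data point finally has to be documented as unavailable.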
Common Fallback Chains
| Primary Tool | Fallback Options |
|---|---|
| PubMed citations | EuropePMC citations → OpenAlex citations |
| GTEx expression | Human Protein Atlas expression |
| PubChem compound lookup | ChEMBL search → SMILES-based lookup |
| ChEMBL bioactivity | PubChem bioactivity summary |
| DailyMed drug labels | PubChem drug label info |
| UniProt protein entry | Proteins API |
Alternative Search Strategies
If keyword search fails:
- Try synonyms and aliases
- Use broader/narrower terms
- Try different databases
If database is empty:
- Query related databases
- Use literature to find mentions
- Check if entity exists under different name
If API rate-limited:
- Wait and retry
- Try same query on different database
- Use cached results if available
Strategy 6: Generate Comprehensive Reports
CRITICAL: With access to many tools, reports should be detailed and thorough.
Report-First Approach
- Create report structure FIRST - Define all sections before gathering data
- Progressively update - Fill sections as data is gathered
- Show findings, not process - Report results, not search methodology
Citation Requirements
Every fact must have a source:
## Protein Function
EGFR is a receptor tyrosine kinase that regulates cell growth.
*Source: UniProt (P00533)*
### Expression Profile
| Tissue | TPM | Source |
|--------|-----|--------|
| Skin | 156.3 | GTEx |
| Lung | 98.4 | GTEx |
Evidence Grading
Grade claims by evidence strength:
| Tier | Symbol | Description | Example |
|---|---|---|---|
| T1 | ★★★ | Mechanistic with direct evidence | CRISPR KO study |
| T2 | ★★☆ | Functional study | siRNA knockdown |
| T3 | ★☆☆ | Association/screen hit | GWAS, high-throughput screen |
| T4 | ☆☆☆ | Review mention, text-mined | Review article |
In report:
ATP6V1A drives lysosomal acidification [★★★: PMID:12345678].
It has been implicated in cancer metabolism [★☆☆: TCGA data].
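A small formatter can keep the grading notation consistent across a report. The tier symbols come from the table above; `graded_claim` is a hypothetical helper name.

```python
TIER_SYMBOLS = {"T1": "★★★", "T2": "★★☆", "T3": "★☆☆", "T4": "☆☆☆"}


def graded_claim(text, tier, source):
    """Append the evidence grade and source to a claim, e.g. '... [★★★: PMID:...].'"""
    return f"{text.rstrip('.')} [{TIER_SYMBOLS[tier]}: {source}]."


# Reproduces the first example sentence above.
claim = graded_claim("ATP6V1A drives lysosomal acidification", "T1", "PMID:12345678")
```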
Mandatory Completeness
All sections must exist, even if "data unavailable":
## Pathogen Involvement
No pathogen interactions identified in literature or databases.
*Source: Literature search, UniProt annotations*
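The report-first approach and the completeness rule can be combined in one rendering step: define the section list up front, then emit every section whether or not data was found. A minimal sketch, assuming findings are stored as `(body, source)` pairs keyed by section title; `render_report` is a hypothetical helper.

```python
def render_report(sections, findings):
    """Render every planned section; sections without findings get an explicit placeholder."""
    lines = []
    for title in sections:
        lines.append(f"## {title}")
        entry = findings.get(title)
        if entry:
            body, source = entry
            lines.append(body)
            lines.append(f"*Source: {source}*")
        else:
            lines.append("Data unavailable.")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


report = render_report(
    ["Protein Function", "Pathogen Involvement"],
    {
        "Protein Function": (
            "EGFR is a receptor tyrosine kinase that regulates cell growth.",
            "UniProt (P00533)",
        )
    },
)
```

Because the section list drives the output, a section can never silently disappear when its data gathering fails.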
Report Quality Metrics
| Quality | Description | Tool Calls | Sections |
|---|---|---|---|
| Excellent | Multi-database, evidence-graded | 30+ | All mandatory, detailed |
| Good | Cross-referenced, sourced | 15-30 | All mandatory, adequate |
| Adequate | Single-database focus | 5-15 | Core sections only |
| Poor | Single tool, no sources | <5 | Incomplete |
Strategy 7: Use Specialized Skills for Specific Tasks
CRITICAL: For specific research tasks, use specialized skills (not this general skill).
Task-Specific Skill Selection
| Task | Recommended Skill |
|---|---|
| Data Retrieval | |
| Chemical compounds | tooluniverse-chemical-compound-retrieval |
| Expression data | tooluniverse-expression-data-retrieval |
| Protein structure | tooluniverse-protein-structure-retrieval |
| Sequence retrieval | tooluniverse-sequence-retrieval |
| Research & Profiling | |
| Disease research | tooluniverse-disease-research |
| Drug profiling | tooluniverse-drug-research |
| Literature review | tooluniverse-literature-deep-research |
| Target analysis | tooluniverse-target-research |
| Clinical Decision Support | |
| Drug safety analysis | tooluniverse-pharmacovigilance |
| Precision oncology treatment | tooluniverse-precision-oncology |
| Rare disease diagnosis | tooluniverse-rare-disease-diagnosis |
| Variant interpretation | tooluniverse-variant-interpretation |
| Discovery & Design | |
| Small molecule binder discovery | tooluniverse-binder-discovery |
| Drug repurposing | |
| Protein therapeutic design | tooluniverse-protein-therapeutic-design |
| Outbreak Response | |
| Infectious disease analysis | tooluniverse-infectious-disease |
| Infrastructure & Development | |
| ToolUniverse installation/setup | |
| Python SDK for AI scientist systems | |
When to Use This General Skill
Use this skill when:
- Need general guidance on ToolUniverse usage
- Task doesn't fit a specialized skill
- Need to combine multiple specialized workflows
- Exploring what's possible with ToolUniverse
- Learning ToolUniverse best practices
Strategy 8: Parallel Execution for Speed
CRITICAL: Run independent queries simultaneously for faster research.
When to Parallelize
| Parallel | Sequential |
|---|---|
| Different databases for same entity | Tool B needs output from Tool A |
| Multiple entities, same data type | Building an ID → using the ID |
| Independent research paths | Iterating through a list of results |
Parallel Research Paths Example
For target research, run these 8 paths simultaneously:
- Identity - Names, IDs, sequence
- Structure - 3D structure, domains
- Function - GO terms, pathways
- Interactions - PPI network
- Expression - Tissue expression
- Variants - Genetic variation
- Drugs - Known drugs, druggability
- Literature - Publications, trends
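The eight independent paths can be run concurrently with the standard library's thread pool. This is a sketch, not the ToolUniverse execution model: `research_path` is a placeholder that a real workflow would replace with the tool calls for that path, and a failure in one path is recorded without stopping the others.

```python
from concurrent.futures import ThreadPoolExecutor

PATHS = [
    "identity", "structure", "function", "interactions",
    "expression", "variants", "drugs", "literature",
]


def research_path(path, target):
    """Placeholder for the tool calls belonging to one research path."""
    return f"{path} data for {target}"


def run_parallel(target, paths=PATHS, max_workers=8):
    """Submit every path at once and collect results keyed by path name."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {path: pool.submit(research_path, path, target) for path in paths}
        for path, future in futures.items():
            try:
                results[path] = future.result()
            except Exception as exc:
                results[path] = f"unavailable: {exc}"  # keep going; document the gap
    return results


results = run_parallel("EGFR")
```

Threads suit this case because the work is dominated by waiting on remote services; steps that depend on an earlier ID still have to run sequentially inside their own path.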
Strategy 9: Iterative Completeness Check
CRITICAL: After gathering data, always ask "What else is missing?" to ensure comprehensive coverage.
The Completeness Loop
Gather initial data
↓
Review what you have
↓
Ask: "What aspects are still missing?"
↓
Identify gaps
↓
Search for tools to fill gaps
↓
Gather additional data
↓
Repeat until comprehensive
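The loop can be sketched as repeated gap detection with a bounded number of rounds. The assumptions here are that coverage is expressed as a list of required aspects and that `gather(aspect, round_no)` returns data or `None`; both the function names and the stub data source are hypothetical.

```python
def completeness_loop(required, gather, max_rounds=3):
    """Keep filling gaps until everything is covered or the round limit is reached."""
    collected = {}
    for round_no in range(max_rounds):
        gaps = [aspect for aspect in required if aspect not in collected]
        if not gaps:
            break
        for aspect in gaps:
            data = gather(aspect, round_no)
            if data is not None:
                collected[aspect] = data
    # Whatever is still missing is documented rather than ignored.
    documented_gaps = {a: "not found" for a in required if a not in collected}
    return collected, documented_gaps


# Stub: "identity" is found immediately, "expression" only on the second
# round (for example via a fallback tool), and "variants" never.
AVAILABLE_FROM_ROUND = {"identity": 0, "expression": 1}


def stub_gather(aspect, round_no):
    first_round = AVAILABLE_FROM_ROUND.get(aspect)
    if first_round is not None and round_no >= first_round:
        return f"{aspect} data (round {round_no})"
    return None


collected, documented_gaps = completeness_loop(
    ["identity", "expression", "variants"], stub_gather
)
```

The round limit is a design choice that keeps the loop from running indefinitely when a gap cannot be filled.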
Universal Completeness Questions
After each research phase, ask:
- Identity: Do I have all relevant identifiers and names?
- Core data: Do I have the fundamental information for this entity type?
- Context: Do I have surrounding/related information?
- Relationships: Do I know what this connects to?
- Variations: Do I know about variants, forms, or subtypes?
- Evidence: Do I have supporting data from multiple sources?
- Literature: Do I have recent publications on this topic?
- Gaps: Have I documented what's unavailable?
Gap-Filling Strategies
| Gap Identified | Strategy |
|---|---|
| Missing data type | Search for tools with that data type |
| Single source only | Query additional databases |
| Outdated information | Check literature for recent updates |
| No experimental data | Look for predictions/computational data |
| Conflicting data | Find authoritative/curated sources |
| Shallow coverage | Dive deeper with specialized tools |
When to Stop
Stop the completeness loop when:
- All relevant aspects have been addressed (even if "not found")
- Multiple sources queried for key data
- Gaps are documented, not ignored
- No obvious missing pieces remain
Self-Review Questions
Before finalizing any research:
- Have I searched for ALL relevant tools?
- Have I queried multiple databases?
- Have I followed cross-references?
- Have I checked recent literature?
- Have I documented what's unavailable?
- Is there any obvious gap I haven't addressed?
- Would someone reading this ask "but what about X?"
Quick Reference: Tool Categories
Protein & Gene Tools
UniProt, Proteins API, MyGene, Ensembl tools
Structure Tools
RCSB PDB, PDBe, AlphaFold, InterPro tools
Drug & Compound Tools
ChEMBL, PubChem, DGIdb, ADMET-AI, DrugBank tools
Disease & Phenotype Tools
OpenTargets, ClinVar, GWAS, HPO tools
Expression Tools
GTEx, Human Protein Atlas, CELLxGENE tools
Variant Tools
gnomAD, ClinVar, dbSNP tools
Pathway Tools
Reactome, KEGG, WikiPathways, GO tools
Literature Tools
PubMed, EuropePMC, OpenAlex, SemanticScholar tools
Clinical Tools
ClinicalTrials.gov, FAERS, PharmGKB, DailyMed tools
Troubleshooting Common Issues
"Tool not found"
- Search for similar tools using Tool_Finder
- Check spelling of tool name
- Try alternative tools for same data type
"Empty results"
- Check spelling of query terms
- Try synonyms/aliases
- Try alternative databases
- Verify IDs are correct type
"Conflicting data"
- Note all sources
- Prefer curated databases
- Document the conflict in report
- Use evidence grading
"Incomplete picture"
- Search for more tools
- Query additional databases
- Follow cross-references
- Expand via literature
Strategy 10: English-First Tool Queries
CRITICAL: Most ToolUniverse tools only accept English terms. Always translate queries to English before calling tools, regardless of the user's language.
Language Handling Rules
- Default to English - All tool calls must use English search terms, entity names, and parameters
- Translate non-English input - If the user's question is in Chinese, Japanese, Korean, or any other language, translate the relevant scientific terms to English before making tool calls
- Respond in the user's language - While tools must be queried in English, deliver the final report/answer in the user's original language
- Fallback to original language - Only if an English search returns no results, retry with the original-language terms
- Check tool descriptions - A few tools may explicitly document multi-language support; use the original language only when the tool description says so
Examples
User (Chinese): "研究EGFR靶点"
→ Tool calls: use "EGFR", "epidermal growth factor receptor" (English)
→ Report: deliver in Chinese
User (Japanese): "メトホルミンの安全性プロファイル"
→ Tool calls: use "metformin", "safety profile" (English)
→ Report: deliver in Japanese
User (Korean): "알츠하이머병 관련 유전자"
→ Tool calls: use "Alzheimer's disease", "associated genes" (English)
→ Report: deliver in Korean
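The language rules can be sketched as a search wrapper that always tries the English term first and falls back to the original-language term only when the English search returns nothing. Translation itself is assumed to happen upstream; `english_first_search`, `has_non_ascii`, and the stub index are hypothetical.

```python
def has_non_ascii(text):
    """Rough signal that a term may need translating before a tool call."""
    return any(ord(ch) > 127 for ch in text)


def english_first_search(original_term, english_term, search):
    """Search in English first; retry with the original term only if that returns nothing."""
    results = search(english_term)
    if results:
        return english_term, results
    return original_term, search(original_term)


# Stub search backed by a tiny in-memory index, standing in for a tool call.
STUB_INDEX = {"metformin": ["compound record"]}


def stub_search(term):
    return STUB_INDEX.get(term, [])


used_term, hits = english_first_search("二甲双胍", "metformin", stub_search)
```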
Why This Matters
| Scenario | Wrong Approach | Correct Approach |
|---|---|---|
| User asks in Chinese about "二甲双胍" | Pass "二甲双胍" to PubChem search | Translate to "metformin", search in English |
| User asks in Japanese about a disease | Pass Japanese disease name to OpenTargets | Translate to English disease name first |
| User asks in Spanish about a gene | Pass Spanish description to tool | Use standard gene symbol (e.g., TP53) |
Summary: The ToolUniverse Mindset
| Principle | Action |
|---|---|
| Clarify first | Confirm entity, scope, species, and output before researching |
| Search widely | 10000+ tools; always discover more |
| Multi-hop persistence | 5-10 tool calls per question is normal |
| Cross-reference | Query multiple databases for same data |
| Disambiguate first | Resolve IDs before research |
| Never give up | Fallbacks for every failure |
| Report comprehensively | Detail with sources and evidence grades |
| Use specialized skills | Apply domain-specific skills for focused tasks |
| Execute in parallel | Speed through concurrent execution |
| Check completeness | Ask "what's missing?" and fill gaps iteratively |
| English-first queries | Translate to English for tool calls; respond in user's language |
The goal: Transform 10000+ tools into comprehensive, reliable scientific intelligence.