Additional Resources
- For canonical schema catalog: references/schema-catalog.md
- For CLI reference, REPL commands, server modes, and runtime controls: references/cli-reference.md
- For legacy parity tracking: references/legacy-parity-matrix.md
- For optimization quality gate: references/optimization-checklist.md
- For HTTP server guide: references/server-guide.md
Quick Start CLI (Do This First)
Use these commands first to avoid guessing behavior or schema:
bash
# Single query
idasql -s database.i64 -q "SELECT * FROM welcome"
# Interactive REPL
idasql -s database.i64 -i
# Long-lived HTTP server for iterative analysis
idasql -s database.i64 --http 8081
# Query over HTTP
curl -X POST http://127.0.0.1:8081/query -d "SELECT name, size FROM funcs LIMIT 5"
Critical guardrails:
- Always provide ( / ).
- Use when you want edits persisted on exit.
- Discover schema before writing queries:
- REPL:
- SQL:
PRAGMA table_xinfo(<table>);
(or PRAGMA table_info(<table>);
)
- Start orientation with .
Schema Catalog (Canonical)
Canonical table/view formats live in
references/schema-catalog.md
.
- Source of truth for column shapes and owner skill mapping.
- Sourced from SQL metadata ( + ).
- Use this before assuming column names for less-common surfaces.
- Legacy parity tracker:
references/legacy-parity-matrix.md
- Optimization quality gate:
references/optimization-checklist.md
Manual refresh:
SELECT schema, name, type, ncol FROM pragma_table_list WHERE schema='main' ORDER BY type, name;
PRAGMA table_xinfo(<surface>);
- Update
references/schema-catalog.md
owner mapping when surfaces change.
Session Bootstrap Contract
Use this exact startup flow before deep analysis:
- Connect to database (, , or ).
- Run orientation query:
- Validate key entities exist:
sql
SELECT COUNT(*) AS funcs FROM funcs;
SELECT COUNT(*) AS xrefs FROM xrefs;
SELECT COUNT(*) AS strings FROM strings;
- Introspect schema for target surfaces before authoring complex SQL:
sql
PRAGMA table_xinfo(funcs);
PRAGMA table_xinfo(xrefs);
- Route to domain skill using routing matrix below.
Never skip steps 2-4 when the user prompt is broad or ambiguous.
Global Agent Contracts
These contracts apply across all idasql skills and should be treated as one shared agent behavior model.
Read-First Contract
- Read current state first () before writes (//).
- Confirm target precision using stable identifiers (, , , ).
Anti-Guessing Contract
- Do not assume columns/types for long-tail surfaces.
- Introspect via or before issuing uncertain queries.
Mandatory Mutation Loop
- Read current state.
- Apply mutation.
- Refresh if needed ( for decompiler surfaces).
- Re-read and verify expected change.
Performance Contract
- Always constrain high-cost surfaces (, , , ) by key columns.
- For decompiler surfaces, enforce unless explicitly asked for broad scans.
Failure Recovery Contract
- On : introspect schema and retry.
- On empty results: validate address range, table freshness (), and runtime capabilities.
- On timeout: narrow scope, add constraints, paginate, or split query.
Skill Routing Matrix (Intent -> Skill)
Use this deterministic mapping for initial routing:
| User intent | Primary skill | Typical first query |
|---|
| "what does this binary do?" / triage | | |
| disassembly, segments, instructions | | SELECT * FROM funcs LIMIT 20;
|
| xrefs/callers/callees/import dependencies | | SELECT * FROM xrefs WHERE to_ea = ...;
|
| find functions/types/labels/members by name pattern | | SELECT name, kind, address FROM grep WHERE pattern = 'main' LIMIT 20;
|
| strings/bytes/pattern search | | SELECT * FROM strings LIMIT 20;
|
| decompile/pseudocode/ctree/lvars | | |
| comments/renames/retyping/bookmarks | | on target row before update |
| type creation/struct/enum/member work | | SELECT * FROM types LIMIT 20;
|
| breakpoints/patching | | SELECT * FROM breakpoints;
|
| persistent key/value notes | | SELECT * FROM netnode_kv LIMIT 20;
|
| SQL function lookup/signature recall | | SELECT * FROM pragma_function_list;
|
| live IDA UI context questions | | SELECT get_ui_context_json();
(when available) |
| IDA SDK-only logic not in SQL surfaces | | PRAGMA idasql.enable_idapython = 1; SELECT idapython_snippet('print(...)');
|
| recursive source/structure recovery | | start from function + recurse/handoff |
When prompts span domains, execute in this order:
- Orientation in
- Primary domain skill
- Adjacent skills for enrichment (for example + + )
Cross-Skill Execution Recipes
Recipe: Unknown binary triage -> suspicious function deep dive -> annotate
- : identify candidates from imports/strings/call patterns.
- /: map call graph and call sites.
- : inspect logic and variable semantics.
- : apply comments/renames/types with mutation loop.
Recipe: String IOC -> reference graph -> patch
- : locate candidate strings and addresses.
- : map references to caller functions.
- or : patch or annotate specific sites.
Recipe: Type recovery from pseudocode
- : inspect lvars, call args, and ctree patterns.
- : create/refine structs/enums and apply declarations.
- : finalize naming/comments and verify rendered pseudocode.
UI Context Routing
For prompts like "what am I looking at?", "what's selected?", "what is on the screen?", "look at what I'm doing", or references to "this/current/that", use the dedicated
skill.
- capture/reuse policy
- temporal reference rules ( vs )
- response template, examples, and fallback messaging
Runtime caveat:
- is plugin GUI runtime only, not idalib/CLI mode.
- If unavailable, state that UI context is unavailable and continue with non-UI SQL workflows.
welcome
Database orientation surface for quick session metadata.
This is metadata-only and not a replacement for UI context capture.
| Column | Type | Description |
|---|
| TEXT | One-line database summary |
| TEXT | Processor/module name |
| INT | 1=64-bit database, 0=32-bit |
| TEXT | Minimum address in database |
| TEXT | Maximum address in database |
| TEXT | Entry/start address |
| TEXT | Entry symbol name (if known) |
| INT | Number of detected functions |
| INT | Number of segments |
| INT | Number of named addresses |
| INT | Current IDA string-list count |
For canonical schema and owner mapping, see
references/schema-catalog.md
.
What is IDA and Why SQL?
IDA Pro is the industry-standard disassembler and reverse engineering tool. It analyzes compiled binaries (executables, DLLs, firmware) and produces:
- Disassembly - Human-readable assembly code
- Functions - Detected code boundaries with names
- Cross-references - Who calls what, who references what data
- Types - Structures, enums, function prototypes
- Decompilation - C-like pseudocode (with Hex-Rays plugin)
IDASQL exposes all this analysis data through SQL virtual tables, enabling:
- Complex queries across multiple data types (JOINs)
- Aggregations and statistics (COUNT, GROUP BY)
- Pattern detection across the entire binary
- Scriptable analysis without writing IDA plugins or IDAPython scripts
Core Concepts for Binary Analysis
Addresses (ea_t)
Everything in a binary has an
address - a memory location where code or data lives. IDA uses
(effective address) as unsigned 64-bit integers. SQL shows these as integers; use
for hex display.
Address-taking SQL functions accept:
- integer EA values (preferred for deterministic scripts)
- numeric strings (, )
- symbol names resolved with
get_name_ea(BADADDR, name)
(global names)
Quoted numeric strings are for address-taking scalar functions. For table
predicates, compare address columns to integer EAs such as
.
Examples:
sql
SELECT decompile('DriverEntry');
SELECT set_type('DriverEntry', 'NTSTATUS DriverEntry(PDRIVER_OBJECT, PUNICODE_STRING);');
SELECT (SELECT comment FROM comments WHERE address = 0x401000 LIMIT 1);
Read address comments from the
table after resolving the target EA.
If a symbol cannot be resolved, SQL functions return an explicit error like:
Could not resolve name to address: <name>
.
Local label lookup that depends on a specific
context is not consulted by default (
resolution). Use explicit numeric EAs when needed.
Functions
IDA groups code into functions with:
- / - Where the function begins
- - Where it ends
- - Assigned or auto-generated name (e.g., , )
- - Total bytes in the function
There will be addresses and disassembly listing not belonging to a function. IDASQL can still get the bytes, disassembly listing ranges, etc.
For single-EA disassembly (code or data), prefer
over function-scoped queries.
Cross-References (xrefs)
Binary analysis is about understanding relationships:
- Code xrefs - Function calls, jumps between code
- Data xrefs - Code reading/writing data locations, or data referring to other data (pointers)
- -> represents "address X references address Y"
Use table:
xrefs(from_ea, to_ea, type, is_code)
.
Segments
Use table:
segments(start_ea, end_ea, name, class, perm)
.
Memory is divided into segments with different purposes. For example, a typical PE file, has these segments:
- - Executable code (typically)
- - Initialized global data
- - Read-only data (strings, constants)
- - Uninitialized data
Of course, segment names and types can vary. You may query the
table to understand memory layout.
Basic Blocks
Within a function, basic blocks are straight-line code sequences:
- No branches in the middle
- Single entry, single exit
- Useful for control flow analysis
Use table:
blocks(start_ea, end_ea, func_ea, size)
.
Decompilation (Hex-Rays)
The Hex-Rays decompiler converts assembly to C-like pseudocode:
- ctree - The Abstract Syntax Tree of decompiled code
- lvars - Local variables detected by the decompiler
- Much easier to analyze than raw assembly
Core decompiler surfaces:
- (PRIMARY read/display surface)
- Returns the entire function as one text block.
- Each output line is prefixed for address grounding:
- Addressed line:
- Non-anchored line: (no address anchor for that line)
- Use this first when the user asks to "decompile", "show code", "show pseudocode", or "explain function logic".
- table (structured/edit surface)
- Use for line-level filtering (, , ) and comment writes keyed by .
- Resolve a writable pseudocode anchor first; do not assume .
- Not the preferred display surface for full-function code.
- and for AST-level analysis
- for local variable rename/type/comment updates
Performance Rules
CRITICAL: Constraint Pushdown
Some tables have optimized filters that use efficient IDA SDK APIs:
| Table | Optimized Filter | Without Filter |
|---|
| | O(all instructions) - SLOW |
| | O(all blocks) |
| or | O(all xrefs) |
| | Decompiles ALL functions |
| | Decompiles ALL functions |
Always filter decompiler tables by !
Use Integer Comparisons
sql
-- SLOW: String comparison
WHERE mnemonic = 'call'
-- FAST: Integer comparison
WHERE itype IN (16, 18) -- x86 call opcodes
O(1) Random Access
sql
-- SLOW: O(n) - sorts all rows
SELECT address FROM funcs ORDER BY RANDOM() LIMIT 1;
-- FAST: O(1) - direct index access
SELECT address
FROM funcs
WHERE rowid = ABS(RANDOM()) % (SELECT COUNT(*) FROM funcs);
CTE-First Mutation Workflow
For instruction lifecycle edits, use a CTE to identify precise targets first, then mutate:
sql
WITH target AS (
SELECT address
FROM instructions
WHERE func_addr = 0x401000
ORDER BY address DESC
LIMIT 1
)
DELETE FROM instructions
WHERE address IN (SELECT address FROM target);
SELECT make_code_range(address, end_ea) FROM funcs WHERE address = 0x401000;
This keeps mutation scope explicit and predictable for both humans and agents.
Summary: When to Use What
| Goal | Table/Function |
|---|
| List all functions | |
| Functions by return type | funcs WHERE return_is_integral = 1
|
| Functions by arg count | funcs WHERE arg_count >= N
|
| Void functions | funcs WHERE return_is_void = 1
|
| Pointer-returning functions | funcs WHERE return_is_ptr = 1
|
| Functions by calling convention | funcs WHERE calling_conv = 'fastcall'
|
| Find who calls what | with |
| Find data references | with |
| Analyze imports | |
| Find strings | |
| Configure string types | rebuild_strings(types, minlen)
|
| Instruction analysis | instructions WHERE func_addr = X
|
| Recreate deleted instructions | , make_code_range(start, end)
|
| Create function at EA | INSERT INTO funcs(address) VALUES (...)
|
| View function disassembly | or |
| View decompiled code | |
| UI/screen context questions | skill (, plugin UI only) |
| Edit decompiler comments | Resolve writable anchor, then UPDATE pseudocode SET comment = '...' WHERE func_addr = X AND ea = Y
|
| AST pattern matching | ctree WHERE func_addr = X
|
| Call patterns | , |
| Control flow | , |
| Return value analysis | |
| Functions returning specific values | ctree_v_returns WHERE return_num = 0
|
| Pass-through functions | ctree_v_returns WHERE returns_arg = 1
|
| Wrapper functions | ctree_v_returns WHERE returns_call_result = 1
|
| Variable analysis | ctree_lvars WHERE func_addr = X
|
| Type information | , |
| Function signatures | (with type classification) |
| Functions by return type | types_func_args WHERE arg_index = -1
|
| Typedef-aware type queries | (surface vs resolved) |
| Hidden pointer types | types_func_args WHERE is_ptr = 0 AND is_ptr_resolved = 1
|
| Manage breakpoints | (full CRUD) |
| Modify segments | (INSERT/UPDATE/DELETE) |
| Rename decompiler labels | UPDATE ctree_labels SET name=... WHERE func_addr=... AND label_num=...
|
| Delete instructions | (DELETE converts to unexplored bytes) |
| Recreate instructions | , |
| Bulk patch from file bytes | load_file_bytes(path, file_offset, address, size[, patchable])
|
| EA to physical offset mapping | on mapped byte rows ( means no file offset) |
| Create types | (INSERT struct/union/enum) |
| Add struct members | (INSERT) |
| Add enum values | (INSERT) |
| Modify database | , , , (INSERT/UPDATE/DELETE) |
| Store custom key-value data | (full CRUD, persists in IDB) |
| Entity search (structured) | skill + grep WHERE pattern = '...'
|
Remember: Always use
constraints on instruction and decompiler tables for acceptable performance.
Error Handling
- No Hex-Rays license: Decompiler tables (, , ) will be empty or unavailable
- No constraint on decompiler tables: Query will be extremely slow (decompiles all functions)
- Invalid address: Containing-function table lookups return no row; use a scalar subquery when you need a nullable scalar result
- Missing function: JOINs may return fewer rows than expected