cel-programs
When to use
Use this skill when tasks include:
- creating or editing agent stream templates
- configuring data stream manifests for the CEL input type
- writing CEL programs with pagination, cursor management, or authentication
- testing or debugging a CEL program locally with mito
- setting up system tests with mock APIs for CEL-based data streams
- prototyping a new CEL-based data stream's collection logic
- any CEL or mito question, regardless of context
When not to use
Do not use this skill as the primary guide for:
- ingest pipeline processor design
- ECS field mapping
- package scaffolding
- system test execution with the Elastic stack (see references/system-testing.md)
Mandatory workflow — mock → mito → template
This is not a suggestion. Every CEL program MUST be developed in this order. The subagent must not write cel.yml.hbs until the CEL program has been validated with mito against a running mock. Skipping or reordering steps causes failures that are hard to debug.
Do NOT write more than ~10–15 new lines of CEL before running mito. Build the program incrementally in phases (skeleton → error handling → event mapping → pagination → cursor guard), validating with mito after each phase. Writing a large program in one shot leads to cascading compilation errors that are extremely hard to debug. Follow the phased approach in references/cel-incremental-build.md.
| Step | Action | Output |
|---|---|---|
| 1. Create the system test mock | Write the config at _dev/deploy/docker/files/config-<stream>.yml with rules matching all API endpoints. Write the docker-compose service and the test config. | Mock config file, docker-compose service, test config |
| 2. Start the mock locally | stream http-server --addr=:8090 --config=... | Running mock on port 8090 |
| 3. Create a plain CEL file and state.json | Write the CEL program as a standalone file. Create state.json with the same keys the future state block will contain, but with literal test values instead of Handlebars. Point the URL at the local mock. | program.cel and state.json in the working dir |
| 4. Run mito and iterate | Build incrementally per references/cel-incremental-build.md: Phase 0 skeleton → Phase 1 error handling → Phase 2 events → Phase 3 pagination → Phase 4 cursor. Run mito -data state.json -log_requests program.cel after each phase. Do not proceed until mito output is correct. | Validated CEL program |
| 5. ONLY THEN write cel.yml.hbs | Copy the working CEL expression into the program field of the Handlebars template. Replace literal test values with Handlebars variable references. Configure manifests. | Final integration template |
Step 3 detail (translating template vars to mito state): when the future cel.yml.hbs will have a state block with entries such as batch_size: {{batch_size}}, the state.json for mito testing uses the same key names with literal test values:
```json
{
  "url": "http://localhost:8090",
  "api_key": "test-key",
  "batch_size": 50,
  "initial_interval": "24h"
}
```
This mirrors the runtime state the CEL input would provide. Add a cursor entry to test subsequent-run behavior.
For the full mock-first workflow details, CLI flags, execution model, and quality standards: load references/mito-reference.md.
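As a sketch of that mapping, the matching part of the future cel.yml.hbs could look like the fragment below. It assumes manifest vars named url, api_key, batch_size, and initial_interval (names borrowed from the state.json keys above, not from a real package), and it assumes the input exposes resource.url to the program as state.url, so url is not repeated under state.

```yaml
# cel.yml.hbs fragment (sketch): same key names as state.json,
# with Handlebars references in place of the literal test values.
resource.url: {{url}}
state:
  api_key: {{api_key}}                    # "test-key" in state.json
  batch_size: {{batch_size}}              # 50 in state.json
  initial_interval: {{initial_interval}}  # "24h" in state.json
```

Because the key names are identical, the validated program can be copied into the template without renaming any state references.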
cel.yml.hbs template anatomy
The file at data_stream/<stream>/agent/stream/cel.yml.hbs is a Handlebars template that renders the final CEL input configuration. It has these sections in order:
```yaml
interval: {{interval}}
resource.tracer:
  enabled: {{enable_request_tracer}}
  filename: "../../logs/cel/http-request-trace-*.ndjson"
  maxbackups: 5
{{#if proxy_url}}
resource.proxy_url: {{proxy_url}}
{{/if}}
{{#if ssl}}
resource.ssl: {{ssl}}
{{/if}}
{{#if http_client_timeout}}
resource.timeout: {{http_client_timeout}}
{{/if}}
resource.url: <constructed from vars>
state:
  <credentials and pagination config from vars>
redact:
  fields:
    - <sensitive state keys>
max_executions: <number, for heavy pagination>
program: |
  <CEL expression>
tags:
{{#if preserve_original_event}}
  - preserve_original_event
{{/if}}
{{#each tags as |tag|}}
  - {{tag}}
{{/each}}
{{#contains "forwarded" tags}}
publisher_pipeline.disable_host: true
{{/contains}}
{{#if processors}}
processors:
{{processors}}
{{/if}}
```
Handlebars patterns
| Pattern | Purpose |
|---|---|
| {{var_name}} | Direct variable substitution |
| {{#if var_name}}...{{/if}} | Conditional block for optional config |
| {{#each tags as \|tag\|}}...{{/each}} | Iteration over list vars |
| {{#contains "forwarded" tags}} | Check if list contains value |
Key template fields
- resource.url: base URL, often constructed from multiple vars
- static request headers (ga 8.18.1): headers that are the same for every request (for example, API version headers). Set them in the template rather than in-program when they never vary. They are applied before auth headers.
- state: block where manifest vars are injected as CEL state; credentials and pagination settings go here
- redact.fields: list of state keys containing secrets to redact from debug logs
- max_executions: overrides the default of 1000 for integrations with heavy pagination (e.g., 5000)
- program: the CEL expression; must be a YAML literal block scalar
Do NOT override data stream routing in integration packages
Integration packages must never set a routing override in the template or define a manifest var for one. The framework automatically routes documents to the correct data stream. Overriding this routing causes documents to land in the wrong index, typically resulting in "0 hits" during system tests.
Only input-type packages use such an override, because they have no predefined data streams.
Data stream manifest configuration
The data stream manifest (data_stream/<stream>/manifest.yml) defines the CEL input stream and its variables.
Standard vars every CEL stream should include
| Var | Type | Purpose |
|---|---|---|
| url | text | API base URL |
| interval | text | Polling interval |
| initial_interval | text | Lookback window on first run (e.g., 24h) |
| enable_request_tracer | bool | Enable HTTP request tracing |
| http_client_timeout | text | Request timeout |
| proxy_url | text | HTTP proxy URL |
| ssl | yaml | TLS configuration |
| tags | text (multi) | Event tags |
| preserve_original_event | bool | Keep original event |
| processors | yaml | Beat processors |
Auth-specific vars depend on the API (API key, OAuth client_id/secret/token_url, bearer token, etc.).
Declare enable_request_tracer in the data stream manifest, not at the input level. Input-level tracing enables logging for all data streams in the policy.
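A minimal sketch of how a few of these vars might be declared in a data stream manifest follows. The var names come from the template shown earlier; the title, description, and default values are illustrative assumptions, not values from a real package.

```yaml
# data_stream/<stream>/manifest.yml fragment (sketch, illustrative values only)
streams:
  - input: cel
    title: Example events                  # hypothetical title
    description: Collect example events via the CEL input.
    template_path: cel.yml.hbs
    vars:
      - name: interval
        type: text
        title: Interval
        required: true
        show_user: true
        default: 5m                        # assumed default
      - name: initial_interval
        type: text
        title: Initial interval
        required: true
        show_user: true
        default: 24h
      - name: enable_request_tracer        # declared per data stream, not per input
        type: bool
        title: Enable request tracing
        required: false
        show_user: false
        default: false
      - name: tags
        type: text
        title: Tags
        multi: true
        required: true
        show_user: false
        default:
          - forwarded
      - name: processors
        type: yaml
        title: Processors
        required: false
        show_user: false
```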
Package-level vs data-stream-level vars
- Package-level vars in the root manifest.yml under policy_templates[].inputs[].vars: shared across streams (e.g., auth credentials)
- Data-stream-level vars in data_stream/<stream>/manifest.yml under streams[].vars: stream-specific
Scope of the CEL program
The CEL program's responsibility is data collection only:
- Fetch data from the API endpoint(s)
- Handle pagination — walk through all pages within a single polling cycle
- Manage cursor state: store timestamps or page tokens in the cursor so the next polling interval resumes where the last one left off, avoiding re-collection of already-fetched events
- Emit raw events: output {"message": e.encode_json()} for each record
The CEL program does not handle:
- Elasticsearch-level deduplication — if overlapping time windows cause a few duplicate events to be collected, that is acceptable. The ingest pipeline or Elasticsearch routing handles dedup at index time, not the CEL program.
- Field mapping or transformation — the ingest pipeline handles parsing, ECS mapping, and enrichment.
- Filtering by content — unless the API supports server-side filtering parameters, do not filter events in the CEL program. Emit everything and let the pipeline decide.
Do not search the codebase for deduplication patterns. These are not CEL concerns.
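To make the scope concrete, here is a minimal collection-only sketch, shown as the program value of a template. The /v1/events path and the records response field are hypothetical, and the program has not been validated with mito here, so treat it as a starting skeleton rather than a finished program.

```yaml
# Sketch only: the /v1/events path and the "records" field are hypothetical.
program: |
  request("GET", state.url.trim_right("/") + "/v1/events").do_request().as(resp,
    state.with(resp.StatusCode == 200 ?
      {
        // Emit each record untouched; parsing and ECS mapping happen in the ingest pipeline.
        "events": resp.Body.decode_json().records.map(e, {"message": e.encode_json()}),
        "want_more": false,
      }
    :
      {
        // Nothing was collected, so report a single-object error and retry next time.
        "events": {
          "error": {
            "code": string(resp.StatusCode),
            "message": "GET /v1/events: " + string(resp.Body),
          },
        },
        "want_more": false,
      }
    )
  )
```

The program does no filtering, field mapping, or deduplication: every record becomes one event with a single message field.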
CEL program structure patterns
Pagination strategy selection
| API behavior | Pattern | Key indicators |
|---|---|---|
| Returns total count + supports offset | Offset pagination | Offset and total-count fields in request/response |
| Returns records since a timestamp | Timestamp cursor | Time-range params, no explicit page tokens |
| Returns Link header with next URL | Link header | Link in response headers |
| Returns next-page URL in response body | Next-URL | Next-page URL field in JSON |
| GraphQL with cursor-based page info | GraphQL cursor | Cursor and next-page fields in the page-info object |
| Multi-phase subscription/content flow | Multi-step state machine | Multiple API calls with work queues in state |
Cursor timestamp selection: use the last record's timestamp when the API sorts ascending, the first record's when it sorts descending, and the maximum timestamp with a regression guard when sort order is not guaranteed.
Full code, package references, and YAML snippets for each pattern: references/cel-pagination-patterns.md.
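The ascending case with a regression guard can be sketched as follows. The endpoint, the since query parameter, and the records and timestamp field names are hypothetical; timestamps are assumed to be RFC 3339 strings in UTC, so plain string comparison orders them correctly. This sketch has not been run through mito.

```yaml
# Sketch only: endpoint, query parameter, and field names are hypothetical.
program: |
  state.?cursor.last_timestamp.orValue(
    (now - duration(state.initial_interval)).format(time_layout.RFC3339)
  ).as(since,
    request("GET", state.url.trim_right("/") + "/v1/events?" + {
      "since": [string(since)],
    }.format_query()).do_request().as(resp,
      resp.StatusCode != 200 ?
        state.with({
          "events": {"error": {"code": string(resp.StatusCode), "message": "GET /v1/events: " + string(resp.Body)}},
          "want_more": false,
        })
      :
        resp.Body.decode_json().as(body, state.with({
          "events": body.records.map(e, {"message": e.encode_json()}),
          "cursor": {
            // Ascending sort: the last record is the newest. The comparison is the
            // regression guard: never move the cursor backwards.
            "last_timestamp": (size(body.records) > 0 && body.records[size(body.records) - 1].timestamp > since) ?
              body.records[size(body.records) - 1].timestamp
            :
              since,
          },
          "want_more": false,
        }))
    )
  )
```

For a descending API the index would be 0 instead of size(body.records) - 1; the guard stays the same.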
Authentication patterns
Three strategies: header (credentials in state, passed via the request header map), query parameter (credentials appended to the URL), and signed query (HMAC signature computed in CEL). Config-level auth settings differ in scope: some apply to all requests, while others apply only to direct calls. Prefer input-level auth over in-program token fetching.
For full code examples, optional-header syntax, and config-level auth scope details: load references/cel-auth-patterns.md.
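The header strategy might look like the sketch below. The api_key var, the Bearer scheme, the /v1/events path, and the records field are assumptions for illustration; use whatever the target API documents, and validate with mito before copying anything into a template.

```yaml
# cel.yml.hbs fragment (sketch): credential lives in state and is redacted from debug logs.
state:
  api_key: {{api_key}}          # hypothetical manifest var
redact:
  fields:
    - api_key
program: |
  request("GET", state.url.trim_right("/") + "/v1/events").with({
    "Header": {
      // Header values are lists of strings.
      "Authorization": ["Bearer " + state.api_key],
      "Accept": ["application/json"],
    },
  }).do_request().as(resp,
    state.with(resp.StatusCode == 200 ?
      {
        "events": resp.Body.decode_json().records.map(e, {"message": e.encode_json()}),
        "want_more": false,
      }
    :
      {
        "events": {"error": {"code": string(resp.StatusCode), "message": "GET /v1/events: " + string(resp.Body)}},
        "want_more": false,
      }
    )
  )
```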
State management rules
- The URL in state is populated from config; it must be preserved in the output or hardcoded
- cursor is the only state persisted across input restarts; store pagination positions and timestamps here
- The events array is removed after each evaluation; never rely on it in subsequent runs
- want_more triggers immediate re-evaluation, but only if events is non-empty. Pagination continuation guardrail: when a next-page cursor/token exists, always set want_more to true regardless of how many events were collected on the current page. Tying want_more to the event count stalls pagination silently: the next cursor is valid, and an empty array is safe to emit. The correct pattern is "want_more": next_cursor != "".
- All other state keys are retained within a session but lost on restart; merge the output onto the existing state to propagate them automatically
- Numbers are serialized as floats in state JSON; cast with int() when using them as integers
- Optional access with state.?cursor.last_timestamp.orValue(default) prevents errors when the cursor is absent
- Secrets — every sensitive field in must have a corresponding entry. is always redacted automatically. When (ga 9.4.0) is available, prefer it.
- Cursor updates require a published event — the input only persists cursor updates when at least one event is published. If a program updates the cursor but returns zero events, the cursor change is lost.
- Do not duplicate request/response handling across branches: when an initialization branch (cursor creation, subscription, token exchange) and a steady-state branch both need the same fetch logic, consolidate it. Two approaches: split the init into a separate evaluation (Technique 6 Variant A), or use an intermediate result map to unify the branches within one evaluation (Variant B). Both are valid; see references/cel-code-style.md Technique 6 and the init-then-steady-state pattern in references/cel-pagination-patterns.md.
- Nesting depth: chain depth must not exceed 5 levels on any execution path. HTTP programs must target 2 levels inside the request/response handling. Cursor defaults, window bounds, and page tokens must be extracted as pre-bindings before the request. Single-use values must be inlined at the call site, not wrapped in a binding of their own. Load references/cel-code-style.md for flattening techniques and before/after examples.
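Several of these rules can be seen together in the sketch below: optional cursor access, the int() cast on a state number, merging the output onto state, and want_more tied to the next-page token rather than to the event count. The endpoint, the limit and page_token parameters, and the records and next_page_token fields are hypothetical, and the API is assumed to omit next_page_token (or return an empty string) on the last page. It has not been validated with mito.

```yaml
# Sketch only: endpoint, parameters, and response fields are hypothetical.
program: |
  request("GET", state.url.trim_right("/") + "/v1/events?" + {
    // batch_size arrives as a float in state JSON, so cast before formatting.
    "limit": [string(int(state.batch_size))],
    "page_token": [string(state.?cursor.page_token.orValue(""))],
  }.format_query()).do_request().as(resp,
    resp.StatusCode != 200 ?
      state.with({
        "events": {"error": {"code": string(resp.StatusCode), "message": "GET /v1/events: " + string(resp.Body)}},
        "want_more": false,
      })
    :
      resp.Body.decode_json().as(body, state.with({
        "events": body.records.map(e, {"message": e.encode_json()}),
        "cursor": {"page_token": body.?next_page_token.orValue("")},
        // Continue whenever a next-page token exists, even if this page was empty.
        "want_more": body.?next_page_token.orValue("") != "",
      }))
  )
```

A real program would usually combine the page token with a timestamp cursor so that a new polling interval does not restart from the first page.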
Map merge and field removal
The map merge and field-removal functions are general-purpose map operations: they work on any map, not just state or cursor. The basic merge is a shallow merge: nested objects are replaced entirely. This makes it a tool for cursor state transitions (omitting a sub-object removes it via clobber) as well as for building request headers, transforming response data, and constructing intermediate maps. Full semantics and examples: references/cel-code-style.md.
Event output format
Events must contain ONLY the message field: {"message": e.encode_json()}. Do not set any other field; the framework adds its own fields, and duplicates cause silent document rejection in ES 9.x. See the reference files for correct/incorrect examples.
Response handling
The byte-conversion wrapper around the response body was required in older runtime versions but is no longer needed. Decode the response body directly.
Error handling
Every CEL program must handle HTTP errors. Two forms:
- Single-object error (retry): "events": {"error": {...}} logs at ERROR, sets degraded status, and deletes the cursor so the next evaluation retries. Use when data was not collected.
- Array error (advance): "events": [{"error": {...}}] keeps the cursor update. Use when the program should advance past the error. Requires a processor in the ingest pipeline (ES 8.16.0+).
Error message format: "METHOD path: body-or-status".
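The two forms can be contrasted in one sketch. The policy shown (advance past a 400, retry on anything else) is a hypothetical choice made only to show both shapes side by side; the endpoint and field names are also assumptions, and the program has not been validated with mito.

```yaml
# Sketch only: the status-code policy, endpoint, and field names are hypothetical.
program: |
  request("GET", state.url.trim_right("/") + "/v1/events").do_request().as(resp,
    state.with(
      resp.StatusCode == 200 ?
        {
          "events": resp.Body.decode_json().records.map(e, {"message": e.encode_json()}),
          "want_more": false,
        }
      : resp.StatusCode == 400 ?
        // Array form (advance): the error is published as an event and the cursor update is kept.
        {
          "events": [{"error": {"code": "400", "message": "GET /v1/events: " + string(resp.Body)}}],
          "cursor": {"last_timestamp": now.format(time_layout.RFC3339)},
          "want_more": false,
        }
      :
        // Single-object form (retry): nothing was collected, so the next evaluation tries again.
        {
          "events": {
            "error": {
              "code": string(resp.StatusCode),
              // "METHOD path: body-or-status"
              "message": "GET /v1/events: " + (size(resp.Body) != 0 ? string(resp.Body) : string(resp.Status)),
            },
          },
          "want_more": false,
        }
    )
  )
```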
Placeholder events
When advancing the cursor with no real events, emit a placeholder event and add a - drop_event.when.equals.retry: true entry in the processors section so it is discarded before indexing.
Rate limiting and retry
Do NOT implement rate limiting or retry logic in the CEL program. No rate-limit calls, no rate-limit state propagation, no 429-specific branches, no retry loops. These add excessive nesting and complexity for marginal benefit.
When an API has a documented rate limit, use config-only YAML settings in cel.yml.hbs:
```yaml
resource.rate_limit.limit: 10   # max requests per second
resource.rate_limit.burst: 5    # max burst above sustained rate
```
When custom retry behavior is needed, use config-only YAML settings:
```yaml
resource.retry.max_attempts: 5   # default: 5
resource.retry.wait_min: 1s      # default: 1s
resource.retry.wait_max: 60s     # default: 60s
```
The input framework enforces both transparently. See references/cel-rate-limiting.md for guidance on when to add these settings.
Type safety
Avoid dynamic-type casts: they defeat type checking and are rarely needed in practice.
All numbers are float64: the CEL input transmits every number as a 64-bit float. Numbers >= 1e7 render in scientific notation in Elasticsearch. Convert intended-integer fields to strings in the CEL program or via ingest pipeline. Safe integer range: [-(2^53 - 1), 2^53 - 1].
Debugging aids
- Debug logging: logs a value at DEBUG level.
- Structured error handling: handles errors without crashing the program.
- Failure dumps (ga 8.18.0): full evaluation state dump on failure. Note: dumps may contain secrets.
- Remaining-executions counter (ga 9.2): how many evaluations remain in the budget.
Mito CLI
Mito is the local CEL evaluation CLI.
A CEL program that has not been tested with mito is not acceptable. Follow the mandatory workflow at the top of this skill: mock → mito → template. Do NOT write cel.yml.hbs until the program passes mito validation.
For installation, CLI flags, input state structure, execution model, the full mock-first workflow steps, mito→integration mapping, and quality standards: load references/mito-reference.md.
Data anonymization
All data committed to the repository must be fully anonymized. This applies to default values in manifest vars, example values in CEL state, mock API responses, pipeline test fixtures from CEL output, and sample payloads captured during mito prototyping.
Refer to the anonymization skill for the full anonymization policy and placeholder conventions.
Handoff to other skills
- references/system-testing.md to run system tests (the mock API is already in place from CEL development)
- references/pipeline-testing.md to validate ingest pipeline behavior on CEL-produced events
- the package scaffolding skill for overall package layout
Reference files
IMPORTANT: These reference files contain the actual working code examples and patterns. The summaries above are not sufficient to write correct CEL programs — you MUST load the relevant references before writing code.
Always load these five when building a CEL program — in this order (mock/mito before templates):
| File | Contains | Load order |
|---|---|---|
| references/cel-system-tests.md | Mock API setup with elastic/stream, docker-compose config, rule format, variable-capture patterns, GraphQL mock examples, hit_count calculation, and debugging 0-hits failures | 1st: you need this before writing any CEL |
| references/cel-incremental-build.md | Mandatory phased build ladder (skeleton → error handling → events → pagination → cursor), syntax anti-patterns that cause compilation failures (including tuples and unbalanced parens), and debugging guidance | 2nd: you MUST follow this phased approach; do not write the full program before validating a skeleton |
| references/mito-reference.md | Mito CLI flags, input state structure, mock-first workflow, translating template vars to state.json, extension library quick-reference, syntax pitfalls, testscript harness | 3rd: you need this to develop and validate the program |
| references/cel-template-examples.md | Complete working examples (minimal GET, paginated timestamp cursor, OAuth, GraphQL cursor) with corresponding manifest configs. These are FINAL output; do not write templates until mito passes | 4th: only needed at step 5 of the workflow |
| references/cel-code-style.md | Nesting discipline: the 3-level HTTP core rule, six flattening/structuring techniques (including intermediate result maps for shared logic), shallow merge semantics, cursor namespacing with clobber, merge strategies, and links to well-structured reference integrations. Must read before writing any multi-line CEL | 5th: read this before writing your CEL program so structure is right from the start |
Load these based on the task:
| File | Load when |
|---|---|
| references/cel-pagination-patterns.md | Writing any pagination logic: all 6 patterns with code |
| references/cel-auth-patterns.md | Implementing authentication: header, query param, signed, and config-level auth patterns |
| references/cel-rate-limiting.md | Rate limiting policy: config-only approach, when to add rate-limit and retry settings |
| | Quick-reference for common idioms, HTTP request patterns, structure conventions |
| references/cel-polymorphic-patterns.md | Choosing between pure-CEL, mito lib, and config approaches for auth, headers, rate limiting (version-tagged) |
| references/cel-expression.md | Expression-specific reference: interface contract, translation framing (Python→CEL), incremental build phases, core structure, event output, error handling, pagination, state management, syntax rules, quality checklist |
| references/cel-taxonomy.md | Taxonomy classification: pagination and state management classes, least-complexity principle, mapping to skill vocabulary, how to classify from test-api.py |
| references/cel-complexity-baselines.md | Per-pattern-class complexity baselines from a ceplx survey of 316 programs, skip threshold, reviewer challenge examples, diagnostic interpretation |
| references/expression-builder-subagent-guidance.md | Subagent operating manual for the cel-expression-builder: translates test-api.py into a validated CEL file + taxonomy classification. Does not touch templates, manifests, or mocks. |
| references/reviewer-subagent-guidance.md | Subagent operating manual for the cel-expression-reviewer: checks generated CEL against complexity baselines and source fidelity, produces specific challenges or accepts |
| references/cel-function-reference.md | Looking up available CEL functions per extension and their first mito version |
| references/builder-subagent-guidance.md | Subagent operating manual for the cel-program-builder orchestrator: scope boundaries, skill-load sequence, the 9-step mock-first / mito-incremental workflow with mock completeness gate, delegation to cel-expression-builder, reporting contract. The orchestrator dispatches subagents by passing this file's path in the task prompt; the subagent reads it itself in its own fresh context. Do NOT embed/paste its contents into the task prompt. |