Elastic ML Anomaly Detection

Single skill covering all anomaly detection work against Kibana Agent Builder MCP at `{KIBANA_URL}/api/agent_builder/mcp`. Use the Mode Selector below to pick the right approach for the user's question; modes share the same tool surface and concepts.

Platform

- Read path: ES|QL against `.ml-anomalies-*`, `.ml-config`, `.ml-notifications-*`, `.ml-annotations-*`.
- Always-available: `platform.core.execute_esql` (plus additional platform tools for search, index mapping, and documentation; see `scripts/agent_builder_constants.json`).
- ML API spec (if available): `.kibana_ai_openapi_spec_elasticsearch`; see references/anomaly-detection-openapi-spec-discover.md for the discovery pattern.
- Run `ad_validate_ml_tool_permissions` first when tools return empty/misleading results; missing privileges are the most common cause of false negatives. Full permissions matrix: references/permissions-matrix.md.

Mode Selector

| User intent | Mode |
| --- | --- |
| "What broke?" / RCA / cross-job / blast radius / influencers / log categories | Investigate |
| "Why score high/low?" / renormalization / model bounds / forecasts | Explain |
| Missing docs / memory limit / datafeed stopped / CCS / lifecycle / calendars | Troubleshoot |
| Create a job / configure a datafeed / start analysis / retrieve results | Manage |
| Security framing (attack chains, MITRE, exfil) | Investigate + references/security-anomaly-expert.md |
| Observability/SRE framing (degradation, capacity, deployment regression) | Investigate + references/observability-anomaly-expert.md |

When a question spans modes: Investigate → Explain → Troubleshoot. Don't blend mode logic — finish one before moving on.
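The ordering rule for multi-mode questions can be sketched as a small helper. This is only an illustration: `order_modes` is a hypothetical name, and placing Manage last is an assumption, since the text ranks only the first three modes.

```python
# Precedence from the text: Investigate -> Explain -> Troubleshoot.
# Putting "Manage" last is an assumption; the source does not rank it.
MODE_ORDER = ["Investigate", "Explain", "Troubleshoot", "Manage"]


def order_modes(modes):
    """Return the requested modes, deduplicated and sorted by precedence."""
    requested = set(modes)
    unknown = requested - set(MODE_ORDER)
    if unknown:
        raise ValueError(f"unknown mode(s): {sorted(unknown)}")
    return [mode for mode in MODE_ORDER if mode in requested]
```

For example, a question tagged Troubleshoot, Investigate, and Explain would be worked in the order Investigate, Explain, Troubleshoot, finishing each mode before starting the next.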

Score Quick Reference

- `record_score` bands: >75 critical · 50–75 warning · 25–50 minor · <25 informational
- `multi_bucket_impact ≥ 3` → sustained shift (not a transient spike)
- `initial_record_score >> record_score` → renormalization (model saw worse anomalies later)
- `actual << typical` with `count` / `low_count` / `low_mean` → absence/outage, not just low value
- Low scores across many jobs > one high score: composite cross-job signal often beats single-detector severity

Full score definitions, renormalization mechanics, and `anomaly_score_explanation` components: references/score-reference.md.
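A minimal sketch of the band thresholds as code. The function names are hypothetical, and the handling of exact boundary values (50 counts as warning, 25 as minor) is an assumption; the quick reference only fixes ">75" and "<25".

```python
def score_band(record_score: float) -> str:
    """Map a 0-100 record_score to the bands in the quick reference."""
    if not 0 <= record_score <= 100:
        raise ValueError("record_score is expected to be between 0 and 100")
    if record_score > 75:
        return "critical"
    if record_score >= 50:  # boundary handling is an assumption
        return "warning"
    if record_score >= 25:  # "<25 informational" implies 25 is minor
        return "minor"
    return "informational"


def is_sustained(multi_bucket_impact: float) -> bool:
    """multi_bucket_impact >= 3 marks a sustained shift, not a transient spike."""
    return multi_bucket_impact >= 3
```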

Core concepts

Treat `.ml-anomalies-*` as three layers, accessed via `result_type`:

- `bucket`: bucket-level unusualness per `bucket_span`. `anomaly_score` is the aggregate across all detectors.
- `record`: finest-grained rows with `actual`, `typical`, `probability`, `record_score`, `anomaly_score_explanation`.
- `influencer`: entity contributions ranked within a bucket (`influencer_score`).

Read scores this way:

- `anomaly_score` / `record_score` = current normalized values (move as the model sees new extremes).
- `initial_anomaly_score` / `initial_record_score` = immutable snapshots from detection time.
- Compare `actual` to `typical`; use `probability` for raw likelihood.
- Map entities via `partition_field_value`, `by_field_value`, `over_field_value`.
- Read `multi_bucket_impact` (-5 to +5) to separate single-bucket spikes from sustained trends.
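As an illustration of the `record` layer, the sketch below builds an ES|QL query string that could be passed to `platform.core.execute_esql`. It is not one of the skill's registered tools: `build_record_query`, the job-id pattern, and the default `min_score` and `limit` values are assumptions, and the `timestamp` and `job_id` columns are assumed to exist alongside the record fields listed above.

```python
import re

# Conservative job-id pattern (an assumption) so the id can be embedded safely.
_JOB_ID = re.compile(r"^[a-z0-9][a-z0-9_\-]*$")

# Record-layer fields named in the text, plus timestamp and job_id (assumed).
RECORD_FIELDS = [
    "timestamp", "job_id", "record_score", "initial_record_score",
    "actual", "typical", "probability",
    "partition_field_value", "by_field_value", "over_field_value",
]


def build_record_query(job_id: str, min_score: float = 25, limit: int = 20) -> str:
    """Return an ES|QL query for the highest-scoring records of one job."""
    if not _JOB_ID.match(job_id):
        raise ValueError(f"unexpected job id: {job_id!r}")
    return "\n".join([
        "FROM .ml-anomalies-*",
        f'| WHERE result_type == "record" AND job_id == "{job_id}"',
        f"| WHERE record_score >= {float(min_score)}",
        f"| KEEP {', '.join(RECORD_FIELDS)}",
        "| SORT record_score DESC",
        f"| LIMIT {int(limit)}",
    ])
```

Swapping the `result_type` literal for `bucket` or `influencer` (and adjusting the kept fields) would target the other two layers.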

Mode: Investigate — RCA

When: "what broke?", "which entity caused this?", cross-job correlation, blast radius, attack/cascade chains.

Tool chain

| Phase | Tools |
| --- | --- |
| Discovery | `ad_get_available_metadata`, `ad_get_jobs`, `ad_discover_related_jobs`, `ad_discover_jobs_by_datafeed_index` |
| Timeline / scope | `ad_query_anomaly_timeline` |
| Cross-job / entities | `ad_rca_cross_job_entity_match`, `ad_rca_multi_job_entities`, `ad_rca_entity_profile` |
| Records / influencers | `ad_query_anomaly_records`, `ad_query_influencers` |
| RCA depth | `ad_rca_detector_fingerprint`, `ad_rca_correlation`, `ad_rca_blast_radius`, `ad_rca_score_reassessment` |
| Evidence / categories | `ad_get_job_datafeed_config`, `ad_rca_source_evidence`, `ad_get_categories`, `ad_search_log_category_examples` |

Protocol

Follow the 14-step sequence in references/protocols/investigation.md. High level: `ad_get_available_metadata` → pair `ad_discover_jobs_by_datafeed_index` with `ad_discover_related_jobs` → `ad_query_anomaly_timeline` → rank with `ad_rca_multi_job_entities` (`min_job_count=2`) → `ad_rca_detector_fingerprint` → drill with `ad_query_anomaly_records` and `ad_query_influencers` (low `min_score=25`) → profile with `ad_rca_entity_profile` → order with `ad_rca_correlation` → confirm with `ad_rca_source_evidence`. When `by_field_name == "mlcategory"`, compare with `ad_get_categories` + paired `ad_search_log_category_examples` (baseline vs. anomaly window).

Finish with a written RCA: root cause entity · affected jobs · temporal progression · fault class (resource/network/application) · severity · recommended actions. Worked example: references/worked-example.md. Full ES|QL templates and parameters: references/investigate-anomaly-esql-tools.md.

Rules

- Multi-job entities are prime suspects; single-job entities are usually victims. Use `min_job_count=2`.
- Earliest anomaly timestamp wins: sort `ad_rca_correlation` by timestamp; first-appearing entity = origin.
- `multi_bucket_impact ≥ 3` = sustained behavioral shift; weight it higher than transient spikes.
- Never close an RCA without `ad_rca_source_evidence`: raw source documents are ground truth.
- Use a low `min_score` (25 or lower) for influencer queries; high thresholds miss correlated entities.
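The first two rules can be illustrated on plain record rows. This is a sketch of the reasoning only, not how `ad_rca_multi_job_entities` or `ad_rca_correlation` are implemented; `rank_suspects` and the `entity` / `job_id` / `timestamp` keys are hypothetical names.

```python
from collections import defaultdict


def rank_suspects(records, min_job_count=2):
    """Keep entities seen in at least min_job_count jobs, earliest first.

    records: iterable of dicts with hypothetical keys "entity", "job_id"
    and "timestamp" (any sortable value, for example epoch milliseconds).
    """
    jobs = defaultdict(set)
    first_seen = {}
    for row in records:
        entity = row["entity"]
        jobs[entity].add(row["job_id"])
        ts = row["timestamp"]
        if entity not in first_seen or ts < first_seen[entity]:
            first_seen[entity] = ts
    suspects = [
        {"entity": e, "job_count": len(j), "first_seen": first_seen[e]}
        for e, j in jobs.items()
        if len(j) >= min_job_count
    ]
    # Earliest anomaly wins; a higher job count breaks ties.
    return sorted(suspects, key=lambda s: (s["first_seen"], -s["job_count"]))
```

An entity that appears in only one job is dropped even if it fired first, which matches the "single-job entities are usually victims" rule.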

Mode: Explain — Score / model behavior

When: "why is my score 30/90?", "score dropped overnight", "what is renormalization?", "why wasn't this detected?".

Score types

| Field | Scope | Meaning |
| --- | --- | --- |
| `record_score` | Single record | Normalized severity after renormalization. |
| `initial_record_score` | Single record | Score at detection time. Gap vs `record_score` = renormalization drift. |
| `anomaly_score` | Bucket | Aggregate severity across all detectors in a bucket. |
| `influencer_score` | Entity × bucket | How anomalous a specific entity is in that bucket. |

`anomaly_score_explanation` components

| Component | Effect | What it means |
| --- | --- | --- |
| `anomaly_length` | ↑ score | More consecutive anomalous buckets |
| `single_bucket_impact` | ↑ score | Lower probability → higher impact |
| `multi_bucket_impact` | ↑ score | Sustained pattern contribution |
| `anomaly_characteristics_impact` | ↑ score | Mean shift vs. variance change |
| `high_variance_penalty` | ↓ score | Noisy data → wide bounds → anomaly less surprising |
| `incomplete_bucket_penalty` | ↓ score | Bucket has less data than expected (ingest lag, sparse data) |

Why a score looks wrong

- Unexpectedly low: `high_variance_penalty`, renormalization, <3 weeks training for weekly seasonality, `bucket_span` too large, wrong detector function (`mean` vs `high_mean`), `incomplete_bucket_penalty`, suppression by `custom_rules`.
- Unexpectedly high: insufficient history (early training over-flags), high-cardinality split (too few points per entity), `use_null: true` on a sparse field.

Tool chain

| Purpose | Tools |
| --- | --- |
| Records + explanation | `ad_query_anomaly_records` (exact `job_id_pattern`) |
| Renormalization drift | `ad_rca_score_reassessment` (`score_drift = initial_record_score - record_score`) |
| Model bounds (visual) | `ad_get_model_plot`: actual outside `model_lower` / `model_upper` = anomaly |
| Forecast overlap | `ad_get_forecast_results` |
| Influencer attribution | `ad_query_influencers` |
| Config & detector | `ad_get_job_datafeed_config`: `bucket_span`, function, `custom_rules`, `use_null` |
| Categorization | `ad_get_categories` |
| Model snapshots | `ad_get_model_snapshots` |
| Structured diagnostic | `ad_wf_troubleshoot_anomaly_score` (full decision tree) |

Decision tree (`ad_wf_troubleshoot_anomaly_score`)

1. `ad_get_jobs`: ≥3 weeks data for weekly seasonality?
2. `ad_ts_model_memory_health`: `memory_status` healthy?
3. `ad_ts_delayed_data_annotations`: no incomplete buckets?
4. `ad_query_anomaly_records`: compare `record_score` vs `initial_record_score`.
5. `ad_get_job_datafeed_config`: `bucket_span`, detector function, `custom_rules`, `use_null`.
6. `ad_get_model_plot`: wide bounds → `high_variance_penalty`.
7. `ad_rca_score_reassessment`: renormalization drift across history.
8. Explain `anomaly_score_explanation` factors.
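A rough sketch of how the checks above could be combined once their results are in hand. It is not the `ad_wf_troubleshoot_anomaly_score` workflow itself: `diagnose_low_score`, the keys of `facts`, and the 20-point drift threshold are all hypothetical, and the caller is assumed to have gathered the facts from the tools listed.

```python
def diagnose_low_score(facts: dict, drift_threshold: float = 20.0) -> list:
    """Return likely causes of a low score, in decision-tree order."""
    findings = []
    training_days = facts.get("training_days")
    if training_days is not None and training_days < 21:
        findings.append("under 3 weeks of data: weekly seasonality not yet learned")
    status = facts.get("memory_status")
    if status not in (None, "ok"):
        findings.append(f"memory_status is {status}: fix memory first")
    if facts.get("has_delayed_data"):
        findings.append("delayed data: incomplete_bucket_penalty likely")
    initial = facts.get("initial_record_score")
    current = facts.get("record_score")
    if initial is not None and current is not None:
        drift = initial - current  # score_drift as defined in the tool chain
        if drift >= drift_threshold:
            findings.append(f"renormalization drift of {drift:g} points")
    if facts.get("custom_rules"):
        findings.append("custom_rules present: check for suppression")
    if facts.get("wide_model_bounds"):
        findings.append("wide model bounds: high_variance_penalty likely")
    return findings
```

Missing keys are skipped rather than guessed, so an empty `facts` dict yields no findings.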

Rules

- Always show both `initial_record_score` and `record_score`; the gap is the renormalization story.
- Explain renormalization before diagnosing config: score drift is the most common "score dropped" cause and needs no config change.
- `actual << typical` with `count` / `low_count` is an absence anomaly; distinguish outages from value spikes.
- `high_variance_penalty` and `incomplete_bucket_penalty` explain most "low score" surprises without remediation.
- Weekly seasonality needs ≥3 weeks of training data; flag young jobs as the cause.

For detector function selection details, see references/anomaly-detection-functions.md.

Mode: Troubleshoot — Job ops

When: "missing documents", "datafeed stopped", "hard_limit", "results look wrong", lifecycle changes, calendars, CCS.

Common issues → fast paths

| Issue | Fast path | Full decision tree |
| --- | --- | --- |
| Missing docs / `query_delay` warning | `ad_ts_delayed_data_annotations` → `ad_ts_bucket_event_gaps` → `ad_ts_ingest_latency_estimate` → `ad_update_datafeed_query_delay` | `ad_wf_troubleshoot_query_delay` |
| Memory `soft_limit` / `hard_limit` | `ad_ts_model_memory_health` → `ad_wf_ts_field_cardinality` → `ad_estimate_memory_requirement` → `ad_update_model_memory_limit` | `ad_wf_troubleshoot_memory_limit` |
| Datafeed not running / job state | `ad_get_jobs` (state) → `ad_get_job_messages` → `ad_manage_datafeed` | n/a |
| CCS / `remote_cluster:` indices | `ad_ts_ccs_diagnostics` | n/a |
| Score sanity check | n/a | `ad_wf_troubleshoot_anomaly_score` |

`hard_limit` corrupts model state and causes downstream missing-doc false alarms (categorizer silently skips events for unknown categories). Fix memory before fixing `query_delay`.

Memory concepts

| Field | Meaning |
| --- | --- |
| `model_bytes` | Current memory used |
| `peak_model_bytes` | High-water mark since job opened |
| `model_bytes_memory_limit` | Configured `model_memory_limit` |
| `memory_status` | `ok` / `soft_limit` (pruning) / `hard_limit` (critical) |
| `total_by_field_count > 100k` | `by_field` cardinality too high (dominant driver) |
| `total_partition_field_count > 10k` | Partition explosion |
| `total_category_count > 10k` | Too many distinct log patterns |

Prefer `ad_estimate_memory_requirement` (samples cardinality from source, calls Estimate Model Memory API) over heuristics like `peak_model_bytes * 1.3`; the heuristic ignores pure influencer and categorization memory.
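The memory fields above lend themselves to a simple triage pass over one `model_size_stats` document. The sketch below is illustrative only: `memory_findings` is a hypothetical helper, and the 90% usage warning level is an assumption rather than a documented threshold. The cardinality limits are the ones from the table.

```python
# Cardinality thresholds taken from the memory concepts table.
CARDINALITY_LIMITS = {
    "total_by_field_count": 100_000,
    "total_partition_field_count": 10_000,
    "total_category_count": 10_000,
}


def memory_findings(stats: dict, usage_warn: float = 0.9) -> list:
    """Summarize memory pressure signals from a model_size_stats dict."""
    findings = []
    status = stats.get("memory_status", "ok")
    if status != "ok":
        findings.append(f"memory_status={status}")
    limit = stats.get("model_bytes_memory_limit")
    if limit:
        usage = stats.get("model_bytes", 0) / limit
        if usage >= usage_warn:  # warning level is an assumption
            findings.append(f"model_bytes at {usage:.0%} of the configured limit")
    for field, threshold in CARDINALITY_LIMITS.items():
        if stats.get(field, 0) > threshold:
            findings.append(f"{field} exceeds {threshold}")
    return findings
```

Any finding here is a reason to run `ad_estimate_memory_requirement` rather than scale `peak_model_bytes` by hand.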

Datafeed & timing concepts

- `query_delay`: how far behind real time the datafeed queries. Too small → missing docs; too large → slower alerts. Set to P95 ingest latency + buffer (default `60s`–`120s`).
- `delayed_data_check_config`: how aggressively the datafeed checks for late data.
- `bucket_span`: analysis interval. Align with data granularity and detection window.
- `frequency`: how often the datafeed queries while running in real time. When unset, Elasticsearch derives it from `bucket_span`: the bucket span itself for short spans, a fraction of it for longer ones.
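The "P95 ingest latency + buffer" guidance for `query_delay` can be worked through as below. This is a sketch under stated assumptions: the latency samples are taken as given (for example from `ad_ts_ingest_latency_estimate`), the 30-second buffer and 60-second floor are hypothetical defaults, and the percentile uses the nearest-rank method.

```python
import math


def p95(values):
    """Nearest-rank 95th percentile of a non-empty sequence."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("need at least one latency sample")
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math
    return ordered[rank - 1]


def suggest_query_delay(latencies_s, buffer_s=30, floor_s=60):
    """Return a query_delay string: P95 latency plus buffer, never below floor."""
    delay = math.ceil(p95(latencies_s) + buffer_s)
    return f"{max(floor_s, delay)}s"
```

With latencies of 1 to 100 seconds the P95 is 95 seconds, so the suggestion is `125s`; with consistently fast ingest the floor keeps the value at `60s`.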

Lifecycle for config changes (memory limit, query_delay)

1. Stop datafeed: `ad_manage_datafeed` (`action=_stop`)
2. Close job
3. Update config: `ad_update_model_memory_limit` / `ad_update_datafeed_query_delay` / `ad_update_delayed_data_check_config`
4. Open job: `ad_open_job`
5. Start datafeed: `ad_manage_datafeed` (`action=_start`)

Recover a corrupted period without resetting the whole model: `ad_revert_model_snapshot`.
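The config-change sequence can be written down as an ordered plan so that no step is skipped. This sketch only produces the step labels; `config_change_plan` is a hypothetical helper, and the close step is left as plain text because no tool is named for it here.

```python
UPDATE_TOOLS = {
    "ad_update_model_memory_limit",
    "ad_update_datafeed_query_delay",
    "ad_update_delayed_data_check_config",
}


def config_change_plan(update_tool: str) -> list:
    """Return the ordered steps for one config change."""
    if update_tool not in UPDATE_TOOLS:
        raise ValueError(f"not a config update tool: {update_tool}")
    return [
        "ad_manage_datafeed action=_stop",
        "close job",  # no tool is named for this step in the text
        update_tool,
        "ad_open_job",
        "ad_manage_datafeed action=_start",
    ]
```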

Tool surface

| Category | Tools |
| --- | --- |
| Permissions / metadata | `ad_validate_ml_tool_permissions`, `ad_get_available_metadata`, `ad_get_jobs` |
| Job + datafeed state | `ad_get_job_datafeed_config`, `ad_get_job_messages`, `ad_manage_datafeed`, `ad_preview_datafeed_with_latency` |
| Timing / missing docs | `ad_ts_delayed_data_annotations`, `ad_ts_bucket_event_gaps`, `ad_ts_ingest_latency_estimate`, `ad_update_datafeed_query_delay`, `ad_update_delayed_data_check_config`, `ad_wf_troubleshoot_query_delay` |
| Memory | `ad_ts_model_memory_health`, `ad_wf_ts_field_cardinality`, `ad_estimate_memory_requirement`, `ad_update_model_memory_limit`, `ad_wf_troubleshoot_memory_limit` |
| Model / lifecycle | `ad_get_model_snapshots`, `ad_revert_model_snapshot`, `ad_open_job`, `ad_create_job` |
| CCS | `ad_ts_ccs_diagnostics` |
| Calendars | `ad_get_calendar_events`, `ad_create_calendar_event` |

Full parameter tables, ES|QL templates, and REST step lists: references/troubleshoot-anomaly-tool-reference.md.

Rules

- `ad_validate_ml_tool_permissions` first: missing privileges produce misleading empty results.
- Fix memory before `query_delay`: `hard_limit` corrupts state; `query_delay` fixes on a memory-limited job are wasted.
- Stop the datafeed before updating it. Updating a running datafeed is rejected.
- Close the job before updating memory limit. Sequence above.
- Prefer workflow tools (`ad_wf_*`) over manually chaining diagnostics for complex decisions.
- `ad_preview_datafeed_with_latency` before starting: confirm the datafeed returns data after config changes.

Mode: Manage — Create / configure jobs

When: "set up a job", "create an ML detector", "monitor X over time", "detect rare/unusual/anomalous values".

4-step workflow

```text
PUT  _ml/anomaly_detectors/<job_id>          # 1. Define job        (ad_create_job)
PUT  _ml/datafeeds/datafeed-<job_id>         # 2. Define datafeed   (ad_create_datafeed)
POST _ml/anomaly_detectors/<job_id>/_open    # 3a. Open job         (ad_open_job)
POST _ml/datafeeds/datafeed-<job_id>/_start  # 3b. Start datafeed   (ad_manage_datafeed action=_start)
GET  _ml/anomaly_detectors/<job_id>/results/records  # 4. Read results
```

Process

Build configs. Parse the user request into job + datafeed JSON with no null fields.

Apply smart defaults:

| Field | Default | Override when |
| --- | --- | --- |
| `bucket_span` | `"15m"` | User specifies a different span |
| `time_field` | `"@timestamp"` | User names a different timestamp field |
| `index` | `"logs-*"` | User specifies an index or pattern |
| `datafeed_query` | `{"match_all": {}}` | User mentions filters, processes, or time windows |
| `influencers` | by/over/partition fields from detectors | User adds extra influencer fields |
| `job_id` | Generated from user description | User provides an explicit ID |
| `query_delay` | `"60s"` | P95 ingest latency is higher |

Choose detector function from user intent (full table in references/anomaly-detection-functions.md):
- "high CPU" / "unusually large" → `high_mean` or `high_sum`
- "rare logins" / "unusual values" → `rare` (variants below)
- "too many requests" / "spike in count" → `high_count`

`rare` variants:
- Infrequent globally → `rare by_field_name: X`
- Infrequent vs peers → `rare by_field_name: X over_field_name: Y`
- Infrequent per segment → `rare by_field_name: X partition_field_name: Y`
- Infrequent per segment vs peers → `rare by_field_name: X over_field_name: Y partition_field_name: Z`

Validate. `platform.core.get_index_mapping` on the target index to verify field existence/types → `ad_validate_job_spec`. If errors, fix and re-validate (max 3 attempts).

Present and confirm. Show the complete job + datafeed bodies formatted as the exact API calls. Ask for approval once. If feedback, incorporate and re-present (up to 3 rounds).

Deploy. After confirmation: `ad_create_job` → `ad_create_datafeed` → `ad_open_job` → `ad_manage_datafeed` (`action=_start`). Report final `job_id` and `datafeed_id`. For batch analysis on historical data, pass `start` and `end` to the datafeed start call.

Worked examples (rare-username, DNS exfil, large-downloads) with full JSON bodies and datafeed filters: references/job-creation-recipes.md.
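As a minimal sketch of the "build configs" and "smart defaults" steps, the helper below assembles request bodies for the job and datafeed `PUT` calls with the defaults from the table and no null fields. `build_job_and_datafeed` is a hypothetical name, only a single detector is handled, and the exact parameters accepted by `ad_create_job` and `ad_create_datafeed` are not assumed here.

```python
import json


def build_job_and_datafeed(job_id, function, field_name=None,
                           by_field_name=None, over_field_name=None,
                           partition_field_name=None, bucket_span="15m",
                           time_field="@timestamp", index="logs-*",
                           query=None, query_delay="60s"):
    """Return (job_body, datafeed_body) with defaults applied and no nulls."""
    optional = {
        "field_name": field_name,
        "by_field_name": by_field_name,
        "over_field_name": over_field_name,
        "partition_field_name": partition_field_name,
    }
    detector = {"function": function}
    detector.update({k: v for k, v in optional.items() if v is not None})
    # Default influencers: the by/over/partition fields of the detector.
    split_keys = ("by_field_name", "over_field_name", "partition_field_name")
    influencers = [optional[k] for k in split_keys if optional[k] is not None]
    job = {
        "analysis_config": {
            "bucket_span": bucket_span,
            "detectors": [detector],
            "influencers": influencers,
        },
        "data_description": {"time_field": time_field},
    }
    datafeed = {
        "job_id": job_id,
        "indices": [index],
        "query": query or {"match_all": {}},
        "query_delay": query_delay,
    }
    return job, datafeed


job, datafeed = build_job_and_datafeed(
    "nginx-error-rate", "high_count", by_field_name="host.keyword")
print(json.dumps(job, indent=2))
print(json.dumps(datafeed, indent=2))
```

The printed bodies are what would be shown to the user in the present-and-confirm step before anything is deployed.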

Rules

- Create job before datafeed. Datafeed references job by ID.
- Open job before starting datafeed. Start on a closed job is rejected.
- `query_delay` = P95 ingest latency + buffer (60s–120s safe default).
- Forecasts require non-population jobs: `over_field_name` jobs cannot be forecasted; warn before attempting.
- `by_field_name` vs `over_field_name`: `by` compares entity to its own history; `over` compares to peer group in the same bucket. `partition_field_name` = fully independent sub-model with its own normalization.
- `bucket_span` matches detection granularity: 15m for high-frequency, 1h for operational metrics, 1d for daily patterns. Larger smooths short spikes; smaller increases noise.
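The forecast rule can be checked against a job body before a forecast is attempted. This sketch covers only the `over_field_name` condition stated above; the helper names are hypothetical, and any other forecast restrictions are outside its scope.

```python
def population_detectors(job: dict) -> list:
    """Return the indices of detectors that set over_field_name."""
    detectors = job.get("analysis_config", {}).get("detectors", [])
    return [i for i, d in enumerate(detectors) if d.get("over_field_name")]


def can_forecast(job: dict) -> bool:
    """True when no detector is a population (over_field_name) detector."""
    return not population_detectors(job)
```

When `can_forecast` returns `False`, the indices from `population_detectors` identify which detectors to mention in the warning.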

Registration (Kibana Agent Builder)

Requires Node.js 18+. Defaults to `elastic` / `changeme` when no credentials are supplied.

```bash
cd skills/kibana/kibana-anomaly-detection

# tools → workflows → skills
node scripts/kibana-agent-builder.mjs all register --kibana-url http://localhost:5601

# HTTPS with self-signed cert
node scripts/kibana-agent-builder.mjs all register --kibana-url https://localhost:5601 --insecure
```

`all register` runs `tools register`, then `workflows register`, then `skills register`. Kibana allows at most five `tool_ids` per skill; the script fills them by scanning `SKILL.md` for tool mentions (in document order), then appends ids from `references/kibana/tools/esql/*.json` until the cap (workflow-only tools omitted by default). If you run `skills register` alone, run `tools register` first so those ids exist.

Workflow tool exclusions and prefixes live in `scripts/agent_builder_constants.json`.

MCP API key permissions:

- Kibana: `read_onechat`, `space_read`
- Index: `read`, `view_index_metadata` on `.ml-anomalies-*`, `.ml-annotations-*`, `.ml-notifications-*`, `.ml-config`
- For source evidence: `read` on source data indices

Tool inventory

ES|QL tool specs live under `references/kibana/tools/esql/*.json`; workflow definitions under `references/kibana/workflows/*.yaml`. Each Mode section above lists the tools it uses. Full surface: references/tools.md (ES|QL) and references/workflow-tools.md (workflows).

Key system indices

| Index | Relevant content |
| --- | --- |
| `.ml-anomalies-*` | `record`, `bucket`, `influencer`, `model_plot`, `model_forecast`, `model_snapshot`, `category_definition`, `model_size_stats` |
| `.ml-config` | job/datafeed documents (visible even for never-run jobs) |
| `.ml-annotations-*` | delayed data (`event == "delayed_data"`) |
| `.ml-notifications-*` | job messages (`level`: info/warning/error) |

Examples

- RCA: "Something caused a spike in our error rate at 2pm — what broke?" → Investigate → `ad_get_available_metadata` → `ad_query_anomaly_timeline` → `ad_rca_cross_job_entity_match` → `ad_rca_multi_job_entities` → RCA report.
- Score drop: "My anomaly score went from 90 to 55 — did the model change?" → Explain → `ad_rca_score_reassessment` for drift → explain renormalization if `score_drift` is large.
- Memory limit: "Job status shows `hard_limit` and results look wrong." → Troubleshoot → `ad_ts_model_memory_health` → `ad_wf_ts_field_cardinality` → `ad_estimate_memory_requirement` → `ad_update_model_memory_limit` (lifecycle: stop datafeed → close → update → open → start).
- New job: "Detect unusual error rates per host on nginx access logs." → Manage → `high_count` detector with `by_field_name: "host.keyword"` → validate → present → deploy.
- Multi-mode: "We had an incident last night, scores were high but now low — is the job healthy?" → Investigate the incident → Explain the score drift → Troubleshoot if `hard_limit` or delayed data is suspected.

Guidelines

- Pick a mode first. Don't blend RCA logic with score-explanation logic in one response.
- `ad_validate_ml_tool_permissions` first on empty results; privileges are the most common false-negative cause.
- Score bands are absolute thresholds: `>75` critical, `50–75` warning, `25–50` minor, `<25` informational.
- Multi-job entities are prime suspects. Use `min_job_count=2` in `ad_rca_multi_job_entities`.
- Show `initial_record_score` alongside `record_score`; the gap tells the renormalization story.
- Fix memory before `query_delay`. `hard_limit` invalidates downstream diagnostics.
- Stop datafeed → close job → update config → open job → start datafeed for any config change to memory or query delay.
- Confirm RCAs with `ad_rca_source_evidence`. Raw source documents are ground truth.
