Frontend Observability Skill

Monitor web and mobile frontends using Real User Monitoring (RUM) with DQL queries. This skill targets the new RUM experience only; do not use classic RUM data.

Overview

This skill helps you:

Monitor Core Web Vitals and frontend performance
Track user sessions, engagement, and behavior
Analyze errors and correlate with backend traces
Optimize mobile app startup and stability
Diagnose performance issues with detailed timing analysis

Data Sources:

Metrics: `timeseries` with `dt.frontend.*` (trends, alerting)
Events: `fetch user.events` (individual page views, requests, clicks, errors)
Sessions: `fetch user.sessions` (session-level aggregates: duration, bounce, counts)

Quick Reference

Common Metrics

`dt.frontend.user_action.count` - User action volume
`dt.frontend.user_action.duration` - User action duration
`dt.frontend.request.count` - Request volume
`dt.frontend.request.duration` - Request latency (ms)
`dt.frontend.error.count` - Error counts
`dt.frontend.session.active.estimated_count` - Active sessions
`dt.frontend.user.active.estimated_count` - Unique users
`dt.frontend.web.page.cumulative_layout_shift` - CLS metric
`dt.frontend.web.navigation.dom_interactive` - DOM interactive time
`dt.frontend.web.page.first_input_delay` - FID metric (legacy; prefer INP)
`dt.frontend.web.page.largest_contentful_paint` - LCP metric
`dt.frontend.web.page.interaction_to_next_paint` - INP metric
`dt.frontend.web.navigation.load_event_end` - Load event end
`dt.frontend.web.navigation.time_to_first_byte` - Time to first byte

Common Filters

`frontend.name` - Filter by frontend name (e.g. `my-frontend`)
`dt.rum.user_type` - Exclude synthetic monitoring
`geo.country.iso_code` - Geographic filtering
`device.type` - Mobile, desktop, tablet
`browser.name` - Browser filtering

Common Timeseries Dimensions

Use these for `dt.frontend.*` timeseries splits and breakdowns:

`frontend.name` - Frontend name
`geo.country.iso_code`
`device.type`
`browser.name`
`os.name`
`user_type` - `real_user`, `synthetic`, `robot`

dql

fetch user.events, from: now() - 2h
| filter characteristics.has_page_summary == true
| summarize page_views = count(), by: {frontend.name}
| sort page_views desc

Event Characteristics

`characteristics.has_page_summary` - Page views (web)
`characteristics.has_view_summary` - Views (mobile)
`characteristics.has_navigation` - Navigation events
`characteristics.has_user_interaction` - Clicks, forms, etc.
`characteristics.has_request` - Network request events
`characteristics.has_error` - Error events
`characteristics.has_crash` - Mobile crashes
`characteristics.has_long_task` - Long JavaScript tasks
`characteristics.has_csp_violation` - CSP violations

Full event model: https://docs.dynatrace.com/docs/semantic-dictionary/model/rum/user-events

Session Data (`user.sessions`)

`user.sessions` contains session-level aggregates produced by the session aggregation service from `user.events`. Field names differ from `user.events`: session aggregate counts use underscores where events use dots.

Session identity and context:

`dt.rum.session.id` - Session ID (NOT `dt.rum.session_id`)
`dt.rum.instance.id` - Instance ID
`frontend.name` - Array of frontends involved in the session
`dt.rum.application.type` - `web` or `mobile`
`dt.rum.user_type` - `real_user`, `synthetic`, or `robot`

Session aggregates (underscore naming — NOT dot):

| Field | Description | ⚠️ NOT this |
| --- | --- | --- |
| `navigation_count` | Number of navigations | ~~`navigation.count`~~ |
| `user_interaction_count` | Clicks, form submissions | ~~`user_interaction.count`~~ |
| `user_action_count` | User actions | ~~`user_action.count`~~ |
| `request_count` | XHR/fetch requests | ~~`request.count`~~ |
| `event_count` | Total events in session | ~~`event.count`~~ |
| `page_summary_count` | Page views (web) | ~~`page_summary.count`~~ |
| `view_summary_count` | Views (mobile/SPA) | ~~`view_summary.count`~~ |

Error fields (dot naming — same as events):

`error.count`
`error.exception_count`
`error.http_4xx_count`
`error.http_5xx_count`
`error.anr_count`
`error.csp_violation_count`
`error.has_crash`

Session lifecycle:

`start_time`, `end_time`, `duration` (nanoseconds)
`end_reason` - `timeout`, `synthetic_execution_finished`, etc.
`characteristics.is_bounce` - Boolean bounce flag
`characteristics.has_replay` - Session replay available

User identity:

`dt.rum.user_tag` - User identifier (typically email, username, or customerId), set via the `dtrum.identifyUser()` API call in the instrumented frontend. Not always populated: it is only present when the frontend explicitly calls `identifyUser()`.
When `dt.rum.user_tag` is empty, `dt.rum.instance.id` is often the only user differentiator. The value is a random ID assigned by the RUM agent on the client side, so it is not personally identifiable but can be used to distinguish unique users when `user_tag` is not set. On web this is based on a persistent cookie, so it can be deleted by the user.
The user tag is a session-level field: query it from `user.sessions`, not `user.events` (where it may be empty even if the session has one).

Client/device context:

`browser.name`, `browser.version`, `device.type`, `os.name`
`geo.country.iso_code`, `client.ip`, `client.isp`

Synthetic-only fields:

`dt.entity.synthetic_test`
`dt.entity.synthetic_location`
`dt.entity.synthetic_test_step`

Time window behavior:

`fetch user.sessions, from: X, to: Y` only returns sessions that started in `[X, Y]`, NOT sessions that were merely active during that window.
Sessions can last 8h+ (the aggregation service waits 30+ minutes of inactivity before closing a session).
To find all sessions active during a time window, extend the lookback by at least 8 hours: e.g., to cover events from the last 24h, query `fetch user.sessions, from: now() - 32h`.
This matters for correlation queries (e.g., matching `user.events` to `user.sessions` by session ID): a narrow `user.sessions` window will miss long-running sessions and produce false "orphans."
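
The lookback arithmetic described above can be sketched in Python. This is only an illustration, assuming the 8-hour maximum session length mentioned above; the helper name `sessions_fetch_clause` and its parameters are hypothetical and not part of any Dynatrace API. It builds nothing more than the `fetch` clause string.

```python
# Hypothetical helper: widen the user.sessions window so that sessions which
# started before the events window (but were still active in it) are included.
MAX_SESSION_HOURS = 8  # assumption taken from the "8h+" guidance above


def sessions_fetch_clause(events_lookback_hours: int,
                          max_session_hours: int = MAX_SESSION_HOURS) -> str:
    """Return a DQL fetch clause for user.sessions covering the events window."""
    if events_lookback_hours <= 0:
        raise ValueError("events_lookback_hours must be positive")
    total_hours = events_lookback_hours + max_session_hours
    return f"fetch user.sessions, from: now() - {total_hours}h"


print(sessions_fetch_clause(24))  # fetch user.sessions, from: now() - 32h
```

If sessions in a given environment are known to run longer than 8 hours, a larger `max_session_hours` would be the safer choice.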

Session creation delay:

The session aggregation service waits for ~30+ minutes of inactivity before closing a session and writing the `user.sessions` record.
This means recent events (last ~1 hour) will not yet have a matching `user.sessions` entry. This is normal, not a data gap.
When correlating `user.events` with `user.sessions`, exclude recent data (e.g., use `to: now() - 1h`) to avoid counting in-progress sessions as orphans.

Zombie sessions (events without a `user.sessions` record):

Not every `dt.rum.session.id` in `user.events` will have a corresponding `user.sessions` record. The session aggregation service intentionally skips zombie sessions: sessions with no real user activity (zero navigations and zero user interactions).
Zombie sessions contain only background, machine-driven activity (e.g., automatic XHR requests, heartbeats) with no page views or clicks. Serializing them would add no value to users.

When correlating `user.events` with `user.sessions`, expect a large number of unmatched session IDs. This is by design, not a data gap. Filter to sessions with activity before diagnosing orphans:

dql

fetch user.events, from: now() - 2h, to: now() - 1h
| filter isNotNull(dt.rum.session.id)
| summarize navs = countIf(characteristics.has_navigation == true),
    interactions = countIf(characteristics.has_user_interaction == true),
    by: {dt.rum.session.id}
| filter navs > 0 or interactions > 0

Example — bounce rate and session quality:

dql

fetch user.sessions, from: now() - 24h
| filter dt.rum.user_type == "real_user"
| summarize
    total_sessions = count(),
    bounces = countIf(characteristics.is_bounce == true),
    zero_activity = countIf(toLong(navigation_count) == 0 and toLong(user_interaction_count) == 0),
    avg_duration_s = avg(toLong(duration)) / 1000000000
| fieldsAdd bounce_rate_pct = round((bounces * 100.0) / total_sessions, decimals: 1)

Performance Thresholds

LCP: Good <2.5s | Poor >4.0s
INP: Good <200ms | Poor >500ms
CLS: Good <0.1 | Poor >0.25
Cold Start: Good <3s | Poor >5s
Long Tasks: >50ms problematic, >250ms severe
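
As a rough illustration of how these thresholds might be applied to query results, here is a minimal Python sketch. The function name `rate`, the metric keys, and the use of "needs improvement" for values between the two bounds are assumptions made for the example; the numeric bounds are copied from the list above.

```python
# Thresholds copied from the list above: (good_below, poor_above).
# Units: LCP and cold start in seconds, INP in milliseconds, CLS unitless.
THRESHOLDS = {
    "lcp_s": (2.5, 4.0),
    "inp_ms": (200, 500),
    "cls": (0.1, 0.25),
    "cold_start_s": (3.0, 5.0),
}


def rate(metric: str, value: float) -> str:
    """Classify a value as 'good', 'poor', or 'needs improvement' (in between)."""
    good_below, poor_above = THRESHOLDS[metric]
    if value < good_below:
        return "good"
    if value > poor_above:
        return "poor"
    return "needs improvement"


print(rate("lcp_s", 3.1))   # needs improvement
print(rate("inp_ms", 650))  # poor
print(rate("cls", 0.05))    # good
```

Values must be converted to the units shown in the comment before calling `rate`.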

Core Workflows

1. Web Performance Monitoring

Track Core Web Vitals, page performance, and request latency for SEO and UX optimization.

Primary Files:

`references/WebVitals.md` - Core Web Vitals (LCP, INP, CLS)
`references/performance-analysis.md` - Request and page performance

Common Queries:

All Core Web Vitals summary
Web Vitals by page/device
Request duration SLA monitoring
Page load performance trends

2. User Session & Behavior Analysis

Understand user engagement, navigation patterns, and session characteristics. Analyze button clicks, form interactions, and user journeys.

Data source choice:

Use `fetch user.sessions` for session-level analysis (bounce rate, session duration, session counts)
Use `fetch user.events` for event-level detail (individual clicks, navigation timing, specific pages)

Primary Files:

`references/user-sessions.md` - Session tracking and user analytics
`references/performance-analysis.md` - Navigation and engagement patterns

Common Queries:

Active sessions by frontend
Sessions by custom property
Bounce rate analysis (use `user.sessions` with `characteristics.is_bounce`)
Session quality (zero-activity sessions via `navigation_count`, `user_interaction_count`)
Click analysis on UI elements (use `user.events` with `characteristics.has_user_interaction`)

External referrers (traffic sources)

3. Error Tracking & Debugging

Monitor error rates, analyze exceptions, and correlate frontend issues with backend.

Primary Files:

`references/error-tracking.md` - Error analysis and debugging
`references/performance-analysis.md` - Trace correlation

Common Queries:

Error rate monitoring
JavaScript exceptions by type
Failed requests with backend traces
Request timing breakdown

4. Mobile Frontend Monitoring

Track mobile app performance, startup times, and crash analytics for iOS and Android. Analyze app version performance and device-specific issues.

Primary Files:

`references/mobile-monitoring.md` - App starts, crashes, and mobile-specific metrics

Common Queries:

Cold start performance by app version (iOS, Android)
Warm start and hot start metrics
Crash rate by device model and OS version
ANR events (Android)
Native crash signals
App version comparison

5. Advanced Performance Optimization

Deep performance diagnostics including JavaScript profiling, main thread blocking, UI jank analysis, and geographic performance.

Primary Files:

`references/performance-analysis.md` - Advanced diagnostics and long tasks

Common Queries:

Long JavaScript tasks blocking main thread
UI jank and rendering delays
Tasks >50ms impacting responsiveness
Third-party long tasks (iframes)
Single-page app performance issues
Geographic performance distribution
Performance degradation detection

Best Practices

Use metrics for trends, events for debugging
- Metrics: Timeseries dashboards, alerting, capacity planning
- Events: Root cause analysis, detailed diagnostics
Filter by frontend in multi-app environments
- Always use `frontend.name` for clarity
Match interval to time range
- 5m intervals for hours, 1h for days, 1d for weeks
Exclude synthetic traffic when analyzing real users
- Filter `dt.rum.user_type` to focus on genuine behavior
Combine metrics with events for complete insights
- Start with metric trends, drill into events for details
Extend `user.sessions` time window for correlation queries
- `user.sessions` only returns sessions that started in the query window
- Sessions can last 8h+, so extend lookback by at least 8h when joining with `user.events`
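
The interval guidance above ("5m intervals for hours, 1h for days, 1d for weeks") could be encoded roughly as follows. This is a sketch only: the function name `pick_interval` and the exact cut-offs (under 24 hours, under 7 days) are assumptions chosen for illustration, since the guidance does not state precise boundaries.

```python
def pick_interval(range_hours: float) -> str:
    """Suggest a timeseries interval for a query covering `range_hours` hours."""
    if range_hours <= 0:
        raise ValueError("range_hours must be positive")
    if range_hours < 24:        # a range measured in hours
        return "5m"
    if range_hours < 24 * 7:    # a range measured in days
        return "1h"
    return "1d"                 # a range measured in weeks


print(pick_interval(6))        # 5m
print(pick_interval(72))       # 1h
print(pick_interval(24 * 14))  # 1d
```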

Slow Page Load Playbook

Start by segmenting the problem by page, browser, geo location, and `dt.rum.user_type`.

Heuristics:

High TTFB -> slow backend
High LCP with normal TTFB -> render bottleneck
High CLS -> layout shifts (late-loading content, ads, fonts)
Long tasks dominate -> JavaScript execution bottlenecks (heavy frameworks, large bundles)
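
The heuristics above can be read as a simple triage function. The Python sketch below is illustrative only: the function name `likely_causes` and its boolean inputs are hypothetical, and deciding what counts as "high" for each signal is left to the caller.

```python
def likely_causes(high_ttfb: bool, high_lcp: bool, high_cls: bool,
                  long_tasks_dominate: bool) -> list[str]:
    """Map observed symptoms to the candidate bottlenecks listed above."""
    causes = []
    if high_ttfb:
        causes.append("slow backend")
    if high_lcp and not high_ttfb:
        # High LCP is attributed to rendering only when TTFB is normal.
        causes.append("render bottleneck")
    if high_cls:
        causes.append("layout shifts")
    if long_tasks_dominate:
        causes.append("JavaScript execution bottleneck")
    return causes


print(likely_causes(high_ttfb=False, high_lcp=True,
                    high_cls=False, long_tasks_dominate=True))
# ['render bottleneck', 'JavaScript execution bottleneck']
```

Several causes can apply at once, which is why the function returns a list rather than a single label.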

Backend latency (high TTFB)

dql

fetch user.events
| filter frontend.name == "my-frontend" and characteristics.has_request == true
| filter page.url.path == "/checkout"
| summarize avg_ttfb = avg(request.time_to_first_byte), avg_duration = avg(duration)

If TTFB is high, analyze backend spans by correlating frontend events with backend traces using `dt.rum.trace_id`.

Heavy JavaScript execution (long tasks)

Long tasks by page:

dql

fetch user.events, from: now() - 2h
| filter characteristics.has_long_task == true
| summarize
   long_task_count = count(),
   total_blocking_time = sum(duration),
   by: {frontend.name, page.url.path}
| sort total_blocking_time desc
| limit 20

Long tasks by script source:

dql

fetch user.events, from: now() - 2h
| filter frontend.name == "my-frontend"
| filter characteristics.has_long_task == true
| summarize
   long_task_count = count(),
   total_blocking_time = sum(duration),
   by: {long_task.attribution.container_src}
| sort total_blocking_time desc
| limit 20

Large JavaScript bundles

dql

fetch user.events
| filter frontend.name == "my-frontend"
| filter characteristics.has_request
| filter endsWith(url.full, ".js")
| summarize dls = max(performance.decoded_body_size), by: url.full
| sort dls desc
| limit 20

Large resources

dql

fetch user.events
| filter frontend.name == "my-frontend"
| filter characteristics.has_request
| summarize dls = max(performance.decoded_body_size), by: url.full
| sort dls desc
| limit 20

Cache effectiveness

dql

fetch user.events, from: now() - 2h
| filter frontend.name == "my-frontend"
| filter characteristics.has_request == true
| fieldsAdd cache_status = if(
   performance.incomplete_reason == "local_cache" or performance.transfer_size == 0 and
   (performance.encoded_body_size > 0 or performance.decoded_body_size > 0),
   "cached",
   else: if(performance.transfer_size > 0, "network", else: "uncached")
  )
| summarize
   request_count = count(),
   avg_duration = avg(duration),
   by: {url.domain, cache_status}

Compression waste

dql

fetch user.events, from: now() - 2h
| filter characteristics.has_request == true
| filter isNotNull(performance.encoded_body_size) and isNotNull(performance.decoded_body_size)
| filter performance.encoded_body_size > 0
| fieldsAdd
   expansion_ratio = performance.decoded_body_size / performance.encoded_body_size,
   wasted_bytes = performance.decoded_body_size - performance.encoded_body_size
| summarize
   requests = count(),
   avg_expansion_ratio = avg(expansion_ratio),
   total_wasted_bytes = sum(wasted_bytes),
   by: {request.url.host, request.url.path}
| sort total_wasted_bytes desc
| limit 50
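
To make the two derived fields concrete, here is the same arithmetic for a single request as a small Python sketch. The function name `compression_stats` and the sample sizes are made up for illustration; only the two formulas come from the query above.

```python
def compression_stats(encoded_body_size: int, decoded_body_size: int):
    """Return (expansion_ratio, wasted_bytes) as computed in the query above."""
    if encoded_body_size <= 0:
        # Mirrors the query's `performance.encoded_body_size > 0` filter.
        raise ValueError("encoded_body_size must be positive")
    expansion_ratio = decoded_body_size / encoded_body_size
    wasted_bytes = decoded_body_size - encoded_body_size
    return expansion_ratio, wasted_bytes


# Hypothetical request: 100,000 bytes on the wire, 300,000 bytes decoded.
print(compression_stats(100_000, 300_000))  # (3.0, 200000)
```

An expansion ratio close to 1.0 suggests the response was barely compressed, or not compressed at all.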

Network issues

Compare by location and domain when TTFB is high but backend performance is good:

dql

fetch user.events, from: now() - 2h
| filter characteristics.has_request == true
| summarize
   request_count = count(),
   avg_duration = avg(duration),
   p75_duration = percentile(duration, 75),
   p95_duration = percentile(duration, 95),
   by: {geo.country.iso_code, request.url.domain}
| sort p95_duration desc
| limit 50

Analyze DNS time:

dql

fetch user.events, from: now() - 2h
| filter characteristics.has_request == true
| filter isNotNull(performance.domain_lookup_start) and isNotNull(performance.domain_lookup_end)
| fieldsAdd dns_ms = performance.domain_lookup_end - performance.domain_lookup_start
| summarize
   request_count = count(),
   avg_dns_ms = avg(dns_ms),
   p75_dns_ms = percentile(dns_ms, 75),
   p95_dns_ms = percentile(dns_ms, 95),
   by: {request.url.domain}
| sort p95_dns_ms desc
| limit 50

Analyze by protocol (http/1.1, h2, h3):

dql

fetch user.events
| filter characteristics.has_request
| summarize cnt = count(), by: {url.domain, performance.next_hop_protocol}
| sort cnt desc
| limit 50

Third-party dependencies

Analyze request performance by domain:

dql

fetch user.events, from: now() - 2h
| filter characteristics.has_request == true
| summarize
   request_count = count(),
   avg_duration = avg(duration),
   p75_duration = percentile(duration, 75),
   p95_duration = percentile(duration, 95),
   by: {request.url.domain}
| sort p95_duration desc
| limit 50

Troubleshooting

Handling Zero Results

When queries return no data, follow this diagnostic workflow:

Validate Timeframe
- Check if timeframe is appropriate for the data type
- RUM data may have delay (1-2 minutes for recent events)
- Verify timeframe syntax: `from: now() - 1h, to: now()` or similar
- Try expanding timeframe: `from: now() - 24h` for initial exploration
Verify Frontend Configuration
- Confirm frontend is instrumented and sending RUM data
- Check `frontend.name` filter is correct
- Test without frontend filter to see if any RUM data exists
- Verify frontend name matches the environment
Check Data Availability
- Run basic query: `fetch user.events | limit 1`
- If no events exist, RUM may not be configured
- Check if timeframe predates frontend deployment
- Verify user has access to the environment
Review Query Syntax
- Validate filters aren't too restrictive
- Check for typos in field names or metric names
- Test query incrementally: start simple, add filters gradually
- Verify characteristics filters match event types

When to Ask User for Clarification:

No RUM data exists in environment → "Is RUM configured for this frontend?"
Timeframe unclear → "What time period should I analyze?"
Expected data missing → "Has this frontend sent data recently?"

Handling Anomalous Results

When query results seem unexpected or suspicious:

Unexpected High Values:

Metric spikes: Verify interval aggregation (avg vs. max vs. sum)
Session counts: Check for bot traffic or synthetic monitoring
Error rates: Confirm error definition matches expectations
Performance degradation: Look for deployment or infrastructure changes

Unexpected Low Values:

Missing sessions: Verify `dt.rum.user_type` filter isn't excluding real users
Low request counts: Check if frontend filter is too narrow
Few errors: Confirm error characteristics filter is correct
Missing mobile data: Verify platform-specific fields exist

Inconsistent Data:

Metrics vs. Events mismatch: Different aggregation methods are expected
Geographic anomalies: Check timezone assumptions
Device distribution skew: May reflect actual user base
Version mismatches: Verify app version filtering logic

Decision Tree: Ask vs. Investigate

Query returns unexpected results
│
├─ Is this a zero-result scenario?
│  ├─ YES → Follow "Handling Zero Results" workflow
│  └─ NO → Continue
│
├─ Can I validate the result independently?
│  ├─ YES → Run validation query
│  │        ├─ Validation confirms result → Report findings
│  │        └─ Validation contradicts → Investigate further
│  └─ NO → Continue
│
├─ Is the anomaly clearly explained by data?
│  ├─ YES → Report with explanation
│  └─ NO → Continue
│
├─ Do I need domain knowledge to interpret?
│  ├─ YES → Ask user for context
│  │        Example: "The error rate is 15%. Is this expected for your frontend?"
│  └─ NO → Continue
│
└─ Is the issue ambiguous or requires clarification?
   ├─ YES → Ask specific question with data context
   │        Example: "I see two frontends named 'web-app'. Which frontend name should I use?"
   └─ NO → Investigate and report findings with caveats
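
The tree above can also be written as a chain of checks evaluated top to bottom. The following Python sketch is one possible reading; the function name `next_step`, its boolean parameters, and the returned strings are hypothetical labels for the branches, not part of the skill itself.

```python
def next_step(zero_results: bool = False,
              can_validate: bool = False,
              validation_confirms: bool = False,
              explained_by_data: bool = False,
              needs_domain_knowledge: bool = False,
              ambiguous: bool = False) -> str:
    """Walk the decision tree in order and return the first matching action."""
    if zero_results:
        return "follow the 'Handling Zero Results' workflow"
    if can_validate:
        # A validation query was run; its outcome decides the branch.
        return "report findings" if validation_confirms else "investigate further"
    if explained_by_data:
        return "report with explanation"
    if needs_domain_knowledge:
        return "ask user for context"
    if ambiguous:
        return "ask specific question with data context"
    return "investigate and report findings with caveats"


print(next_step(needs_domain_knowledge=True))  # ask user for context
```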

Common Investigation Steps

For Performance Issues:

Compare to baseline: Query same metric for previous week
Segment by dimension: Break down by device, browser, geography
Check for outliers: Use percentiles (p50, p95, p99) vs. averages
Correlate with deployments: Filter by app version or time windows

For Data Availability Issues:

Start broad: Query all RUM data without filters
Add filters incrementally: Isolate which filter eliminates data
Check related metrics: If events missing, try timeseries
Validate entity relationships: Confirm frontend-to-service links

For Unexpected Patterns:

Expand timeframe: Look for historical context
Cross-reference data sources: Compare events and metrics
Check sampling: Verify no sampling is affecting results
Consider external factors: Holidays, outages, traffic changes

Red Flags: When to Stop and Ask

Always ask the user when:

❌ No RUM data exists anywhere in the environment
❌ Multiple frontends match the user's description
❌ Results contradict user's stated expectations explicitly
❌ Data suggests monitoring is misconfigured
❌ Query requires business context (e.g., "acceptable error rate")
❌ Timeframe is ambiguous and affects interpretation significantly

Example clarifying questions:

"I found two frontends named 'checkout'. Which one: `checkout-web` or `checkout-mobile`?"
"The query returns 0 results for the past hour. Should I expand the timeframe, or do you expect real-time data?"
"The average LCP is 8 seconds, which exceeds the 4-second threshold. Is this frontend known to have performance issues?"
"I see only synthetic traffic. Should I add the filter `dt.rum.user_type == "real_user"` to focus on real users?"

When to Use This Skill

Use frontend-observability skill when:

Monitoring web or mobile frontend performance
Analyzing Core Web Vitals for SEO
Tracking user sessions, engagement, or behavior
Analyzing click events and button interactions
Debugging frontend errors or slow requests
Correlating frontend issues with backend traces
Optimizing mobile app startup or crash rates (iOS, Android)
Analyzing app version performance
Diagnosing UI jank and main thread blocking
Analyzing security compliance (CSP violations)
Profiling JavaScript performance (long tasks)

Do NOT use for:

Backend service monitoring (use services skill)
Infrastructure metrics (use infrastructure skill)
Log analysis (use logs skill)
Business process monitoring (use business-events skill)

Progressive Disclosure

Always Available

FrontendBasics.md - RUM fundamentals and quick reference

Loaded by Workflow

Web Performance: WebVitals.md, performance-analysis.md
User Behavior: user-sessions.md, performance-analysis.md
Error Analysis: error-tracking.md, performance-analysis.md
Mobile Apps: mobile-monitoring.md

Load on Explicit Request

Advanced diagnostics (long tasks, user actions)
Security compliance (CSP violations, visibility tracking)
Specialized mobile features (platform-specific phases)

Reference Files

Core Reference Documents

`references/WebVitals.md` - Core Web Vitals monitoring
`references/user-sessions.md` - Session and user analytics
`references/error-tracking.md` - Error analysis and debugging
`references/mobile-monitoring.md` - Mobile app performance and crashes
`references/performance-analysis.md` - Advanced performance diagnostics
