dt-obs-frontends
Real User Monitoring (RUM), Web Vitals, user sessions, mobile crashes, page performance, user interactions, and frontend errors. Query web and mobile frontend telemetry.
What it does
Frontend Observability Skill
Monitor web and mobile frontends using Real User Monitoring (RUM) with DQL queries. This skill targets the new RUM experience only; do not use classic RUM data.
Overview
This skill helps you:
- Monitor Core Web Vitals and frontend performance
- Track user sessions, engagement, and behavior
- Analyze errors and correlate with backend traces
- Optimize mobile app startup and stability
- Diagnose performance issues with detailed timing analysis
Data Sources:
- Metrics: timeseries with dt.frontend.* (trends, alerting)
- Events: fetch user.events (individual page views, requests, clicks, errors)
- Sessions: fetch user.sessions (session-level aggregates: duration, bounce, counts)
Quick Reference
Common Metrics
- dt.frontend.user_action.count - User action volume
- dt.frontend.user_action.duration - User action duration
- dt.frontend.request.count - Request volume
- dt.frontend.request.duration - Request latency (ms)
- dt.frontend.error.count - Error counts
- dt.frontend.session.active.estimated_count - Active sessions
- dt.frontend.user.active.estimated_count - Unique users
- dt.frontend.web.page.cumulative_layout_shift - CLS metric
- dt.frontend.web.navigation.dom_interactive - DOM interactive time
- dt.frontend.web.page.first_input_delay - FID metric (legacy; prefer INP)
- dt.frontend.web.page.largest_contentful_paint - LCP metric
- dt.frontend.web.page.interaction_to_next_paint - INP metric
- dt.frontend.web.navigation.load_event_end - Load event end
- dt.frontend.web.navigation.time_to_first_byte - Time to first byte
Common Filters
- frontend.name - Filter by frontend name (e.g. my-frontend)
- dt.rum.user_type - Exclude synthetic monitoring
- geo.country.iso_code - Geographic filtering
- device.type - Mobile, desktop, tablet
- browser.name - Browser filtering
Common Timeseries Dimensions
Use these for dt.frontend.* timeseries splits and breakdowns:
- frontend.name - Frontend name
- geo.country.iso_code
- device.type
- browser.name
- os.name
- user_type - real_user, synthetic, robot
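As an illustration of splitting a dt.frontend.* metric by one of these dimensions, here is a minimal sketch. It assumes that sum() is a sensible aggregation for the request count metric and that a frontend named my-frontend exists; both are placeholders to check against your environment.

```dql
// Request volume for one frontend, split by device type.
// "my-frontend" is a placeholder frontend name.
timeseries requests = sum(dt.frontend.request.count),
  by: {device.type},
  filter: {frontend.name == "my-frontend"},
  from: now() - 2h,
  interval: 5m
```

The 5m interval follows the interval guidance for time ranges measured in hours; a multi-day range would more likely use 1h.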
fetch user.events, from: now() - 2h
| filter characteristics.has_page_summary == true
| summarize page_views = count(), by: {frontend.name}
| sort page_views desc
Event Characteristics
- characteristics.has_page_summary - Page views (web)
- characteristics.has_view_summary - Views (mobile)
- characteristics.has_navigation - Navigation events
- characteristics.has_user_interaction - Clicks, forms, etc.
- characteristics.has_request - Network request events
- characteristics.has_error - Error events
- characteristics.has_crash - Mobile crashes
- characteristics.has_long_task - Long JavaScript tasks
- characteristics.has_csp_violation - CSP violations
Full event model: https://docs.dynatrace.com/docs/semantic-dictionary/model/rum/user-events
Session Data (user.sessions)
user.sessions contains session-level aggregates produced by the session aggregation service from user.events. Field names differ from user.events — sessions use underscores where events use dots.
Session identity and context:
- dt.rum.session.id - Session ID (NOT dt.rum.session_id)
- dt.rum.instance.id - Instance ID
- frontend.name - Array of frontends involved in the session
- dt.rum.application.type - web or mobile
- dt.rum.user_type - real_user, synthetic, or robot
Session aggregates (underscore naming — NOT dot):
| Field | Description | ⚠️ NOT this |
|---|---|---|
| navigation_count | Number of navigations | navigation.count |
| user_interaction_count | Clicks, form submissions | user_interaction.count |
| user_action_count | User actions | user_action.count |
| request_count | XHR/fetch requests | request.count |
| event_count | Total events in session | event.count |
| page_summary_count | Page views (web) | page_summary.count |
| view_summary_count | Views (mobile/SPA) | view_summary.count |
Error fields (dot naming — same as events):
- error.count, error.exception_count, error.http_4xx_count, error.http_5xx_count
- error.anr_count, error.csp_violation_count, error.has_crash
Session lifecycle:
- start_time, end_time, duration (nanoseconds)
- end_reason - timeout, synthetic_execution_finished, etc.
- characteristics.is_bounce - Boolean bounce flag
- characteristics.has_replay - Session replay available
User identity:
- dt.rum.user_tag - User identifier (typically email, username or customerId), set via the dtrum.identifyUser() API call in the instrumented frontend. Not always populated; only present when the frontend explicitly calls identifyUser().
- When dt.rum.user_tag is empty, dt.rum.instance.id is often the only user differentiator. The value is a random ID assigned by the RUM agent on the client side, so it is not personally identifiable but can be used to distinguish unique users when user_tag is not set. On web this is based on a persistent cookie, so it can be deleted by the user.
- The user tag is a session-level field: query it from user.sessions, not user.events (where it may be empty even if the session has one).
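The fallback from user tag to instance ID described above could be sketched as follows. This assumes an unset dt.rum.user_tag is null rather than an empty string; if your data uses empty strings, the coalesce() call would need an extra condition. The user_key alias is introduced here for illustration only.

```dql
// Approximate unique users per day: prefer the user tag, fall back to the instance ID.
fetch user.sessions, from: now() - 24h
| filter dt.rum.user_type == "real_user"
| fieldsAdd user_key = coalesce(dt.rum.user_tag, dt.rum.instance.id)
| summarize
    sessions = count(),
    tagged_sessions = countIf(isNotNull(dt.rum.user_tag)),
    unique_users = countDistinct(user_key)
```

Because the instance ID depends on a deletable cookie on web, unique_users is best read as an estimate.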
Client/device context:
- browser.name, browser.version, device.type, os.name
- geo.country.iso_code, client.ip, client.isp
Synthetic-only fields:
- dt.entity.synthetic_test, dt.entity.synthetic_location, dt.entity.synthetic_test_step
Time window behavior:
- fetch user.sessions, from: X, to: Y only returns sessions that started in [X, Y], NOT sessions that were merely active during that window.
- Sessions can last 8h+ (the aggregation service waits 30+ minutes of inactivity before closing a session).
- To find all sessions active during a time window, extend the lookback by at least 8 hours: e.g., to cover events from the last 24h, query fetch user.sessions, from: now() - 32h.
- This matters for correlation queries (e.g., matching user.events to user.sessions by session ID): a narrow user.sessions window will miss long-running sessions and produce false "orphans."
Session creation delay:
- The session aggregation service waits for ~30+ minutes of inactivity before closing a session and writing the user.sessions record.
- This means recent events (last ~1 hour) will not yet have a matching user.sessions entry. This is normal, not a data gap.
- When correlating user.events with user.sessions, exclude recent data (e.g., use to: now() - 1h) to avoid counting in-progress sessions as orphans.
Zombie sessions (events without a user.sessions record):
- Not every dt.rum.session.id in user.events will have a corresponding user.sessions record. The session aggregation service intentionally skips zombie sessions: sessions with no real user activity (zero navigations and zero user interactions).
- Zombie sessions contain only background, machine-driven activity (e.g., automatic XHR requests, heartbeats) with no page views or clicks. Serializing them would add no value to users.
- When correlating user.events with user.sessions, expect a large number of unmatched session IDs. This is by design, not a data gap. Filter to sessions with activity before diagnosing orphans:
fetch user.events, from: now() - 2h, to: now() - 1h
| filter isNotNull(dt.rum.session.id)
| summarize navs = countIf(characteristics.has_navigation == true), interactions = countIf(characteristics.has_user_interaction == true), by: {dt.rum.session.id}
| filter navs > 0 or interactions > 0
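Building on that activity filter, the orphan check itself can be sketched with a lookup that applies all three rules from this section: the events window ends one hour ago, the sessions window reaches back an extra 8 hours, and zombie sessions are dropped first. The "session." prefix and the active_sessions and unmatched aliases are choices made for this illustration.

```dql
// Sketch: count active session IDs in user.events that lack a user.sessions record.
fetch user.events, from: now() - 24h, to: now() - 1h
| filter isNotNull(dt.rum.session.id)
| summarize
    navs = countIf(characteristics.has_navigation == true),
    interactions = countIf(characteristics.has_user_interaction == true),
    by: {dt.rum.session.id}
| filter navs > 0 or interactions > 0
// Sessions are fetched with an 8h longer lookback (24h + 8h = 32h).
| lookup [
    fetch user.sessions, from: now() - 32h
    | fields dt.rum.session.id, start_time
  ], sourceField: dt.rum.session.id, lookupField: dt.rum.session.id, prefix: "session."
| summarize
    active_sessions = count(),
    unmatched = countIf(isNull(session.start_time))
```

A non-zero unmatched value after these adjustments is a better candidate for investigation than a raw comparison of session IDs.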
Example — bounce rate and session quality:
fetch user.sessions, from: now() - 24h
| filter dt.rum.user_type == "real_user"
| summarize
total_sessions = count(),
bounces = countIf(characteristics.is_bounce == true),
zero_activity = countIf(toLong(navigation_count) == 0 and toLong(user_interaction_count) == 0),
avg_duration_s = avg(toLong(duration)) / 1000000000
| fieldsAdd bounce_rate_pct = round((bounces * 100.0) / total_sessions, decimals: 1)
Performance Thresholds
- LCP: Good <2.5s | Poor >4.0s
- INP: Good <200ms | Poor >500ms
- CLS: Good <0.1 | Poor >0.25
- Cold Start: Good <3s | Poor >5s
- Long Tasks: >50ms problematic, >250ms severe
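To show how one of these thresholds might be applied, the sketch below rates each frontend by its average CLS, chosen because CLS is unitless. It is an approximation under stated assumptions: it averages interval averages rather than computing a percentile, and the "needs improvement" label for the range between the two thresholds is added here, not taken from the list above.

```dql
// Rate frontends by average CLS over the last 24h (Good < 0.1, Poor > 0.25).
timeseries cls = avg(dt.frontend.web.page.cumulative_layout_shift),
  by: {frontend.name},
  from: now() - 24h,
  interval: 1h
| fieldsAdd avg_cls = arrayAvg(cls)
| fieldsAdd rating = if(avg_cls < 0.1, "good",
    else: if(avg_cls > 0.25, "poor", else: "needs improvement"))
| sort avg_cls desc
```

The same pattern could be adapted to LCP or INP once the unit of the corresponding metric has been confirmed.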
Core Workflows
1. Web Performance Monitoring
Track Core Web Vitals, page performance, and request latency for SEO and UX optimization.
Primary Files:
- references/WebVitals.md - Core Web Vitals (LCP, INP, CLS)
- references/performance-analysis.md - Request and page performance
Common Queries:
- All Core Web Vitals summary
- Web Vitals by page/device
- Request duration SLA monitoring
- Page load performance trends
2. User Session & Behavior Analysis
Understand user engagement, navigation patterns, and session characteristics. Analyze button clicks, form interactions, and user journeys.
Data source choice:
- Use fetch user.sessions for session-level analysis (bounce rate, session duration, session counts)
- Use fetch user.events for event-level detail (individual clicks, navigation timing, specific pages)
Primary Files:
- references/user-sessions.md - Session tracking and user analytics
- references/performance-analysis.md - Navigation and engagement patterns
Common Queries:
- Active sessions by frontend
- Sessions by custom property
- Bounce rate analysis (use user.sessions with characteristics.is_bounce)
- Session quality (zero-activity sessions via navigation_count, user_interaction_count)
- Click analysis on UI elements (use user.events with characteristics.has_user_interaction)
- External referrers (traffic sources)
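A click analysis along these lines could start from a simple count of interaction events per page. This is a minimal sketch; it assumes page.url.path is populated on interaction events and uses my-frontend as a placeholder.

```dql
// Pages with the most user interactions in the last 2 hours.
fetch user.events, from: now() - 2h
| filter frontend.name == "my-frontend"
| filter characteristics.has_user_interaction == true
| summarize
    interactions = count(),
    sessions = countDistinct(dt.rum.session.id),
    by: {page.url.path}
| sort interactions desc
| limit 20
```

The event model linked earlier lists further fields that could narrow this to individual UI elements.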
3. Error Tracking & Debugging
Monitor error rates, analyze exceptions, and correlate frontend issues with backend.
Primary Files:
- references/error-tracking.md - Error analysis and debugging
- references/performance-analysis.md - Trace correlation
Common Queries:
- Error rate monitoring
- JavaScript exceptions by type
- Failed requests with backend traces
- Request timing breakdown
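As a starting point for error monitoring, the sketch below ranks pages by error events. It assumes page.url.path is available on error events; the errors and affected_sessions aliases are illustrative.

```dql
// Error events per frontend and page, with the number of affected sessions.
fetch user.events, from: now() - 2h
| filter characteristics.has_error == true
| summarize
    errors = count(),
    affected_sessions = countDistinct(dt.rum.session.id),
    by: {frontend.name, page.url.path}
| sort errors desc
| limit 20
```

Counting affected sessions alongside raw errors helps separate one noisy session from a widespread problem.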
4. Mobile Frontend Monitoring
Track mobile app performance, startup times, and crash analytics for iOS and Android. Analyze app version performance and device-specific issues.
Primary Files:
- references/mobile-monitoring.md - App starts, crashes, and mobile-specific metrics
Common Queries:
- Cold start performance by app version (iOS, Android)
- Warm start and hot start metrics
- Crash rate by device model and OS version
- ANR events (Android)
- Native crash signals
- App version comparison
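A first pass at crash analysis might group crash events by frontend and operating system. This sketch assumes os.name and device.type are present on crash events, as they are in the session client context; app version and device model fields are not named in this document, so they are left out.

```dql
// Crash events by frontend, OS, and device type over the last 24 hours.
fetch user.events, from: now() - 24h
| filter characteristics.has_crash == true
| summarize
    crashes = count(),
    affected_sessions = countDistinct(dt.rum.session.id),
    by: {frontend.name, os.name, device.type}
| sort crashes desc
| limit 20
```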
5. Advanced Performance Optimization
Deep performance diagnostics including JavaScript profiling, main thread blocking, UI jank analysis, and geographic performance.
Primary Files:
- references/performance-analysis.md - Advanced diagnostics and long tasks
Common Queries:
- Long JavaScript tasks blocking main thread
- UI jank and rendering delays
- Tasks >50ms impacting responsiveness
- Third-party long tasks (iframes)
- Single-page app performance issues
- Geographic performance distribution
- Performance degradation detection
Best Practices
- Use metrics for trends, events for debugging
  - Metrics: Timeseries dashboards, alerting, capacity planning
  - Events: Root cause analysis, detailed diagnostics
- Filter by frontend in multi-app environments
  - Always use frontend.name for clarity
- Match interval to time range
  - 5m intervals for hours, 1h for days, 1d for weeks
- Exclude synthetic traffic when analyzing real users
  - Filter dt.rum.user_type to focus on genuine behavior
- Combine metrics with events for complete insights
  - Start with metric trends, drill into events for details
- Extend user.sessions time window for correlation queries
  - user.sessions only returns sessions that started in the query window
  - Sessions can last 8h+, so extend lookback by at least 8h when joining with user.events
Slow Page Load Playbook
Start by segmenting the problem by page, browser, geo location, and dt.rum.user_type.
Heuristics:
- High TTFB -> slow backend
- High LCP with normal TTFB -> render bottleneck
- High CLS -> layout shifts (late-loading content, ads, fonts)
- Long tasks dominate -> JavaScript execution bottlenecks (heavy frameworks, large bundles)
Backend latency (high TTFB)
fetch user.events
| filter frontend.name == "my-frontend" and characteristics.has_request == true
| filter page.url.path == "/checkout"
| summarize avg_ttfb = avg(request.time_to_first_byte), avg_duration = avg(duration)
If TTFB is high, analyze backend spans by correlating frontend events with backend traces using dt.rum.trace_id.
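To collect trace IDs for that correlation, one option is to list the slowest requests that carry a dt.rum.trace_id. This is a sketch under the assumption that the trace ID is set on request events whenever a backend trace was linked; my-frontend and /checkout are placeholders carried over from the query above.

```dql
// Slowest checkout requests by TTFB that can be followed into backend traces.
fetch user.events, from: now() - 2h
| filter frontend.name == "my-frontend" and characteristics.has_request == true
| filter page.url.path == "/checkout"
| filter isNotNull(dt.rum.trace_id)
| fields timestamp, url.full, request.time_to_first_byte, duration, dt.rum.trace_id
| sort request.time_to_first_byte desc
| limit 20
```

The resulting trace IDs can then be used to look up the matching backend spans.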
Heavy JavaScript execution (long tasks)
Long tasks by page:
fetch user.events, from: now() - 2h
| filter characteristics.has_long_task == true
| summarize
long_task_count = count(),
total_blocking_time = sum(duration),
by: {frontend.name, page.url.path}
| sort total_blocking_time desc
| limit 20
Long tasks by script source:
fetch user.events, from: now() - 2h
| filter frontend.name == "my-frontend"
| filter characteristics.has_long_task == true
| summarize
long_task_count = count(),
total_blocking_time = sum(duration),
by: {long_task.attribution.container_src}
| sort total_blocking_time desc
| limit 20
Large JavaScript bundles
fetch user.events
| filter frontend.name == "my-frontend"
| filter characteristics.has_request
| filter endsWith(url.full, ".js")
| summarize dls = max(performance.decoded_body_size), by: url.full
| sort dls desc
| limit 20
Large resources
fetch user.events
| filter frontend.name == "my-frontend"
| filter characteristics.has_request
| summarize dls = max(performance.decoded_body_size), by: url.full
| sort dls desc
| limit 20
Cache effectiveness
fetch user.events, from: now() - 2h
| filter frontend.name == "my-frontend"
| filter characteristics.has_request == true
| fieldsAdd cache_status = if(
performance.incomplete_reason == "local_cache" or
(performance.transfer_size == 0 and
(performance.encoded_body_size > 0 or performance.decoded_body_size > 0)),
"cached",
else: if(performance.transfer_size > 0, "network", else: "uncached")
)
| summarize
request_count = count(),
avg_duration = avg(duration),
by: {url.domain, cache_status}
Compression waste
fetch user.events, from: now() - 2h
| filter characteristics.has_request == true
| filter isNotNull(performance.encoded_body_size) and isNotNull(performance.decoded_body_size)
| filter performance.encoded_body_size > 0
| fieldsAdd
expansion_ratio = performance.decoded_body_size / performance.encoded_body_size,
wasted_bytes = performance.decoded_body_size - performance.encoded_body_size
| summarize
requests = count(),
avg_expansion_ratio = avg(expansion_ratio),
total_wasted_bytes = sum(wasted_bytes),
by: {request.url.host, request.url.path}
| sort total_wasted_bytes desc
| limit 50
Network issues
Compare by location and domain when TTFB is high but backend performance is good:
fetch user.events, from: now() - 2h
| filter characteristics.has_request == true
| summarize
request_count = count(),
avg_duration = avg(duration),
p75_duration = percentile(duration, 75),
p95_duration = percentile(duration, 95),
by: {geo.country.iso_code, request.url.domain}
| sort p95_duration desc
| limit 50
Analyze DNS time:
fetch user.events, from: now() - 2h
| filter characteristics.has_request == true
| filter isNotNull(performance.domain_lookup_start) and isNotNull(performance.domain_lookup_end)
| fieldsAdd dns_ms = performance.domain_lookup_end - performance.domain_lookup_start
| summarize
request_count = count(),
avg_dns_ms = avg(dns_ms),
p75_dns_ms = percentile(dns_ms, 75),
p95_dns_ms = percentile(dns_ms, 95),
by: {request.url.domain}
| sort p95_dns_ms desc
| limit 50
Analyze by protocol (http/1.1, h2, h3):
fetch user.events
| filter characteristics.has_request
| summarize cnt = count(), by: {url.domain, performance.next_hop_protocol}
| sort cnt desc
| limit 50
Third-party dependencies
Analyze request performance by domain:
fetch user.events, from: now() - 2h
| filter characteristics.has_request == true
| summarize
request_count = count(),
avg_duration = avg(duration),
p75_duration = percentile(duration, 75),
p95_duration = percentile(duration, 95),
by: {request.url.domain}
| sort p95_duration desc
| limit 50
Troubleshooting
Handling Zero Results
When queries return no data, follow this diagnostic workflow:
1. Validate Timeframe
   - Check if timeframe is appropriate for the data type
   - RUM data may have delay (1-2 minutes for recent events)
   - Verify timeframe syntax: now()-1h to now() or similar
   - Try expanding timeframe: now()-24h for initial exploration
2. Verify Frontend Configuration
   - Confirm frontend is instrumented and sending RUM data
   - Check frontend.name filter is correct
   - Test without frontend filter to see if any RUM data exists
   - Verify frontend name matches the environment
3. Check Data Availability
   - Run basic query: fetch user.events | limit 1
   - If no events exist, RUM may not be configured
   - Check if timeframe predates frontend deployment
   - Verify user has access to the environment
4. Review Query Syntax
   - Validate filters aren't too restrictive
   - Check for typos in field names or metric names
   - Test query incrementally: start simple, add filters gradually
   - Verify characteristics filters match event types
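The first steps of this workflow can be covered by one broad query before any filters are added. The sketch below uses a 24h window with no frontend filter and shows which frontend names and user types have sent data at all.

```dql
// Broad availability check: which frontends and user types have any events?
fetch user.events, from: now() - 24h
| summarize events = count(), by: {frontend.name, dt.rum.user_type}
| sort events desc
```

If the expected frontend is missing from the result, the frontend.name filter or the instrumentation is the likely cause. If it appears, filters can be added back one at a time to find the one that removes the data.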
When to Ask User for Clarification:
- No RUM data exists in environment → "Is RUM configured for this frontend?"
- Timeframe unclear → "What time period should I analyze?"
- Expected data missing → "Has this frontend sent data recently?"
Handling Anomalous Results
When query results seem unexpected or suspicious:
Unexpected High Values:
- Metric spikes: Verify interval aggregation (avg vs. max vs. sum)
- Session counts: Check for bot traffic or synthetic monitoring
- Error rates: Confirm error definition matches expectations
- Performance degradation: Look for deployment or infrastructure changes
Unexpected Low Values:
- Missing sessions: Verify dt.rum.user_type filter isn't excluding real users
- Low request counts: Check if frontend filter is too narrow
- Few errors: Confirm error characteristics filter is correct
- Missing mobile data: Verify platform-specific fields exist
Inconsistent Data:
- Metrics vs. Events mismatch: Different aggregation methods are expected
- Geographic anomalies: Check timezone assumptions
- Device distribution skew: May reflect actual user base
- Version mismatches: Verify app version filtering logic
Decision Tree: Ask vs. Investigate
Query returns unexpected results
│
├─ Is this a zero-result scenario?
│ ├─ YES → Follow "Handling Zero Results" workflow
│ └─ NO → Continue
│
├─ Can I validate the result independently?
│ ├─ YES → Run validation query
│ │ ├─ Validation confirms result → Report findings
│ │ └─ Validation contradicts → Investigate further
│ └─ NO → Continue
│
├─ Is the anomaly clearly explained by data?
│ ├─ YES → Report with explanation
│ └─ NO → Continue
│
├─ Do I need domain knowledge to interpret?
│ ├─ YES → Ask user for context
│ │ Example: "The error rate is 15%. Is this expected for your frontend?"
│ └─ NO → Continue
│
└─ Is the issue ambiguous or requires clarification?
├─ YES → Ask specific question with data context
│ Example: "I see two frontends named 'web-app'. Which frontend name should I use?"
└─ NO → Investigate and report findings with caveats
Common Investigation Steps
For Performance Issues:
- Compare to baseline: Query same metric for previous week
- Segment by dimension: Break down by device, browser, geography
- Check for outliers: Use percentiles (p50, p95, p99) vs. averages
- Correlate with deployments: Filter by app version or time windows
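The segmentation and percentile steps can be combined in one query, sketched below for request duration. The choice of device.type and browser.name as segments is an example; any of the common filter fields could be used instead.

```dql
// Request duration percentiles, segmented by device type and browser.
fetch user.events, from: now() - 2h
| filter characteristics.has_request == true
| summarize
    requests = count(),
    p50 = percentile(duration, 50),
    p95 = percentile(duration, 95),
    p99 = percentile(duration, 99),
    by: {device.type, browser.name}
| sort p95 desc
| limit 50
```

For a baseline comparison, the same query can be rerun for the matching window one week earlier, for example with from: now() - 170h, to: now() - 168h.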
For Data Availability Issues:
- Start broad: Query all RUM data without filters
- Add filters incrementally: Isolate which filter eliminates data
- Check related metrics: If events missing, try timeseries
- Validate entity relationships: Confirm frontend-to-service links
For Unexpected Patterns:
- Expand timeframe: Look for historical context
- Cross-reference data sources: Compare events and metrics
- Check sampling: Verify no sampling is affecting results
- Consider external factors: Holidays, outages, traffic changes
Red Flags: When to Stop and Ask
Always ask the user when:
- ❌ No RUM data exists anywhere in the environment
- ❌ Multiple frontends match the user's description
- ❌ Results contradict user's stated expectations explicitly
- ❌ Data suggests monitoring is misconfigured
- ❌ Query requires business context (e.g., "acceptable error rate")
- ❌ Timeframe is ambiguous and affects interpretation significantly
Example clarifying questions:
- "I found two frontends named 'checkout'. Which one: checkout-web or checkout-mobile?"
- "The query returns 0 results for the past hour. Should I expand the timeframe, or do you expect real-time data?"
- "The average LCP is 8 seconds, which exceeds the 4-second threshold. Is this frontend known to have performance issues?"
- "I see only synthetic traffic. Should I filter on dt.rum.user_type == "real_user" to focus on real users?"
When to Use This Skill
Use the frontend-observability skill when:
- Monitoring web or mobile frontend performance
- Analyzing Core Web Vitals for SEO
- Tracking user sessions, engagement, or behavior
- Analyzing click events and button interactions
- Debugging frontend errors or slow requests
- Correlating frontend issues with backend traces
- Optimizing mobile app startup or crash rates (iOS, Android)
- Analyzing app version performance
- Diagnosing UI jank and main thread blocking
- Analyzing security compliance (CSP violations)
- Profiling JavaScript performance (long tasks)
Do NOT use for:
- Backend service monitoring (use services skill)
- Infrastructure metrics (use infrastructure skill)
- Log analysis (use logs skill)
- Business process monitoring (use business-events skill)
Progressive Disclosure
Always Available
- FrontendBasics.md - RUM fundamentals and quick reference
Loaded by Workflow
- Web Performance: WebVitals.md, performance-analysis.md
- User Behavior: user-sessions.md, performance-analysis.md
- Error Analysis: error-tracking.md, performance-analysis.md
- Mobile Apps: mobile-monitoring.md
Load on Explicit Request
- Advanced diagnostics (long tasks, user actions)
- Security compliance (CSP violations, visibility tracking)
- Specialized mobile features (platform-specific phases)
Reference Files
Core Reference Documents
- references/WebVitals.md - Core Web Vitals monitoring
- references/user-sessions.md - Session and user analytics
- references/error-tracking.md - Error analysis and debugging
- references/mobile-monitoring.md - Mobile app performance and crashes
- references/performance-analysis.md - Advanced performance diagnostics