Agent Skill · Dynatrace

dt-alerting

End-to-end Dynatrace alerting lifecycle — anomaly detector setup and model selection (static threshold, adaptive baseline, seasonal baseline), alert event storage in Grail, problem grouping and denoising by root cause analysis, and workflow-based notification routing (email, Slack, ServiceNow, webhook). Use when configuring alerting, choosing between detector types, querying alert event history, understanding why alerts merged into a problem, or setting up problem-triggered notifications.

View SKILL.md on GitHub → Source repository Provider profile

Provider: Dynatrace Path in repo: skills/dt-alerting/SKILL.md

Skill body

dt-alerting

Configure and understand the full alerting lifecycle in Dynatrace — from anomaly detector setup through Grail event storage, problem grouping, and workflow notification delivery.

The Alerting Lifecycle

┌─────────────────────────────────────────────────────────────────────┐
│  Alert Sources — five categories, each fires a DAVIS_EVENT          │
│  ─────────────────────────────────────────────────────────────────  │
│  1. DQL-based  · Grail-scheduled server-side detector               │
│  2. Edge       · OneAgent on monitored host or process              │
│  3. Pipeline   · OpenPipeline ingest-stream filter matcher          │
│  4. Synthetic  · Worldwide synthetic checker node                   │
│  5. External   · Events API, Workflow, or OneAgent local ingest     │
└──────────────────────────────┬──────────────────────────────────────┘
                               │ DAVIS_EVENT created per trigger per entity
                               ▼
             ┌─────────────────────────────────────┐
             │  Event stored in Grail              │  Persisted and queryable via DQL.
             └──────────────────┬──────────────────┘  One event per trigger per entity.
                                │ correlated by root-cause and impact graph
                                ▼
             ┌─────────────────────────────────────┐
             │  Problem (Denoising)                │  Events sharing the same root-cause
             └──────────────────┬──────────────────┘  and impact graph → one Problem.
                                │ problem event triggers workflow
                                ▼
             ┌─────────────────────────────────────┐
             │  Workflow Notification              │  Filters problems and routes to
             └─────────────────────────────────────┘  email, Slack, ServiceNow, webhook.

When to Use This Skill

Detector setup — “How do I create an anomaly detector?”, “What kind of detector should I use?”, “What is the difference between adaptive and seasonal?”
Alert event history — “Query all alert events for this service”, “Show me which metrics triggered alerts last week”
Problem denoising — “Why did these two alerts merge into one problem?”, “How does Davis group alerts?”
Notification setup — “How do I send a Slack message when a problem opens?”, “Set up a ServiceNow ticket on critical problems”
Best practices — “How do I avoid alert storms?”, “Which sensitivity setting should I use?”
Over-alerting analysis — “Why am I getting too many alerts?”, “How do I reduce alert fatigue?”, “Which detector is firing the most?”, “How do I tune sensitivity or thresholds to avoid noise?”
Notification routing — “How do I route alerts to the right team?”, “Set up scalable problem filters in workflows”, “Send Slack notifications only to the team responsible for the affected service”

Agent Instructions

First step for any alerting setup request — Before recommending a specific detector or model, load references/anomaly-detectors.md and use its category and model decision guide to identify which detector category (DQL-based, Edge, Pipeline, Synthetic, External) and which model (Static, Adaptive, Seasonal) best fits the user’s use-case. Only proceed with configuration guidance once the right detector type has been established.

Consolidate, don’t multiply — When a user asks to alert on multiple entities of the same kind (e.g. “alert on services A, B, and C”), always recommend a single combined detector rather than one detector per entity. Use by: { <dimension> } in the DQL timeseries call to split results per entity, and a single filter: clause to scope to the relevant entities. Pair the combined detector with a single dt.alert_group tag shared across all alert conditions and the corresponding workflow notification filter. This keeps the number of detector configs small, ensures consistent routing, and makes the workflow notification channel reusable for future entities added to the same group.

Example for three services — one detector, one workflow:

timeseries avg(dt.service.request.response_time),
  by: { dt.smartscape.service },
  filter: { in(dt.smartscape.service, {toSmartscapeId("SERVICE-0000000000000001"), toSmartscapeId("SERVICE-0000000000000002"), toSmartscapeId("SERVICE-0000000000000003")}) }

Set dt.alert_group: "checkout-team" in the detector’s event properties, then filter the notification workflow on matchesPhrase(dt.alert_group, "checkout-team"). If a new service must be covered, add it to the single filter: list — no new detector or workflow rule needed.

Intent Mapping

User Request	Action	Reference
“how to alert on …”, “create an alert on …”, “create anomaly detector”, “set up alerting”, “configure alert rule”	Explain detector categories and variants, guide through model selection	anomaly-detectors.md
“what kinds of anomaly detectors”, “edge alert”, “pipeline alert”, “synthetic alert”, “OneAgent alert”	Explain the five alert source categories and their trade-offs	anomaly-detectors.md
“static vs adaptive”, “which detector model”, “seasonal detector”	Compare models, apply decision guide	anomaly-detectors.md
“query alert history”, “which alerts fired”, “Davis events in Grail”	Query `dt.davis.events` in Grail via `fetch dt.davis.events`	davis-events.md
“why did alerts merge”, “problem grouping”, “denoising”	Do NOT explain merging rules here — load `dt-obs-problems` and refer to `problem-merging.md` for the full merge logic	dt-obs-problems/references/problem-merging.md
“send Slack notification”, “email on problem”, “ServiceNow ticket”, “notify on alert”	Explain problem-triggered workflow setup	workflow-notifications.md
“alert storm”, “too many notifications”, “reduce noise”	Filtering strategy, denoising, sensitivity tuning	workflow-notifications.md + anomaly-detectors.md

Analyzing existing problems — If the user wants to query or investigate active/closed problems (root cause, impact, trending), load dt-obs-problems instead. This skill covers configuration and flow, not problem query analytics.

Detector health monitoring — If the user asks whether detectors are running or failing, load dt-platform (ANALYZER_EXECUTION_EVENT, ANOMALY_DETECTOR_STATUS_EVENT). This skill covers setup, not operational health.

Prerequisites

Access to a Dynatrace environment with Settings v2 write permissions for detector configuration
For querying alert history: DQL permissions on dt.davis.events
Load dt-dql-essentials before writing DQL queries

Knowledge Base Structure

#	Reference	Content
1	anomaly-detectors.md	Detector types, model selection, configuration, best practices
2	davis-events.md	Davis event storage in Grail, key fields, DQL query patterns
3	workflow-notifications.md	Problem-triggered workflows, filtering, notification channels

Key Concepts

Alert Source Categories

Five fundamental categories of anomaly detectors, distinguished by where detection runs and how the alert event reaches Dynatrace:

#	Category	Detection runs on	Latency	Alert logic owner
1	DQL-based	Grail (server-side, scheduled)	Minutes	Dynatrace
2	Edge	OneAgent on the monitored host/process	Seconds	Dynatrace (OneAgent)
3	Pipeline	OpenPipeline ingest path (in-stream)	Near-zero	Dynatrace (pipeline rule)
4	Synthetic	Synthetic checker node (worldwide)	Seconds	Dynatrace (synthetic node)
5	External	Customer / external tool	Caller-defined	Customer

See references/anomaly-detectors.md for the full breakdown of each category, including trade-offs and configuration entry points.

Detector Models at a Glance

Model	Threshold	Best for
Static	Fixed value you define	Known hard limits (e.g. error rate > 5%)
Adaptive baseline	Learned from recent history	Metrics with no fixed limit but clear normal behavior
Seasonal baseline	Learned with time-of-day / day-of-week awareness	Traffic, request rate, or any metric with recurring patterns

Davis Events vs. Problems

Concept	Table	Scope
Davis event	`fetch dt.davis.events`	One record per detector trigger per entity
Problem	`fetch dt.davis.problems`	One record per correlated group of events sharing root-cause and impact

A single problem typically contains multiple events. Querying problems gives the operational view; querying events gives the raw alert history.

Problem Denoising

For questions about why alerts merged into a problem or how Davis groups events, load dt-obs-problems — the merge logic and rules are documented in dt-obs-problems/references/problem-merging.md. This skill covers alert configuration and flow only.

Quick Start

Check What Alerts Fired in the Last 24 Hours

fetch dt.davis.events, from: -24h
| filter event.status == "ACTIVE"
| summarize alert_count = count(), by: {event.name, event.category, dt.smartscape_source.id}
| sort alert_count desc
| limit 20

Check Alert Volume by Category

fetch dt.davis.events, from: -24h
| summarize count = count(), by: {event.category, event.status}
| sort count desc

See All Active Problems (→ load dt-obs-problems for full query patterns)

fetch dt.davis.problems, from: -24h
| filter not(dt.davis.is_duplicate) and event.status == "ACTIVE"
| fields event.start, display_id, event.name, event.category
| sort event.start desc
| limit 20

Best Practices

Match the model to the metric’s behavior — Use static for hard SLO boundaries, adaptive for metrics without a natural fixed limit, seasonal for anything that follows business hours or weekly patterns.
Scope detectors narrowly — An entity selector that covers only relevant entities reduces noise and makes problems more actionable.
Tune sensitivity before going to production — Start with LOW sensitivity and move to MEDIUM or HIGH only after observing false-positive rates.
Let Davis denoise before notifying — Trigger workflow notifications on problems, not individual alert events. A problem groups correlated alerts so you notify once per incident, not once per metric.
Filter notifications by severity level — Route event.severity <= 2 problems to on-call channels immediately; route event.severity >= 3 problems to lower- urgency channels. Either set severity in the detector config or assign in a pipeline rule or workflow.
Use dt.alert_group event property for routing — Assign dt.alert_group to route alerts to the right team. Either set a static value in the detector config, use dynamic assignment through DQL query result mapping or assign in a pipeline rule.
Combine same-condition alerts into one detector and one workflow — When alerting on multiple entities with the same metric and threshold, merge them into a single DQL-based detector using by: { <dimension> } and a combined filter: clause. Assign the same dt.alert_group value to every condition in that detector and point the workflow notification channel at that single group. One detector + one workflow per logical alert group scales better than N detectors + N notification rules, and adding a new entity is a one-line filter change rather than a full detector/workflow addition.

dt-obs-problems — Querying, analyzing, and trending detected problems
dt-obs-predictive-analytics — Ad-hoc anomaly and novelty detection using MCP analyzer tools (not persistent alert configs)
dt-platform — Operational health of anomaly detectors (execution events, failure rates)
dt-platform-costs — Query costs generated by anomaly detector DQL
dt-sdlc-quality-gates — Site Reliability Guardian for deployment gate alerting
dt-dql-essentials — DQL syntax for writing detector queries and alert history queries

Skill frontmatter

license: Apache-2.0

dt-alerting

Skill body

dt-alerting

The Alerting Lifecycle

When to Use This Skill

Agent Instructions

Intent Mapping

Prerequisites

Knowledge Base Structure

Key Concepts

Alert Source Categories

Detector Models at a Glance

Davis Events vs. Problems

Problem Denoising

Quick Start

Check What Alerts Fired in the Last 24 Hours

Check Alert Volume by Category

See All Active Problems (→ load dt-obs-problems for full query patterns)

Best Practices

Related Skills

Skill frontmatter