Agent Skill · dynatrace

dt-obs-logs

Log querying, filtering, pattern analysis, and error rate calculation. Use when searching application or infrastructure logs, analyzing error patterns, or correlating log data. Trigger: "show error logs", "search logs for keyword", "log error rate", "recent errors", "logs from last hour", "find log entries", "top error messages", "log patterns", "parse JSON logs", "logs by process group", "log trends over time", "log entry counts per minute". Do NOT use for explaining existing queries, for product documentation questions, or for distributed tracing and span analysis (use dt-obs-tracing instead).

Provider: dynatrace
Path in repo: skills/dt-obs-logs/SKILL.md

Skill body

Log Analysis Skill

Query, filter, and analyze Dynatrace log data using DQL for troubleshooting and monitoring.

What This Skill Covers

Cross-source join required: If the query must combine logs with host attributes (OS type, hostname, IP address, cloud provider) → also read dt-dql-essentials/references/smartscape-topology-navigation.md before writing the query.


Use Cases

Use this skill when users want to:

Key Concepts

Log Data Model

Query Patterns

Common Operations

Core Workflows

1. Log Searching

Find specific log entries by time, severity, and content.

Typical steps:

  1. Define time range
  2. Filter by severity (optional)
  3. Search content for keywords
  4. Select relevant fields
  5. Sort and limit results

Example:

fetch logs, from:now() - 1h
| filter status == "ERROR"
| fields timestamp, content, process_group = dt.process_group.detected_name
| sort timestamp desc
| limit 100
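
The example above omits step 3 (keyword search). A minimal sketch that adds it, assuming the keyword of interest is "timeout" (a placeholder chosen for illustration, not taken from any real environment):

```dql
fetch logs, from:now() - 1h
| filter status == "ERROR"
// "timeout" is an illustrative keyword; substitute your own search term
| filter contains(content, "timeout")
| fields timestamp, status, content, process_group = dt.process_group.detected_name
| sort timestamp desc
| limit 100
```

Placing the severity filter before the content filter keeps the substring search on a smaller set of records.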

2. Log Filtering

Narrow down logs using multiple criteria (severity, entity, content).

Typical steps:

  1. Fetch logs with time range
  2. Apply severity filters
  3. Filter by entity (process_group)
  4. Apply content filters
  5. Format and sort output

Example:

fetch logs, from:now() - 2h
| filter in(status, {"ERROR", "FATAL", "WARN"})
| summarize log_count = count(), by: {dt.process_group.id, process_group = dt.process_group.detected_name}
| sort log_count desc
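
The example above aggregates rather than walking through every step in the list. A sketch that follows the steps literally (severity, then entity, then content) might look like the following. The process group name "payment-service" and the keyword "timeout" are placeholders, not values known to exist in any environment:

```dql
fetch logs, from:now() - 2h
// Step 2: severity filter
| filter in(status, {"ERROR", "FATAL", "WARN"})
// Step 3: entity filter ("payment-service" is a placeholder name)
| filter dt.process_group.detected_name == "payment-service"
// Step 4: content filter ("timeout" is a placeholder keyword)
| filter contains(content, "timeout")
// Step 5: format and sort
| fields timestamp, status, content, process_group = dt.process_group.detected_name
| sort timestamp desc
| limit 100
```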

3. Pattern Analysis

Identify patterns, trends, and anomalies in log data.

Typical steps:

  1. Fetch logs with time range
  2. Add pattern detection fields
  3. Aggregate by entity or time
  4. Calculate statistics and ratios
  5. Sort by frequency or rate

Example:

fetch logs, from:now() - 2h
| filter status == "ERROR"
| fieldsAdd
    has_exception = if(matchesPhrase(content, "exception"), true, else: false),
    has_timeout = if(matchesPhrase(content, "timeout"), true, else: false)
| summarize
    count(),
    exception_count = countIf(has_exception == true),
    timeout_count = countIf(has_timeout == true),
    by: {process_group = dt.process_group.detected_name}
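
Steps 4 and 5 (ratios and sorting) are not shown in the example above. One possible extension, offered as a sketch: the field names total_errors and exception_ratio are introduced here for illustration only.

```dql
fetch logs, from:now() - 2h
| filter status == "ERROR"
| fieldsAdd has_exception = matchesPhrase(content, "exception")
| summarize
    total_errors = count(),
    exception_count = countIf(has_exception == true),
    by: {process_group = dt.process_group.detected_name}
// Share of error logs that mention an exception, as a percentage
| fieldsAdd exception_ratio = (exception_count * 100.0) / total_errors
| sort total_errors desc
```

Each group produced by summarize contains at least one record, so total_errors should never be zero in the division.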

Key Functions

Filtering

Entity Operations

Aggregation

Field Operations

Common Patterns

Simple substring search:

fetch logs, from:now() - 1h
| filter contains(content, "database")
| fields timestamp, content, status

Full-text phrase search:

fetch logs, from:now() - 1h
| filter matchesPhrase(content, "connection timeout")
| fields timestamp, content, process_group = dt.process_group.detected_name

Error Rate Calculation

Calculate error rates over time:

fetch logs, from:now() - 2h
| summarize
    total_logs = count(),
    error_logs = countIf(status == "ERROR"),
    by: {time_bucket = bin(timestamp, 5m)}
| fieldsAdd error_rate = (error_logs * 100.0) / total_logs
| sort time_bucket asc
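
The same arithmetic can be grouped by entity instead of by time bucket. A sketch, assuming dt.process_group.detected_name is populated on the log records in question:

```dql
fetch logs, from:now() - 2h
| summarize
    total_logs = count(),
    error_logs = countIf(status == "ERROR"),
    by: {process_group = dt.process_group.detected_name}
// Percentage of each process group's logs that are errors
| fieldsAdd error_rate = (error_logs * 100.0) / total_logs
| sort error_rate desc
| limit 20
```

Sorting by error_rate surfaces the noisiest process groups first. A group with very few logs can show a high rate, so it may help to read error_rate together with total_logs.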

Top Error Messages

Find most common errors:

fetch logs, from:now() - 24h
| filter status == "ERROR"
| summarize error_count = count(), by: {content}
| sort error_count desc
| limit 20

Process Group-Specific Logs

Filter logs by process group:

fetch logs, from:now() - 1h
| fieldsAdd process_group = dt.process_group.detected_name
| filter process_group == "payment-service"
| filter status == "ERROR"
| fields timestamp, content, status
| sort timestamp desc

Structured / JSON Log Parsing

Many applications emit JSON-formatted log lines. Use parse to extract fields instead of dumping raw content:

fetch logs, from:now() - 1h
| filter status == "ERROR"
| parse content, "JSON:log"
| fieldsAdd level = log[level], message = log[msg], error = log[error]
| fields timestamp, level, message, error
| sort timestamp desc
| limit 50

Aggregate by a parsed field:

fetch logs, from:now() - 4h
| filter status == "ERROR"
| parse content, "JSON:log"
| fieldsAdd message = log[msg]
| summarize error_count = count(), by: {message}
| sort error_count desc
| limit 20
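
Not every log line is necessarily valid JSON. The sketch below assumes that parse leaves the log field null when content does not match the JSON pattern, and drops those records before aggregating. Verify that assumption against your own data before relying on it:

```dql
fetch logs, from:now() - 4h
| filter status == "ERROR"
| parse content, "JSON:log"
// Assumption: log is null for lines that are not valid JSON
| filter isNotNull(log)
| fieldsAdd message = log[msg]
| summarize error_count = count(), by: {message}
| sort error_count desc
| limit 20
```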

Notes:

Best Practices

  1. Always specify time ranges - Use from:now() - <duration> to limit the amount of data scanned
  2. Apply filters early - Filter by severity and entity before aggregation
  3. Use appropriate search methods - contains() for simple substring matching, matchesPhrase() for full-text phrase search
  4. Limit results - Add | limit 100 to prevent overwhelming output
  5. Sort meaningfully - Sort by timestamp for recent logs, by count for top errors
  6. Name entities - Use dt.process_group.detected_name or getNodeName() for human-readable output
  7. Use time buckets for trends - bin(timestamp, 5m) for time-series analysis

Integration Points

Limitations & Notes

Troubleshooting

| Problem | Cause | Solution |
| --- | --- | --- |
| No logs returned | Missing time range or too narrow | Widen from: window; verify log ingestion is active |
| getNodeName() returns null | OneAgent not monitoring the entity or entity not yet resolved | Verify OneAgent is deployed and entity is discovered; use dt.process_group.detected_name as a reliable alternative |
| matchesPhrase() slow on large data | Full-text search without pre-filtering | Add filter status == "ERROR" before matchesPhrase() |
| Wrong field name log.level | Common mistake | Use loglevel (no dot) for severity; see dt-dql-essentials |
| Empty content field | Log line was empty or not ingested | Check log source configuration in OneAgent |
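
Two of the fixes listed above can be combined in a single query: pre-filtering before matchesPhrase(), and using loglevel rather than log.level. This is a sketch only. It assumes loglevel holds the value "ERROR" for the records of interest, and the phrase "connection timeout" is reused from the earlier example purely for illustration:

```dql
fetch logs, from:now() - 1h
// Cheap severity pre-filter first; note the field is loglevel, not log.level
| filter loglevel == "ERROR"
// Full-text phrase search runs on the reduced set
| filter matchesPhrase(content, "connection timeout")
| fields timestamp, loglevel, content
| sort timestamp desc
| limit 100
```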

Skill frontmatter

license: Apache-2.0