Agent Skill · dynatrace

dt-obs-hosts

Host and process metrics including CPU, memory, disk, network, containers, and process-level telemetry. Use when analyzing infrastructure health, resource utilization, process consumption, or host discovery. Also use when building timeseries queries for host metrics that feed into analytical workflows like anomaly detection, forecasting, or seasonality analysis. Trigger: "show hosts", "CPU usage", "memory utilization", "disk space", "high CPU", "host with most free disk", "top hosts by CPU", "top processes by memory", "Linux hosts in AWS", "what databases are running", "infrastructure costs by cost center", "hosts running EOL Java", "container monitoring", "listening ports", "process resource consumption", "CPU forecast", "memory anomaly", "host seasonality". Do NOT use for explaining existing queries, product documentation questions, Kubernetes pod/workload queries (use dt-obs-kubernetes), AWS cloud resource inventory (use dt-obs-aws), or service-level metrics (use dt-obs-services).

Provider: dynatrace
Path in repo: skills/dt-obs-hosts/SKILL.md

Skill body

Infrastructure Hosts Skill

Monitor and manage host and process infrastructure including CPU, memory, disk, network, and technology inventory.

When to Use This Skill

Use this skill when the user needs to:


Cross-source join required: If the query must combine host data with logs or other telemetry sources (e.g. “show logs from Linux hosts with their IP addresses”), also read dt-dql-essentials/references/smartscape-topology-navigation.md before writing the query.

Core Concepts

Entities

Metrics Categories

  1. Host Metrics - dt.host.cpu.*, dt.host.memory.*, dt.host.disk.*, dt.host.net.*
  2. Process Metrics - dt.process.cpu.*, dt.process.memory.*, dt.process.io.*, dt.process.network.*
  3. Inventory - OS type, cloud provider, technology stack, versions
  4. Cost - dt.cost.costcenter, dt.cost.product
  5. Quality - Metadata completeness, version compliance

Alert Thresholds


Key Workflows

1. Host Discovery and Classification

Discover hosts, classify by OS/cloud, inventory resources.

smartscapeNodes "HOST"
| fieldsAdd os.type, cloud.provider, host.logical.cpu.cores, host.physical.memory
| summarize host_count = count(), by: {os.type, cloud.provider}
| sort host_count desc

OS Types: LINUX, WINDOWS, AIX, SOLARIS, ZOS

→ For cloud-specific attributes, see references/inventory-discovery.md

2. Resource Utilization Monitoring

Monitor CPU, memory, disk, network across hosts.

timeseries {
  cpu = avg(dt.host.cpu.usage),
  memory = avg(dt.host.memory.usage),
  disk = avg(dt.host.disk.used.percent)
}, by: {dt.smartscape.host}
| fieldsAdd host_name = getNodeName(dt.smartscape.host)
| filter arrayAvg(cpu) > 80 or arrayAvg(memory) > 80
| sort arrayAvg(cpu) desc

High utilization threshold: 80% warning, 90% critical

Key CPU Metrics:

→ For detailed CPU analysis, see references/host-metrics.md
→ For memory breakdown, see references/host-metrics.md

Disk Free Space — Find Hosts with Most/Least Free Disk

timeseries disk_used_pct = avg(dt.host.disk.used.percent), by: {dt.smartscape.host}
| fieldsAdd host_name = getNodeName(dt.smartscape.host)
| fieldsAdd avg_disk_used = arrayAvg(disk_used_pct),
    free_pct = 100 - arrayAvg(disk_used_pct)
| sort free_pct desc  // most free disk first; use asc to list hosts with the least free disk
| limit 10

3. Process Resource Analysis

Identify top resource consumers at process level.

timeseries {
  cpu = avg(dt.process.cpu.usage),
  memory = avg(dt.process.memory.usage)
}, by: {dt.smartscape.process}
| fieldsAdd process_name = getNodeName(dt.smartscape.process)
| filter arrayAvg(cpu) > 50
| sort arrayAvg(cpu) desc
| limit 20

→ For process I/O analysis, see references/process-monitoring.md
→ For process network metrics, see references/process-monitoring.md

4. Technology Stack Inventory

Discover and track software technologies and versions.

smartscapeNodes "PROCESS"
| fieldsAdd process.software_technologies
| expand tech = process.software_technologies
| fieldsAdd tech_type = tech[type], tech_version = tech[version]
| summarize process_count = count(), by: {tech_type, tech_version}
| sort process_count desc

Common Technologies: Java, Node.js, Python, .NET, databases, web servers, messaging systems

→ For version compliance checks, see references/inventory-discovery.md

5. Service Discovery via Ports

Map listening ports to services for security and inventory.

smartscapeNodes "PROCESS"
| fieldsAdd process.listen_ports, dt.process_group.detected_name
| filter isNotNull(process.listen_ports) and arraySize(process.listen_ports) > 0
| expand listen_port = process.listen_ports
| summarize process_count = count(), by: {listen_port, dt.process_group.detected_name}
| sort toLong(listen_port) asc
| limit 50

Well-known ports: 80 (HTTP), 443 (HTTPS), 22 (SSH), 3306 (MySQL), 5432 (PostgreSQL)

→ For comprehensive port mapping, see references/inventory-discovery.md

6. Container and Kubernetes Monitoring

Track container distribution and K8s workload types.

smartscapeNodes "CONTAINER"
| fieldsAdd k8s.cluster.name, k8s.namespace.name, k8s.workload.kind
| summarize container_count = count(), by: {k8s.cluster.name, k8s.workload.kind}
| sort k8s.cluster.name, container_count desc

Workload Types: deployment, daemonset, statefulset, job, cronjob

Note: Container image names and versions are NOT available in smartscape.

→ For K8s version tracking, see references/container-monitoring.md
→ For container lifecycle, see references/container-monitoring.md

7. Cost Attribution and Chargeback

Calculate infrastructure costs by cost center.

smartscapeNodes "HOST"
| fieldsAdd dt.cost.costcenter, host.logical.cpu.cores, host.physical.memory
| filter isNotNull(dt.cost.costcenter)
| fieldsAdd memory_gb = toDouble(host.physical.memory) / 1024 / 1024 / 1024
| summarize 
    host_count = count(),
    total_cores = sum(toLong(host.logical.cpu.cores)),
    total_memory_gb = sum(memory_gb),
    by: {dt.cost.costcenter}
| sort total_cores desc

→ For product-level cost tracking, see references/inventory-discovery.md

8. Infrastructure Health Correlation

Correlate host and process metrics for cross-layer analysis.

timeseries {
  host_cpu = avg(dt.host.cpu.usage),
  host_memory = avg(dt.host.memory.usage),
  process_cpu = avg(dt.process.cpu.usage)
}, by: {dt.smartscape.host, dt.smartscape.process}
| fieldsAdd
    host_name = getNodeName(dt.smartscape.host),
    process_name = getNodeName(dt.smartscape.process)
| filter arrayAvg(host_cpu) > 70
| sort arrayAvg(host_cpu) desc

Health scoring: Critical if any resource >90%, warning if >80%

→ For multi-resource saturation detection, see references/host-metrics.md
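
The health scoring rule above can also be expressed outside DQL, for example when post-processing query results. The following Python helper is a minimal sketch under stated assumptions, not part of the skill: the function name `health_status` and the "ok" label are hypothetical, and the inputs are assumed to be utilization percentages already averaged over the query window (for example with arrayAvg).

```python
def health_status(cpu: float, memory: float, disk: float) -> str:
    """Classify a host from averaged utilization percentages (0-100).

    Thresholds follow the rule stated in the text: critical above 90%,
    warning above 80%. The "ok" label is an assumption of this sketch.
    """
    worst = max(cpu, memory, disk)  # a single saturated resource decides
    if worst > 90:
        return "critical"
    if worst > 80:
        return "warning"
    return "ok"
```

Taking the maximum across resources mirrors the "any resource" wording: one saturated resource is enough to raise the status.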


Response Construction

When the user asks for data retrieval or a DQL query (e.g., “show me top hosts by CPU”), include the DQL query in the response alongside the results. Users want to see and reuse the query — it is the deliverable, not just a means to get results.

When the user asks for analysis (anomaly detection, forecasting, seasonality), the analysis results are the deliverable. Focus on presenting findings clearly:


Analytical Workflows

Host metric queries often serve as inputs to analytical tools (anomaly detection, forecasting, seasonality analysis). This skill helps construct the right DQL query; the actual analysis is performed by dedicated tools.

Anomaly Detection and Pattern Analysis

When users ask about “unusual behavior”, “anomalies”, “spikes”, or “sudden changes” in host metrics, the workflow is:

  1. Construct the timeseries query using this skill’s patterns
  2. Pass it to the appropriate analysis tool (anomaly detector, novelty detection)

Choosing between detectors:

Response format for anomaly results: Include both the host name (resolved via getNodeName(dt.smartscape.host) or get-entity-name) and the host entity ID alongside timestamps and values. Entity IDs alone are opaque to users; names alone prevent follow-up queries.

Novelty type selection rule: When using novelty detection, set analysisNoveltyType to only [SPIKE, CHANGE_IN_VALUES, TREND_IN_VALUES] by default. EXCLUDE GAP_WITH_MISSING_VALUES and CHANGE_IN_MISSING_VALUES unless the user explicitly asks about data gaps or monitoring coverage. Data gaps are infrastructure issues, not metric behavior anomalies — reporting them when the user asks about CPU or memory patterns is incorrect.
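
The selection rule above can be sketched as a small Python helper. This is an illustration only: the function name `select_novelty_types` and its boolean parameter are hypothetical, and the source does not say whether the gap-related types are added to the defaults or replace them, so this sketch assumes they are appended.

```python
# Novelty types named in the selection rule.
DEFAULT_NOVELTY_TYPES = ["SPIKE", "CHANGE_IN_VALUES", "TREND_IN_VALUES"]
GAP_NOVELTY_TYPES = ["GAP_WITH_MISSING_VALUES", "CHANGE_IN_MISSING_VALUES"]


def select_novelty_types(user_asked_about_gaps: bool = False) -> list[str]:
    """Return the value to use for analysisNoveltyType.

    Gap-related types are included only when the user explicitly asks
    about data gaps or monitoring coverage (appending is an assumption).
    """
    types = list(DEFAULT_NOVELTY_TYPES)  # copy so the default is not mutated
    if user_asked_about_gaps:
        types.extend(GAP_NOVELTY_TYPES)
    return types
```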

Queries for analysis tools should use a simple timeseries format with a single aggregated metric and an appropriate time range:

timeseries avg(dt.host.cpu.idle), by: {dt.smartscape.host}
timeseries avg(dt.host.memory.usage), by: {dt.smartscape.host}

Avoid adding filters or field transformations that reduce the data — the analysis tools work best with complete timeseries data.

Forecasting

When users ask to “predict”, “forecast”, or “estimate future” host metrics:

  1. Construct the timeseries query with sufficient historical data (e.g., 7d for short-term, 30d for longer predictions)
  2. Pass to the forecasting tool with the desired forecast horizon

The forecast horizon (how far ahead to predict) and the historical window (how much past data the model trains on) are independent. A request like “forecast the next 2 hours” sets the horizon to 2h — it says nothing about the lookback. Always use at least 7 days of historical data regardless of how short the forecast horizon is. Too few training data points cause the forecast model to fail and fall back to raw historical values.

timeseries avg(dt.host.cpu.usage), by: {dt.smartscape.host}
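
The lookback rule can be made explicit with a short Python sketch. The helper names below are hypothetical, and passing the time range through a `from:` parameter on the timeseries command is an assumption of this example; the time range may equally be supplied by whatever tool executes the query. The forecast horizon is deliberately not an input, since it does not affect the lookback.

```python
MIN_LOOKBACK_DAYS = 7  # minimum history required by the rule in the text


def forecast_lookback_days(requested_days=None):
    """Return the historical window in days, never below the minimum."""
    if requested_days is None:
        return MIN_LOOKBACK_DAYS
    return max(requested_days, MIN_LOOKBACK_DAYS)


def forecast_query(metric: str, lookback_days: int) -> str:
    """Build a single-metric timeseries query for the forecasting tool.

    Assumption: the time range is expressed with a `from:` parameter.
    """
    return (
        f"timeseries avg({metric}), "
        f"by: {{dt.smartscape.host}}, from: now()-{lookback_days}d"
    )
```

For a request such as “forecast the next 2 hours”, `forecast_lookback_days()` still yields 7, and `forecast_query("dt.host.cpu.usage", 7)` produces a query covering the last seven days.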

Seasonality Detection

When users ask about “seasonality”, “weekly patterns”, or “recurring behavior”:

  1. Use a longer time range (at least 14d for weekly, 30d+ for monthly)
  2. Pass to the seasonal baseline anomaly detector

Response format for seasonal analysis: When presenting results, include:

Scope Boundary — Service-Level vs Host-Level Metrics

This skill covers host and process infrastructure metrics only. If the user asks about service-level metrics (request rate, response time, error rate, service calls per minute, throughput), use dt-obs-services instead — even when the question involves forecasting or anomaly detection of those metrics.

Redirect these to dt-obs-services: “service calls per minute”, “request rate”, “response time by service”, “error rate by endpoint”, “service throughput forecast”.


Common Query Patterns

Pattern 1: Smartscape Discovery

Use smartscapeNodes to discover and classify entities.

smartscapeNodes "HOST"
| fieldsAdd <attributes>
| filter <conditions>
| summarize <aggregations>

Pattern 2: Timeseries Performance

Use timeseries to analyze metrics over time.

timeseries metric = avg(dt.host.<metric>), by: {dt.smartscape.host}
| fieldsAdd <calculations>
| filter <thresholds>

Pattern 3: Cross-Layer Correlation

Correlate host and process metrics.

timeseries {
  host_cpu = avg(dt.host.cpu.usage),
  process_cpu = avg(dt.process.cpu.usage)
}, by: {dt.smartscape.host, dt.smartscape.process}

Pattern 4: Entity Enrichment with Lookup

Enrich data with entity attributes. After the lookup, reference the joined fields with the lookup. prefix.

timeseries cpu = avg(dt.host.cpu.usage), by: {dt.smartscape.host}
| lookup [
    smartscapeNodes "HOST"
    | fields id, cores = host.logical.cpu.cores, memory_bytes = host.physical.memory
  ], sourceField:dt.smartscape.host, lookupField:id
| fieldsAdd cores = lookup.cores, mem_gb = toDouble(lookup.memory_bytes) / 1024 / 1024 / 1024

Tags and Metadata

Important Notes

Available Tags

→ For complete tag reference, see references/inventory-discovery.md


Cloud-Specific Attributes

AWS

Azure

Kubernetes

→ For multi-cloud analysis, see references/inventory-discovery.md


Best Practices

  1. Use percentiles (p95, p99) for latency; max() for limits; avg() for trends
  2. Set multi-level thresholds (warning 80%, critical 90%)
  3. Filter early in the pipeline; limit results with | limit N
  4. Aggregate before enrichment (lookup)
  5. Use getNodeName(dt.smartscape.host) for human-readable host names; getNodeName(dt.smartscape.process) for processes
  6. Convert bytes to GB: / 1024 / 1024 / 1024; round with round(value, decimals: 1)

Time windows: real-time 5-15 min; trends 1-7 days; capacity planning 30-90 days.
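
The byte conversion in item 6 works out as follows. This Python sketch only illustrates the arithmetic; the function name `bytes_to_gb` is hypothetical, and in a query the same calculation is done with the DQL expression shown in the list.

```python
def bytes_to_gb(value_bytes: float, decimals: int = 1) -> float:
    """Convert bytes to GB by dividing by 1024 three times, then round."""
    return round(value_bytes / 1024 / 1024 / 1024, decimals)
```

For example, a host reporting 17179869184 bytes of physical memory has 16.0 GB.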

Limitations


Troubleshooting

| Problem | Cause | Solution |
|---|---|---|
| No hosts returned from smartscapeNodes "HOST" | Missing time range or OneAgent not deployed | Verify OneAgent is installed; add a time range to the query |
| tags field always empty | Generic tags not populated in smartscape | Use specific tag namespaces: tags:azure[*], tags:environment, dt.cost.costcenter |
| Memory values in bytes are unreadable | Raw metric unit is bytes | Divide by 1024 / 1024 / 1024 and use round(value, decimals: 1) |
| dt.host.cpu.iowait returns no data | Metric is Linux-only | Check os.type; iowait is unavailable on Windows, AIX, Solaris |
| Container image names missing | Not available in smartscape | Use k8s.object parsing for image details; see dt-obs-kubernetes skill |
| process.software_technologies is empty | Process not monitored by deep injection | Verify OneAgent deep monitoring is enabled for the process group |

When to Load References

This skill uses progressive disclosure. Start here for 80% of use cases. Load reference files for detailed specifications when needed.

Load host-metrics.md when:

Load process-monitoring.md when:

Load container-monitoring.md when:

Load inventory-discovery.md when:


References


Skill frontmatter

license: Apache-2.0