Observability
Mendral + Datadog
Monitors, logs, traces, and dashboards — read by the agent the moment a monitor alerts.
Or book a 15-min demo and we'll wire Datadog into a live investigation.
When a Datadog monitor alerts, Mendral pulls the related logs, traces, and metrics into the investigation automatically. Instead of an on-call engineer pivoting between the alert, the dashboard, the APM service map, and the deploy log, the agent walks the same path — and proposes the fix while the alert is still red.
What the agent can investigate
The Datadog skill adds these surfaces to every Mendral investigation that touches Datadog.
Triage active and recently triggered monitors with full evaluation context: thresholds, query, recent history, and the services involved.
Pull logs by service, host, trace_id, or arbitrary filter; a sketch of this call follows below. The agent narrows scope as it forms hypotheses, not by guessing up front.
Fetch trace samples around the alerting window, surface the slowest spans, and identify regressed endpoints by comparing against the previous baseline.
Read time-series for any metric the agent needs to support a hypothesis. Common pattern: confirm a regression starts within minutes of a deploy.
Map services to repos so the agent knows which codebase to investigate. Without this, monitor alerts only get you halfway.
Read the incident timeline and connected resources, and surface dashboards relevant to the alerting service. Read-only — the agent never edits dashboards or incident state.
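As an illustration of the log surface above, here is a minimal sketch of the kind of read-only call the skill can issue, using Datadog's Logs Search API. The service name, query, window, and environment variable names are illustrative rather than Mendral's actual implementation.

```python
import os
from datetime import datetime, timedelta, timezone

import requests

# Minimal sketch of a read-only log query during triage (assumed service name
# and query; endpoint and headers are Datadog's Logs Search API).
headers = {
    "DD-API-KEY": os.environ["DD_API_KEY"],          # authenticates the request
    "DD-APPLICATION-KEY": os.environ["DD_APP_KEY"],  # authorizes the query
    "Content-Type": "application/json",
}

now = datetime.now(timezone.utc)
body = {
    "filter": {
        "query": "service:checkout-api status:error",  # narrowed as hypotheses form
        "from": (now - timedelta(minutes=15)).isoformat(),
        "to": now.isoformat(),
    },
    "sort": "-timestamp",
    "page": {"limit": 25},  # a sample around the alert, not an export
}

resp = requests.post(
    "https://api.datadoghq.com/api/v2/logs/events/search", json=body, headers=headers
)
resp.raise_for_status()
for event in resp.json().get("data", []):
    attrs = event["attributes"]
    print(attrs.get("timestamp"), attrs.get("message", "")[:120])
```

The same shape repeats across the other surfaces: a scoped filter, a short window around the alert, and a small page limit, so the agent samples rather than exports.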
How it works
The agent receives a Datadog skill that wraps the v1/v2 API. Webhooks from monitors trigger investigations automatically; CI investigations can also reference Datadog by linking the affected service via the service catalog. The agent samples traces — typical investigations look at 10-50 trace samples, not millions — so there's no new ingest cost.
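For illustration, here is a hedged sketch of a custom payload that could be configured in Datadog's Webhooks integration so the alert arrives carrying enough context to open an investigation. The $-prefixed values are Datadog webhook template variables; the field names and the receiving endpoint are assumptions, not Mendral's actual contract.

```python
import json

# Hedged sketch of a custom payload for Datadog's Webhooks integration.
# The $-variables are Datadog template variables filled in when the monitor
# transitions; the field names are illustrative, not Mendral's actual contract.
payload_template = {
    "alert_id": "$ALERT_ID",
    "monitor_title": "$ALERT_TITLE",
    "transition": "$ALERT_TRANSITION",  # e.g. Triggered or Recovered
    "query": "$ALERT_QUERY",
    "scope": "$ALERT_SCOPE",            # e.g. service:checkout-api
    "event_link": "$LINK",
    "tags": "$TAGS",
}

# Datadog POSTs the filled-in JSON to the configured webhook URL; that request
# is what kicks off the investigation.
print(json.dumps(payload_template, indent=2))
```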
Connect via a Datadog API key plus an application key. The API key authenticates the request; the application key authorizes the agent's scoped queries. Multi-org auth is supported.
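A minimal sketch of how the two keys travel together on every read, shown here listing monitors currently in the alert state via Datadog's Monitors API; the environment variable names are illustrative.

```python
import os

import requests

# Both keys travel with every read. DD-API-KEY authenticates the request;
# DD-APPLICATION-KEY authorizes it with the permissions of the key's owner.
headers = {
    "DD-API-KEY": os.environ["DD_API_KEY"],
    "DD-APPLICATION-KEY": os.environ["DD_APP_KEY"],
}

# List monitors currently in the alert state (Datadog Monitors API v1).
resp = requests.get(
    "https://api.datadoghq.com/api/v1/monitor",
    params={"group_states": "alert"},
    headers=headers,
)
resp.raise_for_status()
for monitor in resp.json():
    print(monitor["id"], monitor["overall_state"], monitor["name"])
```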
Example investigation
A p99 latency > 500ms monitor on the checkout-api service alerts. Datadog sends a webhook.
1. Identifies the service and its repo from the service catalog.
2. Pulls the slowest 20 trace samples from the last 10 minutes.
3. Sees that 18 of 20 spend their time in repo_orders.list_by_user_id.
4. Checks recent deploys — checkout-api shipped 35 minutes before the monitor went red.
5. Diffs the relevant code path: a new LEFT JOIN was added without an index.
6. Pulls Postgres pg_stat_statements via Datadog database monitoring — confirms a sequential scan over 4M rows.
7. Opens a PR adding the index, with trace samples and the slow-query line attached.
Index proposed before the next monitor evaluation cycle. Latency back under threshold once merged.
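Underneath, the deploy-timing check in step 4 and the closing latency confirmation are both plain time-series reads. Here is a minimal sketch of that query against Datadog's metrics query API; the metric name is an assumption and should match whatever the monitor actually evaluates.

```python
import os
import time

import requests

# Hedged sketch: read p99 latency for checkout-api over a window that spans the
# deploy and the alert (Datadog metrics query API v1). The metric name is an
# assumption; use whatever the monitor actually evaluates.
headers = {
    "DD-API-KEY": os.environ["DD_API_KEY"],
    "DD-APPLICATION-KEY": os.environ["DD_APP_KEY"],
}

now = int(time.time())
resp = requests.get(
    "https://api.datadoghq.com/api/v1/query",
    params={
        "from": now - 2 * 3600,  # two hours back: covers the deploy and the alert
        "to": now,
        "query": "p99:trace.http.request.duration{service:checkout-api}",
    },
    headers=headers,
)
resp.raise_for_status()
for series in resp.json().get("series", []):
    # Each point is [timestamp_ms, value]; the step change marks the regression.
    points = [(int(ts / 1000), round(v, 1)) for ts, v in series["pointlist"] if v is not None]
    print(series["metric"], points[-5:])
```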
Frequently asked
Why both an API key and an application key?
Datadog requires both. The API key authenticates the request; the application key authorizes the agent's queries against your data with the permissions of the user who created it.
Does the agent generate dashboards or modify monitors?
No. Reads only. The agent surfaces existing dashboards relevant to an investigation but never modifies monitors, dashboards, or incident state.
How much trace ingest does this add?
None. The agent samples existing traces — typical investigations pull 10-50 samples, not millions. Datadog's existing retention rules apply.
Can we restrict which services the agent can investigate?
Yes. You can scope the application key to specific services or to specific tag combinations.
Does it work without the service catalog?
Partially. The agent can still pull logs and traces, but won't automatically map a monitor to a repo without the service catalog. We recommend setting up service-to-repo mapping during install.
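For teams setting that up, here is a hedged sketch of what the mapping can look like, assuming Datadog's v2.1 service definition schema and the Service Definition API; the service name, team, repo URL, and exact schema fields are illustrative, so check the current Service Catalog docs before relying on them.

```python
import os

import requests

# Hedged sketch of registering a service-to-repo mapping in Datadog's service
# catalog (assumes the v2.1 service definition schema and the Service
# Definition API; service name, team, and repo URL are illustrative).
definition = {
    "schema-version": "v2.1",
    "dd-service": "checkout-api",
    "team": "payments",
    "links": [
        {
            "name": "checkout-api source",
            "type": "repo",
            "provider": "github",
            "url": "https://github.com/example-org/checkout-api",
        }
    ],
}

resp = requests.post(
    "https://api.datadoghq.com/api/v2/services/definitions",
    json=definition,
    headers={
        "DD-API-KEY": os.environ["DD_API_KEY"],
        "DD-APPLICATION-KEY": os.environ["DD_APP_KEY"],
    },
)
resp.raise_for_status()
```

If you prefer config as code, Datadog can also pick the same definition up from a service.datadog.yaml committed to the repo.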
Put Datadog context in every investigation.
Five-minute install. Datadog connects from the workspace settings. First enriched investigation runs on the next CI failure.
Or book a 15-min demo with the founders.