Script Valley
Debugging: A Systematic Approach
Debugging in Production and Distributed Systems, Lesson 6.1

How to debug a production incident without taking down the site


Production Is Not a Development Environment

In production, you cannot pause execution at a breakpoint, poke around in heap state, or try experimental changes: anything that alters the running system affects real users. The discipline of production debugging is to use read-only tools (logs, metrics, traces, and error tracking) to reconstruct what happened without touching running code.

The Safe Investigation Protocol

When an incident starts, work through four steps in order. First, look at error rates and logs to confirm the scope: is this affecting all users or only a subset? Second, check recent deployments: did anything ship in the last hour? Third, read your error tracker for the specific exception and its context. Fourth, form a hypothesis before taking any action. Do not make changes under pressure without one.
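
The first step asks whether the fault hits everyone or only a subset. One read-only way to answer that is to group error lines by a dimension your logs already record. The sketch below assumes JSON-lines logs with a "level" field; the field you group by (for example a route, region, or client version) is an assumption about your log schema, and errors_by_field is a hypothetical helper name.

```shell
# Count error lines per value of one log field, most affected value first.
# Assumes JSON-lines logs with a "level" field; the grouping field is hypothetical.
errors_by_field() {
  local log_file=$1
  local field=$2
  # Missing values are reported as "unknown" so they still show up in the counts.
  jq -r --arg f "$field" 'select(.level == "error") | (.[$f] // "unknown") | tostring' "$log_file" |
    sort | uniq -c | sort -rn
}

# Usage (hypothetical field name): errors_by_field app.log route
```

If one value accounts for nearly all the errors, the incident is probably scoped to that path or cohort, which narrows the hypothesis considerably.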

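```shell
# Production investigation: read-only tools only.
# Assumes JSON-lines logs in app.log with "level", "time" (ISO-8601 string),
# "msg" and "correlationId" fields.

# 1. Error counts per minute, for the 15 most recent minutes that logged errors
jq -r 'select(.level == "error") | .time[0:16]' app.log | sort | uniq -c | tail -n 15

# 2. Find what changed recently (commits are not deploys: cross-check deploy history)
git log -10 --pretty=format:'%h %ar %s'

# 3. Get the five most common error messages
jq -r 'select(.level == "error") | .msg' app.log | sort | uniq -c | sort -rn | head -5

# 4. Trace one failing request by its correlation ID
CORRELATION_ID='abc-123'  # hypothetical placeholder: copy the real ID from the error report
jq -c --arg id "$CORRELATION_ID" 'select(.correlationId == $id) | {time, msg}' app.log
```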
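
Raw error counts, like those from step 1, rise whenever traffic rises. Dividing errors by total log lines per minute gives a rough error rate that separates "more traffic" from "more failures". This is a sketch under stated assumptions: JSON-lines logs with a "level" field and ISO-8601 "time" strings (so the first 16 characters identify the minute), and every log line counted as one unit of traffic, which only approximates request volume. The helper name error_rate_by_minute is hypothetical.

```shell
# Print "minute errors/total rate%" for the N most recent minutes (default 15).
# Assumes JSON-lines logs with "level" and ISO-8601 "time" fields.
error_rate_by_minute() {
  local log_file=$1
  local minutes=${2:-15}
  # jq emits "minute<TAB>level" per line; awk tallies totals and errors per minute.
  jq -r '[(.time // "" | tostring | .[0:16]), (.level // "")] | @tsv' "$log_file" |
    awk -F'\t' '{ total[$1]++; if ($2 == "error") errors[$1]++ }
      END { for (m in total) printf "%s %d/%d %.1f%%\n", m, errors[m], total[m], 100 * errors[m] / total[m] }' |
    sort | tail -n "$minutes"
}

# Usage: error_rate_by_minute app.log 15
```

A minute whose rate jumps while its total stays flat points to a real fault; a jump in the total alone suggests a traffic surge rather than a new bug.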

When to Escalate

If you cannot identify the cause within 15 minutes and users are actively affected, escalate: roll back the last deployment. A rollback is not a failure; it is the correct engineering response to an undiagnosed production fault. Never spend 45 minutes debugging while users are affected when a rollback takes two minutes.
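
A fast rollback assumes you already know what you are rolling back to. The rollback itself should go through your deploy tool, but if (hypothetically) each production deploy is recorded as a git tag named deploy-1, deploy-2, and so on, a read-only sketch like this names the rollback target and lists the commits a rollback would undo. The tag convention and the function names are assumptions for illustration.

```shell
# Hypothetical convention: every production deploy is tagged deploy-1, deploy-2, ...
# Read-only: these functions only inspect tags and history; they change nothing.

# Newest deploy tags first (version sort, so deploy-10 ranks above deploy-9).
deploy_tags() {
  git tag --list 'deploy-*' --sort=-v:refname
}

# The previous deploy: the version a rollback would return to.
rollback_target() {
  deploy_tags | sed -n '2p'
}

# Commits shipped in the current deploy that a rollback would undo.
commits_undone_by_rollback() {
  local current previous
  current=$(deploy_tags | sed -n '1p')
  previous=$(rollback_target)
  git log --oneline "$previous..$current"
}
```

Keeping this answer one command away means the decision at the 15-minute mark is about whether to roll back, not about working out how.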
