"The data shows we need to act." But do what, exactly? Performance data is abundant, but transforming it into decisions is a rare skill. This article teaches how to use data to guide choices, not just generate reports.
Data informs decisions. It doesn't make them.
The Gap Between Data and Decision
Data that doesn't become action
Common scenario:
- Dashboard: "p99 latency = 2s"
- Meeting: "Interesting."
- Action: None
Why:
- Data not connected to business impact
- No threshold defined
- No clear owner
- No obvious next step
Data that becomes action
Effective scenario:
- Dashboard: "p99 = 2s (SLO: 1s) - affects 5% of checkouts"
- Meeting: "We're 2x above SLO"
- Action: "DB optimization sprint"
Difference:
- Data connected to SLO
- Impact quantified
- Owner identified
- Clear action
Decision Framework
1. Connect data to objective
Raw data:
"CPU at 85%"
With context:
"CPU at 85% (target: <70% for headroom)
Risk: Next spike may cause degradation"
With decision:
"Options:
A) Scale now ($X/month)
B) Optimize code (Y sprints)
C) Accept risk (probability Z%)"
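To make the progression concrete, here is a minimal Python sketch of a metric record that only becomes decision-ready once it carries a target, a risk statement, and costed options. The class and field names are illustrative, not a prescribed schema:

```python
from dataclasses import dataclass, field

@dataclass
class DecisionReadyMetric:
    name: str                    # raw data
    value: str
    target: str                  # context: objective and headroom
    risk: str                    # context: what happens if we do nothing
    options: list[str] = field(default_factory=list)  # the actual decision

cpu = DecisionReadyMetric(
    name="CPU utilization",
    value="85%",
    target="<70% for headroom",
    risk="next traffic spike may cause degradation",
    options=["A) scale now", "B) optimize code", "C) accept the risk"],
)
```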
2. Define action thresholds
For each critical metric, define action bands (a code sketch follows the tables):
Checkout p95 Latency:
- Green: < 500ms → No action
- Yellow: 500-800ms → Investigate within 48h
- Red: > 800ms → Immediate action
- Critical: > 2s → Declare an incident
Error Rate:
- Green: < 0.5% → No action
- Yellow: 0.5-1% → Investigate
- Red: > 1% → Immediate action
- Critical: > 5% → Declare an incident
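These bands are most useful when they live in one place that alerting, dashboards, and runbooks all read. A minimal Python sketch, assuming the band values from the tables above; the metric keys and action strings are illustrative, not a real schema:

```python
# Threshold bands as data: each entry is (upper bound, severity, action).
THRESHOLDS = {
    "checkout_p95_ms": [
        (500, "green", "no action"),
        (800, "yellow", "investigate within 48h"),
        (2000, "red", "immediate action"),
        (float("inf"), "critical", "declare an incident"),
    ],
    "error_rate_pct": [
        (0.5, "green", "no action"),
        (1.0, "yellow", "investigate"),
        (5.0, "red", "immediate action"),
        (float("inf"), "critical", "declare an incident"),
    ],
}

def classify(metric: str, value: float) -> tuple[str, str]:
    """Return (severity, action) for a metric reading."""
    for upper, severity, action in THRESHOLDS[metric]:
        if value < upper:
            return severity, action
    raise ValueError("unreachable: the last band is unbounded")

print(classify("checkout_p95_ms", 650))  # ('yellow', 'investigate within 48h')
```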
3. Map standard decisions
If [condition], then [action], owner [who]
Examples (encoded as a rule table in the sketch below):
If p95 > SLO for 2 days:
→ Create investigation ticket
→ Owner: Tech Lead
→ Deadline: 5 business days
If CPU > 80% for 1 hour:
→ Alert to on-call
→ Evaluate auto-scaling
→ Owner: SRE
If error rate > 1% for 15 min:
→ Automatic incident
→ Owner: On-call
→ Communication: Slack #incidents
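Writing these rules as data keeps them reviewable and testable like any other code. A minimal sketch, assuming a flat metrics dictionary; the condition lambdas, field names, and metric keys are hypothetical:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class DecisionRule:
    condition: Callable[[dict], bool]  # when does this rule fire?
    action: str                        # what to do
    owner: str                         # who is responsible
    deadline: str                      # by when

RULES = [
    DecisionRule(lambda m: m["p95_over_slo_days"] >= 2,
                 "create investigation ticket", "Tech Lead", "5 business days"),
    DecisionRule(lambda m: m["cpu_pct"] > 80 and m["cpu_high_minutes"] >= 60,
                 "alert on-call; evaluate auto-scaling", "SRE", "immediate"),
    DecisionRule(lambda m: m["error_rate_pct"] > 1 and m["error_high_minutes"] >= 15,
                 "open incident; post to #incidents", "On-call", "immediate"),
]

def evaluate(metrics: dict) -> list[DecisionRule]:
    """Return every rule whose condition holds for the current metrics."""
    return [rule for rule in RULES if rule.condition(metrics)]
```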
Types of Decisions
Operational decisions (minutes)
Trigger: Threshold alert
Data: Real-time metrics
Decider: On-call / automation
Examples:
- Auto-scale
- Activate circuit breaker
- Redirect traffic
- Roll back a deploy
Framework:
- If X, then Y automatically
- If not resolved in Z minutes, escalate (see the sketch below)
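A minimal sketch of that loop, assuming hypothetical callables for the metric source, the automatic remediation, and paging (get_p99_ms, scale_out, and page_oncall are placeholders, not a real API):

```python
import time

THRESHOLD_MS = 800      # "X": the trigger condition
ESCALATE_AFTER_S = 600  # "Z": escalate after 10 minutes unresolved

def handle_latency_alert(get_p99_ms, scale_out, page_oncall):
    scale_out()                          # "Y": the automatic first response
    deadline = time.monotonic() + ESCALATE_AFTER_S
    while time.monotonic() < deadline:
        if get_p99_ms() < THRESHOLD_MS:
            return "resolved automatically"
        time.sleep(30)                   # re-check every 30 seconds
    page_oncall("p99 still above threshold after auto-scaling")
    return "escalated to on-call"
```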
Tactical decisions (days/weeks)
Trigger: Degradation trend / SLO gap
Data: Aggregated metrics, root cause analysis
Decider: Tech Lead / Engineering Manager
Examples:
- Prioritize optimization in sprint
- Add cache for endpoint
- Refactor problematic query
- Increase infra capacity
Framework:
- Cost-benefit analysis (sketched below)
- Backlog prioritization
- Implementation timeline
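The cost-benefit step often reduces to an annualized comparison. A sketch with entirely hypothetical numbers (infrastructure at $2K/month, a sprint costed at $8K of engineering time):

```python
# Annualized comparison of two tactical options; all figures are made up.
options = {
    "scale infrastructure": {"cost": 12 * 2_000, "benefit": 60_000},  # $/year
    "optimize the query":   {"cost": 2 * 8_000,  "benefit": 55_000},  # $/year
}

for name, o in options.items():
    net = o["benefit"] - o["cost"]
    roi = net / o["cost"]
    print(f"{name}: net ${net:,}/year, ROI {roi:.0%}")
# scale infrastructure: net $36,000/year, ROI 150%
# optimize the query: net $39,000/year, ROI 244%
```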
Strategic decisions (months)
Trigger: Capacity planning / Roadmap
Data: Long-term trends, projections
Decider: VP Eng / CTO
Examples:
- Migrate to different architecture
- Invest in observability platform
- Hire SRE team
- Change cloud provider
Framework:
- Business case with ROI
- Analysis of alternatives
- Implementation roadmap
Translating Data for Stakeholders
For engineering
Detailed technical data:
- Latency percentiles by endpoint
- Time breakdown by component
- Resource utilization
- Identified correlations
Clear decision:
"Query X accounts for 40% of latency.
Adding index fixes in 2 hours.
Prioritize?"
For product
Experience data:
- Load time by feature
- Error rate by flow
- Impact on conversion funnel
Clear decision:
"Slow checkout causing 5% abandonment.
Investing 1 sprint improves conversion by ~1%.
Value: $X/month. Prioritize?"
For executives
Impact data:
- Revenue at risk
- Cost of inaction
- Investment ROI
Clear decision:
"Current capacity won't support Black Friday.
Option A: $50K to guarantee.
Option B: Risk $200K in lost sales.
Decision needed by [date]."
Documenting Decisions
Decision template
# Decision: [Title]
## Context
- Date: [when]
- Trigger: [what motivated it]
- Data: [relevant metrics]
## Problem
[Clear description of the problem]
## Options Considered
### Option A: [Name]
- Description: [what to do]
- Cost: [time, money, effort]
- Benefit: [expected result]
- Risk: [what can go wrong]
### Option B: [Name]
[same structure]
## Decision
- Chosen: [which option]
- Reason: [why]
- Owner: [who executes]
- Deadline: [when]
## Success Metrics
- [How we'll know if it worked]
## Review
- Date: [when to reevaluate]
Decision record (ADR)
Maintain a history of:
- Decisions made
- Context at the time
- Result obtained
- Lessons learned
Benefits:
- Avoid repeating mistakes
- Document reasoning
- Facilitate onboarding
- Create knowledge base
Decision-Making Pitfalls
1. Analysis paralysis
❌ "We need more data before deciding"
(while the problem persists)
✅ "Current data supports decision X with 80% confidence.
Risk of waiting: Y. Deciding now."
2. Decision without data
❌ "Let's add cache because it always helps"
✅ "Data shows cache hit rate of 30%.
Improving to 80% would reduce latency by 40%.
Cost: 2 days. Deciding to implement."
3. Confirmation bias
❌ Seeking data that supports already-made decision
✅ Analyze the data objectively, including
data that contradicts your hypothesis
4. Ignoring uncertainty
❌ "Data shows A is better"
(Without confidence interval)
✅ "Data suggests A is better (95% CI: 5-15% improvement).
Probability B is better: 20%."
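For conversion-style metrics, quantifying that uncertainty takes only a few lines. A sketch using the normal approximation for the difference of two proportions; the sample sizes and counts are made up:

```python
import math

def diff_ci(successes_a, n_a, successes_b, n_b, z=1.96):
    """95% CI for the difference in conversion rates (A - B),
    via the normal approximation (reasonable for large samples)."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_a - p_b
    return diff - z * se, diff + z * se

# Hypothetical experiment: A converts 540/10,000; B converts 480/10,000.
low, high = diff_ci(540, 10_000, 480, 10_000)
print(f"A - B: 95% CI [{low:+.2%}, {high:+.2%}]")
# If the interval excludes 0, the data supports picking A;
# if it straddles 0, "A is better" is not yet supported.
```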
Automating Decisions
When to automate
Automate if:
- Decision is frequent
- Criteria are clear
- Risk of error is low
- Speed is important
Examples:
- Autoscaling based on CPU (sketched below)
- Circuit breaker based on error rate
- Automatic rollback by metric
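As a sketch of the first example: a proportional scaling decision, similar in spirit to the Kubernetes HPA formula (desired = ceil(current * observed / target)). The target utilization and replica bounds here are illustrative:

```python
import math

def desired_replicas(current: int, cpu_pct: float,
                     target_pct: float = 60.0,
                     min_r: int = 2, max_r: int = 20) -> int:
    """Scale replicas proportionally to observed vs. target utilization,
    clamped to safe bounds."""
    want = math.ceil(current * cpu_pct / target_pct)
    return max(min_r, min(max_r, want))

print(desired_replicas(current=4, cpu_pct=85.0))  # -> 6: scale out by 2
```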
When to keep human
Keep human decision if:
- Context is complex
- Trade-offs are unclear
- Impact is high and irreversible
- Non-technical factors matter
Examples:
- Architecture change
- Infrastructure investment
- Roadmap prioritization
Conclusion
Data-driven decisions require:
- Connecting data to objectives, not isolated numbers
- Defining thresholds: when to act
- Mapping actions: what to do when
- Translating for each audience: appropriate language
- Documenting decisions: creating knowledge
Data is the beginning, not the end. The value is in the action it informs.
Data without decision is cost. Data with decision is investment.
This article is part of the series on the OCTOPUS Performance Engineering methodology.