"The graph shows we improved by 300%." Look again. The Y-axis starts at 90, not zero. The real improvement was 3%. Graphs are powerful communication tools, but they can mislead — intentionally or not. This article teaches you to identify and avoid misleading visualizations.
A graph can lie without containing a single false number.
Distortion Techniques
1. Truncating the Y-axis
Misleading Graph:           Honest Graph:

Latency                     Latency
102 ┤    ╭─╮                200 ┤
101 ┤   ╱   ╲               150 ┤
100 ┤──╯     ╲              100 ┤──────────
 99 ┤         ╲              50 ┤
 98 ┼──────────╯              0 ┼──────────
    Jan  Feb  Mar               Jan  Feb  Mar

"Latency dropped 4%!"       "Latency stable"
Why it misleads:
- Exaggerates small variations
- Makes trivial change look significant
When it's acceptable:
- When variation is genuinely important
- With clear warning that axis is truncated
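In matplotlib, the honest version costs one line. A minimal sketch with made-up data (the series, labels, and figure size are illustrative, not from a real system):

```python
import matplotlib.pyplot as plt

# Illustrative data: monthly latency hovering around 100
months = ["Jan", "Feb", "Mar"]
latency = [100, 102, 98]

fig, (ax_bad, ax_good) = plt.subplots(1, 2, figsize=(8, 3))

# Misleading: matplotlib auto-fits the axis to the data,
# so a ~4% wiggle fills the entire plot area
ax_bad.plot(months, latency)
ax_bad.set_title("Auto-scaled (misleading)")

# Honest: anchor the axis at zero so variation appears in proportion
ax_good.plot(months, latency)
ax_good.set_ylim(bottom=0)
ax_good.set_title("Axis from zero (honest)")

for ax in (ax_bad, ax_good):
    ax.set_ylabel("Latency")
plt.tight_layout()
plt.show()
```

If you genuinely need a truncated axis, set the limits explicitly and say so in the title or a caption, rather than letting auto-scaling decide silently.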
2. Inconsistent scale
Before:                     After:

p95 (ms)                    p95 (ms)
1000 ┤                      500 ┤
 800 ┤    ╭───              400 ┤
 600 ┤   ╱                  300 ┤    ╭───
 400 ┤──╯                   200 ┤   ╱
 200 ┤                      100 ┤──╯
   0 ┼─────                   0 ┼─────

"Before was 800ms!"         "Now it's 300ms!"
Reality: latency really did fall from 800ms to 300ms.
But the two graphs use different scales, so the "after" line sits at the same visual height as the "before" line and the improvement can't be read from the pictures. Plotting both panels on a shared scale fixes this, as the sketch below shows.
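With matplotlib's `sharey` option, the two panels are forced onto one scale. A minimal sketch with illustrative numbers:

```python
import matplotlib.pyplot as plt

# Illustrative before/after p95 series (ms)
before_ms = [400, 500, 800, 800]
after_ms = [100, 150, 300, 300]

# sharey=True puts both panels on one scale:
# equal heights now mean equal values
fig, (ax_before, ax_after) = plt.subplots(1, 2, sharey=True, figsize=(8, 3))
ax_before.plot(before_ms)
ax_before.set_title("Before")
ax_after.plot(after_ms)
ax_after.set_title("After")
ax_before.set_ylabel("p95 (ms)")
ax_before.set_ylim(bottom=0)
plt.tight_layout()
plt.show()
```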
3. Selective time period
Last 3 months:              Last year:

Error %                     Error %
5 ┤                         5 ┤    ╭─╮
4 ┤                         4 ┤   ╱   ╲
3 ┤    ╭──                  3 ┤  ╱     ╲
2 ┤   ╱                     2 ┤ ╱       ╲
1 ┤──╯                      1 ┤╯         ╲──
0 ┼─────────                0 ┼─────────────
  Jan Feb Mar                 Jan   Jul   Jan

"Errors tripled!"           "Back to normal after spike"
4. Aggregation that hides
Daily average:              Hourly:

Latency (ms)                Latency (ms)
200 ┤                       2000 ┤        ╭╮
150 ┤─────────              1500 ┤        ││
100 ┤                       1000 ┤        ││
 50 ┤                        500 ┤────────╯╰────
  0 ┼─────────                 0 ┼──────────────
    Mon Tue Wed                  0h  6h  12h 18h

"Latency stable at 150ms"   "2s spike at 2pm"
5. Wrong metric choice
Throughput:                 Latency:

(req/s)                     (ms)
5000 ┤      ╭───            5000 ┤        ╱
4000 ┤     ╱                4000 ┤       ╱
3000 ┤    ╱                 3000 ┤      ╱
2000 ┤   ╱                  2000 ┤     ╱
1000 ┤──╯                   1000 ┤────╯
   0 ┼─────────                0 ┼─────────
     Load →                      Load →

"System scales well!"       "System saturates at 3000 req/s"
Honest Graphs
Basic rules
1. Y-axis starts at zero:
- Except when justified AND signaled
2. Consistent scales:
- Same scale when comparing periods
3. Adequate time context:
- Period that shows complete pattern
- No cherry-picking start/end
4. Appropriate aggregation:
- Don't hide variance
- Show distribution when relevant
5. Relevant metrics:
- Show what matters for the question
- Include correlated metrics
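If your team plots with matplotlib, the first of these rules can be baked into a small helper applied to every chart. A minimal sketch; the helper name and defaults are ours, not a standard API:

```python
import matplotlib.pyplot as plt

def apply_honest_defaults(ax, ylabel, title):
    """Hypothetical helper: baseline settings for an honest chart."""
    ax.set_ylim(bottom=0)      # rule 1: axis starts at zero
    ax.set_ylabel(ylabel)      # units visible on the axis
    ax.set_title(title)
    ax.grid(True, alpha=0.3)

fig, ax = plt.subplots()
ax.plot(["Jan", "Feb", "Mar"], [100, 102, 98])
apply_honest_defaults(ax, "Latency (ms)", "Checkout latency")
plt.show()
```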
Well-made latency graph
Essential elements:
┌──────────────────────────────────────────┐
│ Checkout Latency - Last 24h              │
│                                          │
│ ms                                       │
│ 500 ┤                          p99       │
│ 400 ┤        ╭╮         ╭╮               │
│ 300 ┤       ╱  ╲       ╱  ╲        p95   │
│ 200 ┤──────╯    ╲─────╯    ╲─────────    │
│ 100 ┤  p50 ──────────────────────────    │
│   0 ┼────────────────────────────────    │
│      0h     6h     12h    18h    24h     │
│                                          │
│ ⚠ Deploy at 2pm | Normal peak: 11am-1pm  │
└──────────────────────────────────────────┘
Includes:
- Multiple percentiles
- Complete period
- Axis starting at zero
- Event annotations
- Context (peak hours)
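A sketch of how such a graph could be produced with numpy and matplotlib. The data is synthetic (log-normal latencies with parameters we chose), and the deploy time comes from the example above:

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data: 100 latency samples per minute over 24h (ms)
rng = np.random.default_rng(7)
samples = rng.lognormal(mean=4.4, sigma=0.5, size=(24 * 60, 100))

# Per-minute percentiles
p50 = np.percentile(samples, 50, axis=1)
p95 = np.percentile(samples, 95, axis=1)
p99 = np.percentile(samples, 99, axis=1)

fig, ax = plt.subplots()
hours = np.arange(24 * 60) / 60
ax.plot(hours, p50, label="p50")
ax.plot(hours, p95, label="p95")
ax.plot(hours, p99, label="p99")
ax.axvline(14, linestyle="--", color="gray", label="Deploy at 2pm")  # event annotation
ax.set_ylim(bottom=0)                 # axis from zero
ax.set_xlabel("Hour of day")
ax.set_ylabel("Latency (ms)")
ax.set_title("Checkout Latency - Last 24h")
ax.legend()
plt.show()
```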
Dashboard that doesn't mislead
Recommended layout:
┌────────────────────────────────────────────┐
│ OVERVIEW - Order System                    │
│                                            │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐     │
│ │   p95    │ │  Errors  │ │Throughput│     │
│ │  180ms   │ │   0.3%   │ │  1.2K/s  │     │
│ │   ↓12%   │ │   ↓50%   │ │   ↑15%   │     │
│ └──────────┘ └──────────┘ └──────────┘     │
│                                            │
│ Latency by Percentile (7 days)             │
│ [graph with p50, p95, p99]                 │
│                                            │
│ Latency Distribution (histogram)           │
│ [shows distribution shape]                 │
│                                            │
│ Correlation: Latency vs Throughput         │
│ [scatter plot with trend line]             │
└────────────────────────────────────────────┘
Includes:
- Numbers and trend (Δ vs previous period)
- Multiple perspectives (time, distribution, correlation)
- Consistent comparison
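The Δ in each card is just a comparison against the previous period. A trivial sketch (the metric values are illustrative):

```python
def delta_pct(current: float, previous: float) -> float:
    """Percentage change vs the previous period."""
    return (current - previous) / previous * 100

# Illustrative values for the p95 card
p95_now, p95_prev = 180.0, 205.0
print(f"p95 {p95_now:.0f}ms ({delta_pct(p95_now, p95_prev):+.0f}%)")
# -> p95 180ms (-12%)
```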
Common Mistakes and Fixes
Bar comparison
❌ Wrong:
Bars with different colors,
no legend, truncated axis
✅ Correct:
Same scale, meaningful colors,
clear legend, axis from zero
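A sketch of the correct version in matplotlib (service names and numbers are invented):

```python
import matplotlib.pyplot as plt

# Invented example: p95 latency per service, before and after a change
services = ["API", "Worker", "DB"]
before_ms = [120, 95, 200]
after_ms = [80, 90, 150]

fig, ax = plt.subplots()
width = 0.35
x = range(len(services))
ax.bar([i - width / 2 for i in x], before_ms, width, label="Before")
ax.bar([i + width / 2 for i in x], after_ms, width, label="After")
ax.set_xticks(list(x))
ax.set_xticklabels(services)
ax.set_ylabel("p95 latency (ms)")   # units on the axis
ax.set_ylim(bottom=0)               # axis from zero
ax.legend()                         # clear legend
plt.show()
```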
Pie chart
❌ Avoid for:
- Many slices (>5)
- Precise comparisons
- Similar values
✅ Use for:
- Proportions of a whole
- Few segments
- When % is more important than absolute value
Trend lines
❌ Wrong:
Straight line on non-linear data
✅ Correct:
- Choose appropriate model
- Show confidence interval
- Don't extrapolate beyond data
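Before drawing a trend line, check whether a straight line is even a reasonable model. A sketch using scipy (the data points are invented and deliberately non-linear):

```python
import numpy as np
from scipy import stats

# Invented, deliberately non-linear data: p95 latency vs load
load = np.array([1000, 2000, 3000, 4000, 5000])
p95_ms = np.array([52, 61, 78, 120, 310])

fit = stats.linregress(load, p95_ms)
print(f"slope={fit.slope:.3f} ms per req/s, r^2={fit.rvalue**2:.2f}")

# Even a decent-looking r^2 can hide structured residuals;
# inspect them before trusting the line, and never extrapolate
# past the last observed load
residuals = p95_ms - (fit.intercept + fit.slope * load)
print(residuals)
```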
Communicating Honestly
For executives
Do:
- Simplify without distorting
- Highlight what matters for decision
- Include minimum necessary context
- Indicate uncertainty
Don't:
- Exaggerate improvements
- Hide problems
- Use misleading scales
- Omit comparison period
For technical teams
Do:
- Show complete distribution
- Include correlations
- Allow drill-down
- Document methodology
Don't:
- Report averages without percentiles
- Aggregate excessively
- Hide outliers
Visualization Checklist
## Before publishing a graph
### Axes
- [ ] Y starts at zero (or justified)?
- [ ] Scales consistent between graphs?
- [ ] Labels clear?
- [ ] Units indicated?
### Data
- [ ] Period is representative?
- [ ] Aggregation is appropriate?
- [ ] Outliers handled correctly?
- [ ] Sample is sufficient?
### Context
- [ ] Baseline indicated?
- [ ] Relevant events annotated?
- [ ] Source clear?
- [ ] Date indicated?
### Honesty
- [ ] Is the first impression correct?
- [ ] Would someone without context understand?
- [ ] Am I showing reality?
Real Examples
Case 1: Improvement report
Misleading version:
"Latency reduced 400%" (a reduction greater than 100% is impossible)
[graph with truncated Y-axis]
Honest version:
"Latency p95 reduced from 250ms to 180ms (-28%)
after query optimization. Baseline: last week.
Measured in staging environment with similar load."
Case 2: Production dashboard
Misleading version:
[Only average latency]
"System healthy: 100ms"
Honest version:
[Percentiles p50/p95/p99]
"p50=80ms, p95=200ms, p99=1.2s
⚠ High p99 indicates issues for 1% of users"
Conclusion
Honest graphs:
- Axes from zero - except when clearly justified
- Consistent scales - for valid comparisons
- Representative period - no cherry-picking
- Adequate aggregation - don't hide variance
- Context included - baselines, events, methodology
Remember: a misleading graph destroys trust. An honest graph, even showing problems, builds credibility.
The goal isn't for the graph to look good. It's for reality to be understood.
This article is part of the series on the OCTOPUS Performance Engineering methodology.