
Response Time: Average vs Percentiles

Why average response time lies about your users' real experience — and how percentiles reveal the truth you need to see.

Imagine you're analyzing the response time of a critical API in your system. The dashboard shows an average of 200ms. Looks great, right? Your SLAs probably require something around 500ms, so you're comfortably within expectations.

But here's the problem: the average is lying to you.

The problem with averages

The arithmetic mean is one of the most intuitive metrics we know. We add up all the values and divide by the count. Simple, familiar, and deeply misleading when applied to response times.

Consider this real scenario: you have 100 requests per second. 95 of them respond in 100ms. But 5 of them take 2 seconds each. What's the average?

(95 × 100ms + 5 × 2000ms) / 100 = 195ms

The average says 195ms. But 5% of your users are waiting 20 times longer than everyone else. If you handle 1 million requests per day, that's 50,000 users with a terrible experience, and the average doesn't tell that story.
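To make the arithmetic concrete, here is a minimal sketch in Python using the same hypothetical sample (95 fast requests, 5 slow ones); the numbers are illustrative, not measurements:

    # Hypothetical latency sample: 95 requests at 100ms, 5 requests at 2000ms
    latencies_ms = [100] * 95 + [2000] * 5

    average_ms = sum(latencies_ms) / len(latencies_ms)
    print(f"average = {average_ms:.0f}ms")  # 195ms, and the 5 slow users are invisible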

The average is a statistical lie that hides the users who suffer most.

What are percentiles

A percentile represents the value below which a given percentage of observations fall. In other words:

  • P50 (median): 50% of requests are faster than this value
  • P90: 90% of requests are faster
  • P95: 95% of requests are faster
  • P99: 99% of requests are faster

Going back to our example: the P95 is approximately 100ms (95% of requests finish at or around that mark), but the P99 is close to 2000ms. This stark gap between P95 and P99 is exactly what the average hides.
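Computing the percentiles on the same hypothetical sample shows the gap directly. This is a minimal nearest-rank implementation for illustration, not a production-grade estimator:

    import math

    # Same hypothetical sample: 95 requests at 100ms, 5 requests at 2000ms
    latencies_ms = sorted([100] * 95 + [2000] * 5)

    def percentile(sorted_values, p):
        """Nearest-rank percentile: smallest value with at least p% of observations at or below it."""
        rank = max(1, math.ceil(p * len(sorted_values) / 100))
        return sorted_values[rank - 1]

    for p in (50, 90, 95, 99):
        print(f"P{p} = {percentile(latencies_ms, p)}ms")
    # P50 = 100ms, P90 = 100ms, P95 = 100ms, P99 = 2000ms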

Why P95 and P99 matter

There's a reason companies like Amazon, Google, and Netflix obsessively monitor P99 and even P99.9: the requests in that tail often come from the users who matter most to the business.

Think about it: who are the users who most frequently fall into the P99?

  • Users with large carts (more data to process)
  • Users during peak moments (more concurrency)
  • Frequent users (more history, more complexity)
  • Users on slow devices or connections (partial timeouts)

In other words: often your best customers, at the most critical moments, have the worst experience. And the average doesn't show that.

The Long Tail Law

Distributed systems have a peculiar characteristic: the more services a request traverses, the higher the probability of falling into the long tail of latency.

If a request needs to query 5 services in parallel, and each service has a 1% chance of responding slowly, the complete request is slow whenever any one of them is slow, which gives it roughly a 5% chance of being slow. With 10 services, close to 10%. The math works against you.
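That reasoning is simply 1 - (1 - p)^n, where p is the per-service probability of a slow response and n the number of parallel dependencies. A quick sketch with the assumed 1% per-service rate:

    # Probability that a fan-out request is slow, assuming each of n parallel
    # dependencies is independently slow with probability p_slow
    def slow_probability(n_services, p_slow=0.01):
        return 1 - (1 - p_slow) ** n_services

    print(f"{slow_probability(5):.1%}")   # ~4.9%
    print(f"{slow_probability(10):.1%}")  # ~9.6%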

This means that in microservices architectures, the P99 is even more critical than in monoliths. And it's exactly in these architectures that the average becomes more misleading.

How to interpret percentiles

Here's a practical guide for interpreting your percentiles:

P50 vs P99: The ratio

A useful rule of thumb: if your P99 is more than five times your P50, you have a consistency problem. The system works well most of the time, but degrades significantly for specific requests.
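A trivial way to encode that heuristic (the 5x threshold is the rule of thumb above, and the sample numbers are hypothetical):

    # Hypothetical measured percentiles, in milliseconds
    p50_ms, p99_ms = 180.0, 1400.0

    ratio = p99_ms / p50_ms
    if ratio > 5:
        print(f"P99/P50 = {ratio:.1f}x: inconsistent latency, investigate the tail")
    else:
        print(f"P99/P50 = {ratio:.1f}x: tail stays close to the typical case")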

Stability under load

In healthy systems, percentiles should remain relatively stable as load increases. If your P99 spikes while P50 remains stable, you have a bottleneck that affects only some requests — probably resource contention or pool limits.

Gradual degradation vs collapse

Observe how percentiles behave near maximum capacity. A gradual increase indicates controlled degradation. An abrupt jump indicates an inflection point — you've found a system limit.

Setting SLAs with percentiles

If you still define SLAs based on averages, you're taking unnecessary risks. A more robust approach:

  • P50 < 200ms: Acceptable typical experience
  • P95 < 500ms: Majority of users satisfied
  • P99 < 1s: Extreme cases still tolerable
  • P99.9 < 2s: No one abandons due to timeout

Note that specific values depend on your context. What matters is having goals for multiple percentiles, not just for the average or a single point.
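If it helps to make the targets executable, here is a minimal sketch that encodes the illustrative thresholds above and flags violations; the values and names are assumptions, not recommendations:

    # Illustrative SLA targets per percentile, in milliseconds (from the list above)
    SLA_TARGETS_MS = {"p50": 200, "p95": 500, "p99": 1000, "p99.9": 2000}

    def sla_violations(measured_ms):
        """Return the percentiles whose measured latency exceeds the target."""
        return [name for name, target in SLA_TARGETS_MS.items()
                if measured_ms.get(name, 0) > target]

    # Hypothetical measurement: only the P99 is over budget
    print(sla_violations({"p50": 120, "p95": 430, "p99": 1350, "p99.9": 1800}))  # ['p99']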

In practice: What to monitor

For each critical endpoint or service, you should be collecting and visualizing:

  1. P50, P95, P99 in real time (last 5-15 minutes)
  2. Historical trend of these percentiles (last 7-30 days)
  3. Comparison between low and high load periods
  4. Alerts based on percentile deviations, not the average

If your monitoring system only offers averages, you're flying blind. Observability tooling that supports percentiles is one of the best infrastructure investments you can make.
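As an illustration of the fourth point, alerting on percentile deviation rather than on the average, here is a minimal sketch; the baseline, window, and tolerance are hypothetical choices, not prescriptions:

    # Fire an alert when the recent P99 drifts more than `tolerance` above its
    # historical baseline, regardless of what the average is doing.
    def p99_alert(recent_p99_ms, baseline_p99_ms, tolerance=0.5):
        return recent_p99_ms > baseline_p99_ms * (1 + tolerance)

    # Hypothetical values: baseline P99 of 900ms over the last 30 days,
    # recent P99 of 1600ms over the last 15 minutes
    print(p99_alert(recent_p99_ms=1600, baseline_p99_ms=900))  # True -> page someone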

Conclusion

The next time someone says "our average response time is X," ask the question that really matters: "What about the P99?"

The difference between a system that "works well on average" and a system that "works well for all users" lies in the percentiles. And that difference is often the difference between a product users tolerate and a product users love.

You don't want to know how your system performs on average. You want to know how it performs when it matters most.

Tags: percentile, latency, P99, SLA, monitoring
