Latency, Throughput, and Concurrency

The three fundamental pillars of performance. Understand what they are, how they relate, and why you need to measure all three.

When we talk about software performance, three concepts form the basis of any serious analysis: latency, throughput, and concurrency. Understanding what each represents — and how they relate — is essential for diagnosing problems and making intelligent decisions.

Latency: The duration of an operation

Latency is the time an operation takes from start to finish. In APIs, we usually measure the response time of a request.
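
As a minimal sketch of what "time from start to finish" means in practice (assuming Python and the requests library; the URL is a placeholder), you can wrap the call with a monotonic clock:

```python
import time

import requests  # third-party HTTP client, assumed available; any blocking client works the same way

def timed_get(url: str) -> tuple[requests.Response, float]:
    """Return the response and the elapsed time in milliseconds."""
    start = time.monotonic()                 # monotonic clock: unaffected by system clock adjustments
    response = requests.get(url, timeout=5)  # the operation being measured
    elapsed_ms = (time.monotonic() - start) * 1000
    return response, elapsed_ms

response, latency_ms = timed_get("https://example.com/health")  # placeholder URL
print(f"status={response.status_code} latency={latency_ms:.1f} ms")
```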

Why latency matters

  • User experience: Every additional millisecond impacts conversion
  • Chained dependencies: In microservices, latencies add up
  • SLAs and SLOs: Service contracts are based on latency

Measuring latency correctly

Looking at the average is not enough. You need to understand the distribution:

  • P50 (median): 50% of requests complete within this time
  • P95: 95% of requests complete within this time
  • P99: 99% of requests complete within this time; only 1% are slower

The average hides outliers. P99 reveals the experience of users who suffer most.
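
A minimal sketch of computing these percentiles from a batch of latency samples, using the nearest-rank method and made-up values; in practice these numbers usually come from your metrics backend:

```python
import math

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank
    return ordered[max(rank - 1, 0)]

# Latencies in milliseconds from one measurement window (illustrative values).
latencies = [12, 15, 14, 13, 120, 16, 14, 13, 15, 400]

print("mean:", sum(latencies) / len(latencies))  # 63.2 ms, pulled up by two outliers
print("p50 :", percentile(latencies, 50))        # 14 ms
print("p95 :", percentile(latencies, 95))        # 400 ms
print("p99 :", percentile(latencies, 99))        # 400 ms
```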

Throughput: The system's capacity

Throughput is the amount of work a system can process in a given period of time, usually measured in requests per second (RPS) or transactions per second (TPS).
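
As a rough sketch, throughput can be estimated by counting how many operations complete within a fixed window; the handle_request function here is a stand-in that just sleeps:

```python
import time

def handle_request() -> None:
    """Placeholder for real work; here it simply sleeps for ~5 ms."""
    time.sleep(0.005)

window_seconds = 2.0
completed = 0
deadline = time.monotonic() + window_seconds

while time.monotonic() < deadline:
    handle_request()
    completed += 1

print(f"throughput ≈ {completed / window_seconds:.0f} requests/second")
```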

Throughput vs Latency

A common mistake is assuming that low latency means high throughput. In fact:

  • A system can have low latency at low load and collapse when throughput increases
  • Increasing throughput usually increases latency (resource contention)

Factors that limit throughput

  • Database connection pool
  • Available threads
  • Network bandwidth
  • Processing capacity
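
The first two limits in the list above can be seen in a toy model: a semaphore stands in for the connection pool (or thread limit), and throughput flattens at roughly pool size divided by latency. A sketch with asyncio and a simulated 100 ms query:

```python
import asyncio
import time

POOL_SIZE = 10          # stand-in for a connection pool or worker limit
QUERY_LATENCY = 0.1     # each simulated query takes 100 ms

async def query(pool: asyncio.Semaphore) -> None:
    # Only POOL_SIZE queries may hold a "connection" at once.
    async with pool:
        await asyncio.sleep(QUERY_LATENCY)

async def main() -> None:
    pool = asyncio.Semaphore(POOL_SIZE)
    start = time.monotonic()
    await asyncio.gather(*(query(pool) for _ in range(500)))
    elapsed = time.monotonic() - start
    # Expected ceiling: POOL_SIZE / QUERY_LATENCY = 100 queries/second.
    print(f"throughput ≈ {500 / elapsed:.0f} queries/second")

asyncio.run(main())
```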

Concurrency: Simultaneous operations

Concurrency is the number of operations being executed simultaneously. It's different from throughput:

  • Throughput: How many operations we complete per second
  • Concurrency: How many operations are in progress right now

Little's Law

The relationship between these concepts is expressed by Little's Law:

Concurrency = Throughput × Latency

If your throughput is 100 req/s and average latency is 200ms:

Concurrency = 100 × 0.2 = 20 simultaneous requests
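
The same arithmetic as a tiny helper; the only requirement is consistent units (latency in seconds if throughput is per second):

```python
def concurrency(throughput_rps: float, latency_s: float) -> float:
    """Little's Law: average in-flight requests = arrival rate x time in system."""
    return throughput_rps * latency_s

def max_throughput(concurrency_limit: float, latency_s: float) -> float:
    """Rearranged: the throughput ceiling imposed by a concurrency limit."""
    return concurrency_limit / latency_s

print(concurrency(100, 0.2))    # 20 requests in flight
print(max_throughput(20, 0.2))  # 100 req/s is the best you can do with 20 slots
```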

Practical implications

  • If latency increases, concurrency increases (at the same throughput)
  • If concurrency reaches a limit (connection pool), throughput drops
  • High latency with high concurrency = saturated system

How the three relate

Imagine a restaurant:

  • Latency: Time to prepare each dish
  • Throughput: Dishes served per hour
  • Concurrency: Dishes being prepared simultaneously

If the kitchen has 5 chefs (max concurrency = 5):

  • Each dish takes 10 minutes (latency)
  • Maximum throughput = 30 dishes/hour

If latency increases to 15 minutes (a difficult ingredient):

  • Throughput drops to 20 dishes/hour
  • Or we need more chefs (increase concurrency)
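
The kitchen numbers are just Little's Law rearranged: throughput = concurrency / latency. A quick check, converting minutes per dish into hours:

```python
def dishes_per_hour(chefs: int, minutes_per_dish: float) -> float:
    # throughput = concurrency / latency, with latency expressed in hours
    return chefs / (minutes_per_dish / 60)

print(dishes_per_hour(5, 10))  # 30.0 dishes/hour
print(dishes_per_hour(5, 15))  # 20.0 dishes/hour
```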

Diagnosing problems

High latency, low throughput

Possible causes:

  • Blocking operations (slow queries, synchronous I/O)
  • Insufficient resources (CPU, memory)
  • Lock contention

Latency OK, limited throughput

Possible causes:

  • Small connection pool
  • Thread limit
  • Bottleneck in a specific component

High concurrency, general degradation

Possible causes:

  • Saturated system
  • Resource contention
  • Need to scale
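
As a hedged sketch, these three signatures can be turned into a first-pass automated check; the metric names and thresholds below are invented and would come from your own monitoring and SLOs:

```python
from dataclasses import dataclass

@dataclass
class Snapshot:
    p99_latency_ms: float
    throughput_rps: float
    concurrency: int

# Illustrative thresholds; tune them to your own SLOs and capacity tests.
LATENCY_SLO_MS = 300
EXPECTED_RPS = 500
CONCURRENCY_LIMIT = 100

def diagnose(s: Snapshot) -> str:
    if s.p99_latency_ms > LATENCY_SLO_MS and s.throughput_rps < EXPECTED_RPS:
        return "High latency, low throughput: look for blocking I/O, slow queries, lock contention."
    if s.p99_latency_ms <= LATENCY_SLO_MS and s.throughput_rps < EXPECTED_RPS:
        return "Latency OK, limited throughput: check pool sizes, thread limits, single bottlenecks."
    if s.concurrency >= CONCURRENCY_LIMIT:
        return "High concurrency, general degradation: the system is saturated; shed load or scale."
    return "Within expected envelope."

print(diagnose(Snapshot(p99_latency_ms=850, throughput_rps=120, concurrency=96)))
```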

Conclusion

Latency, throughput, and concurrency are the three pillars of performance analysis. Understanding how they relate allows you to:

  • Diagnose problems correctly
  • Predict behavior under load
  • Make informed capacity decisions

You can't optimize what you don't measure. And you can't measure correctly without understanding these three concepts.

Want to understand your platform's limits?

Contact us for a performance assessment.
