When we talk about software performance, three concepts form the basis of any serious analysis: latency, throughput, and concurrency. Understanding what each represents — and how they relate — is essential for diagnosing problems and making intelligent decisions.
Latency: The time of an operation
Latency is the time an operation takes from start to finish. In APIs, we usually measure the response time of a request.
Why latency matters
- User experience: Every additional millisecond of response time hurts the experience and, at scale, conversions
- Chained dependencies: In microservices, latencies add up along the call chain
- SLAs and SLOs: Service contracts and objectives are typically defined as latency targets
Measuring latency correctly
Looking at the average is not enough. You need to understand the distribution:
- P50 (median): 50% of requests complete within this time
- P95: 95% of requests complete within this time
- P99: 99% of requests complete within this time
The average hides outliers. P99 reveals the experience of users who suffer most.
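As a minimal sketch (the response times below are simulated, not real measurements), this is how you might compute these percentiles from a list of latencies:

```python
# Sketch: percentiles from a sample of response times.
# The distribution below is invented purely for illustration.
import random
import statistics

random.seed(42)
# Mostly fast requests, plus a slow tail of about 5%.
latencies_ms = [abs(random.gauss(120, 30)) for _ in range(950)]
latencies_ms += [abs(random.gauss(900, 200)) for _ in range(50)]
latencies_ms.sort()

def percentile(sorted_values, p):
    """Value below which roughly p% of the samples fall."""
    index = min(int(len(sorted_values) * p / 100), len(sorted_values) - 1)
    return sorted_values[index]

print(f"mean: {statistics.mean(latencies_ms):6.0f} ms")   # looks reassuring
print(f"p50:  {percentile(latencies_ms, 50):6.0f} ms")
print(f"p95:  {percentile(latencies_ms, 95):6.0f} ms")
print(f"p99:  {percentile(latencies_ms, 99):6.0f} ms")    # the tail users feel
```

With a tail like this, the mean sits far below the P99, which is exactly the gap that average-only dashboards hide.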
Throughput: The system's capacity
Throughput is the amount of work the system can process in a period of time. It is usually measured in requests per second (RPS) or transactions per second (TPS).
Throughput vs Latency
A common mistake is assuming that low latency means high throughput. In fact:
- A system can have low latency at low load and collapse when throughput increases
- Increasing throughput usually increases latency (resource contention)
Factors that limit throughput
- Database connection pool (see the sketch after this list)
- Available threads
- Network bandwidth
- Processing capacity
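Here is the sketch referenced above: a toy simulation (pool size, query time, client count, and duration are all invented) of a small connection pool capping throughput. Adding more client threads does not raise completed requests per second; it only makes them queue.

```python
# Sketch: a 5-connection "pool" caps throughput at POOL_SIZE / QUERY_TIME,
# no matter how many clients we throw at it. All numbers are invented.
import threading
import time

POOL_SIZE = 5            # simulated database connection pool
QUERY_TIME = 0.05        # each query "takes" 50 ms
DURATION = 2.0           # run the experiment for 2 seconds

pool = threading.BoundedSemaphore(POOL_SIZE)
completed = 0
count_lock = threading.Lock()

def client(deadline):
    global completed
    while time.monotonic() < deadline:
        with pool:                  # wait for a free connection
            time.sleep(QUERY_TIME)  # simulated query
        with count_lock:
            completed += 1

deadline = time.monotonic() + DURATION
threads = [threading.Thread(target=client, args=(deadline,)) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Theoretical ceiling: 5 / 0.05 s = 100 req/s, regardless of the 50 clients.
print(f"throughput ~ {completed / DURATION:.0f} req/s")
```

Note that each request now spends most of its time waiting for a connection rather than doing work, so observed latency balloons even though the query itself still takes 50 ms. That is the contention effect described above.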
Concurrency: Simultaneous operations
Concurrency is the number of operations being executed simultaneously. It's different from throughput:
- Throughput: How many operations we complete per second
- Concurrency: How many operations are in progress right now
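A sketch of that distinction (the offered load and the 200 ms service time are invented): the in_flight gauge is concurrency, while the running done / elapsed rate is throughput.

```python
# Sketch: in_flight (a gauge) is concurrency; done / elapsed is throughput.
# Offered load (~100 req/s) and service time (200 ms) are invented.
import asyncio
import time

in_flight = 0
done = 0

async def handle_request():
    global in_flight, done
    in_flight += 1
    await asyncio.sleep(0.2)        # simulated 200 ms of work
    in_flight -= 1
    done += 1

async def main():
    start = time.monotonic()
    tasks = []
    for i in range(200):
        tasks.append(asyncio.create_task(handle_request()))
        await asyncio.sleep(0.01)   # one new request every 10 ms
        if i % 50 == 49:
            elapsed = time.monotonic() - start
            print(f"in flight now: {in_flight:3d}   "
                  f"completed so far: {done / elapsed:5.1f} req/s")
    await asyncio.gather(*tasks)

asyncio.run(main())
```

After the initial ramp-up, in_flight settles around 20 while the completion rate climbs toward 100 req/s, which is precisely the relationship Little's Law formalizes next.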
Little's Law
The relationship between these concepts is expressed by Little's Law:
Concurrency = Throughput × Latency
If your throughput is 100 req/s and average latency is 200ms:
Concurrency = 100 × 0.2 = 20 simultaneous requests
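The same arithmetic as a tiny helper, including the two rearrangements you use most often (the figures are the ones from the example above):

```python
# Little's Law and its two common rearrangements.
def concurrency(throughput_rps, latency_s):
    return throughput_rps * latency_s

def max_throughput(concurrency_limit, latency_s):
    return concurrency_limit / latency_s

def expected_latency(concurrency_now, throughput_rps):
    return concurrency_now / throughput_rps

print(concurrency(100, 0.2))        # 20 requests in flight
print(max_throughput(20, 0.2))      # 100 req/s is the ceiling for 20 slots
print(expected_latency(20, 100))    # 0.2 s in steady state
```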
Practical implications
- If latency increases, concurrency increases (for the same throughput)
- If concurrency reaches a limit (connection pool), throughput drops
- High latency with high concurrency = saturated system
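One practical use of these implications is sizing limits before they bite. A hedged sketch with invented targets: if you expect 100 req/s and your database queries have a P99 around 80 ms, Little's Law gives the number of connections that will be busy at once, and you add headroom on top.

```python
# Sketch: sizing a connection pool from Little's Law. Targets are invented.
import math

target_rps = 100           # expected peak throughput
p99_query_latency = 0.08   # ~80 ms worst-case query time
headroom = 1.5             # spare capacity for bursts and retries

busy_connections = target_rps * p99_query_latency    # ~8 busy at once
pool_size = math.ceil(busy_connections * headroom)   # ~12 connections

print(pool_size)
```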
How the three relate
Imagine a restaurant:
- Latency: Time to prepare each dish
- Throughput: Dishes served per hour
- Concurrency: Dishes being prepared simultaneously
If the kitchen has 5 chefs (max concurrency = 5):
- Each dish takes 10 minutes (latency)
- Maximum throughput = 30 dishes/hour
If latency increases to 15 minutes (difficult ingredient):
- Throughput drops to 20 dishes/hour
- Or, to keep serving 30 dishes/hour, we need more chefs (increase concurrency)
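The kitchen numbers are just Little's Law rearranged; a quick check in code (the chef count and dish times come from the analogy above):

```python
# Throughput = concurrency / latency, in kitchen units.
chefs = 5                             # max concurrency

dishes_per_hour = chefs / (10 / 60)   # 10-minute dishes
print(dishes_per_hour)                # 30.0 dishes/hour

dishes_per_hour = chefs / (15 / 60)   # 15-minute dishes
print(dishes_per_hour)                # 20.0 dishes/hour

chefs_needed = 30 * (15 / 60)         # to keep 30 dishes/hour at 15 minutes
print(chefs_needed)                   # 7.5, so round up to 8 chefs
```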
Diagnosing problems
High latency, low throughput
Possible causes:
- Blocking operations (slow queries, synchronous I/O)
- Insufficient resources (CPU, memory)
- Lock contention
Latency OK, limited throughput
Possible causes:
- Small connection pool
- Thread limit
- Bottleneck in specific component
High concurrency, general degradation
Possible causes:
- Saturated system
- Resource contention
- Need to scale
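To tell which of these situations you are in, you need all three numbers from the same observation window. A sketch (the input format and sample values are invented) that derives them from request start/end timestamps:

```python
# Sketch: derive throughput, latency, and concurrency from (start, end)
# timestamps, e.g. parsed from access logs. The sample data is invented.
def analyze(requests):
    """requests: list of (start_s, end_s) tuples within one window."""
    latencies = sorted(end - start for start, end in requests)
    window = max(end for _, end in requests) - min(start for start, _ in requests)
    throughput = len(requests) / window
    p99 = latencies[min(int(len(latencies) * 0.99), len(latencies) - 1)]
    avg_latency = sum(latencies) / len(latencies)
    concurrency = throughput * avg_latency    # Little's Law, steady state
    return throughput, avg_latency, p99, concurrency

# Invented sample: 6 requests over roughly 1 second.
sample = [(0.0, 0.2), (0.1, 0.4), (0.2, 0.3), (0.5, 1.1), (0.6, 0.8), (0.9, 1.0)]
rps, avg, p99, conc = analyze(sample)
print(f"{rps:.1f} req/s, avg {avg*1000:.0f} ms, p99 {p99*1000:.0f} ms, "
      f"~{conc:.1f} in flight")
```

Read the three numbers together: a high P99 with low req/s points at the first case, a flat req/s ceiling with a healthy P50 at the second, and all three climbing together at the saturated third.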
Conclusion
Latency, throughput, and concurrency are the three pillars of performance analysis. Understanding how they relate allows you to:
- Diagnose problems correctly
- Predict behavior under load
- Make informed capacity decisions
You can't optimize what you don't measure. And you can't measure correctly without understanding these three concepts.