When we talk about software performance, three concepts form the basis of any serious analysis: latency, throughput, and concurrency. Understanding what each represents — and how they relate — is essential for diagnosing problems and making intelligent decisions.
Latency: The time of an operation
Latency is the time an operation takes from start to finish. In APIs, we usually measure the response time of a request.
Why latency matters
- User experience: Every additional millisecond of response time hurts the experience and, at scale, conversions
- Chained dependencies: In microservices, latencies add up along the call chain
- SLAs and SLOs: Service contracts and objectives are typically defined as latency targets
Measuring latency correctly
Looking at the average is not enough. You need to understand the distribution:
- P50 (median): 50% of requests complete within this time
- P95: 95% of requests complete within this time
- P99: 99% of requests complete within this time
The average hides outliers. P99 reveals the experience of users who suffer most.
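As a minimal sketch (the response times below are simulated, not real measurements), this is how you might compute these percentiles from a list of latencies:

```python
# Sketch: percentiles from a sample of response times.
# The distribution below is invented purely for illustration.
import random
import statistics

random.seed(42)
# Mostly fast requests, plus a slow tail of about 5%.
latencies_ms = [abs(random.gauss(120, 30)) for _ in range(950)]
latencies_ms += [abs(random.gauss(900, 200)) for _ in range(50)]
latencies_ms.sort()

def percentile(sorted_values, p):
    """Value below which roughly p% of the samples fall."""
    index = min(int(len(sorted_values) * p / 100), len(sorted_values) - 1)
    return sorted_values[index]

print(f"mean: {statistics.mean(latencies_ms):6.0f} ms")   # looks reassuring
print(f"p50:  {percentile(latencies_ms, 50):6.0f} ms")
print(f"p95:  {percentile(latencies_ms, 95):6.0f} ms")
print(f"p99:  {percentile(latencies_ms, 99):6.0f} ms")    # the tail users feel
```

With a tail like this, the mean sits far below the P99, which is exactly the gap that average-only dashboards hide.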
Throughput: The system's capacity
Throughput is the amount of work the system can process in a period of time. It is usually measured in requests per second (RPS) or transactions per second (TPS).
Throughput vs Latency
A common mistake is assuming that low latency means high throughput. In fact:
- A system can have low latency at low load and collapse when throughput increases
- Increasing throughput usually increases latency (resource contention)
Factors that limit throughput
- Database connection pool (see the sketch after this list)
- Available threads
- Network bandwidth
- Processing capacity
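Here is the sketch referenced above: a toy simulation (pool size, query time, client count, and duration are all invented) of a small connection pool capping throughput. Adding more client threads does not raise completed requests per second; it only makes them queue.

```python
# Sketch: a 5-connection "pool" caps throughput at POOL_SIZE / QUERY_TIME,
# no matter how many clients we throw at it. All numbers are invented.
import threading
import time

POOL_SIZE = 5            # simulated database connection pool
QUERY_TIME = 0.05        # each query "takes" 50 ms
DURATION = 2.0           # run the experiment for 2 seconds

pool = threading.BoundedSemaphore(POOL_SIZE)
completed = 0
count_lock = threading.Lock()

def client(deadline):
    global completed
    while time.monotonic() < deadline:
        with pool:                  # wait for a free connection
            time.sleep(QUERY_TIME)  # simulated query
        with count_lock:
            completed += 1

deadline = time.monotonic() + DURATION
threads = [threading.Thread(target=client, args=(deadline,)) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Theoretical ceiling: 5 / 0.05 s = 100 req/s, regardless of the 50 clients.
print(f"throughput ~ {completed / DURATION:.0f} req/s")
```

Note that each request now spends most of its time waiting for a connection rather than doing work, so observed latency balloons even though the query itself still takes 50 ms. That is the contention effect described above.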
Concurrency: Simultaneous operations
Concurrency is the number of operations being executed simultaneously. It's different from throughput:
- Throughput: How many operations we complete per second
- Concurrency: How many operations are in progress right now
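A sketch of that distinction (the offered load and the 200 ms service time are invented): the in_flight gauge is concurrency, while the running done / elapsed rate is throughput.

```python
# Sketch: in_flight (a gauge) is concurrency; done / elapsed is throughput.
# Offered load (~100 req/s) and service time (200 ms) are invented.
import asyncio
import time

in_flight = 0
done = 0

async def handle_request():
    global in_flight, done
    in_flight += 1
    await asyncio.sleep(0.2)        # simulated 200 ms of work
    in_flight -= 1
    done += 1

async def main():
    start = time.monotonic()
    tasks = []
    for i in range(200):
        tasks.append(asyncio.create_task(handle_request()))
        await asyncio.sleep(0.01)   # one new request every 10 ms
        if i % 50 == 49:
            elapsed = time.monotonic() - start
            print(f"in flight now: {in_flight:3d}   "
                  f"completed so far: {done / elapsed:5.1f} req/s")
    await asyncio.gather(*tasks)

asyncio.run(main())
```

After the initial ramp-up, in_flight settles around 20 while the completion rate climbs toward 100 req/s, which is precisely the relationship Little's Law formalizes next.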
Little's Law
The relationship between these concepts is expressed by Little's Law:
Concurrency = Throughput × Latency
If your throughput is 100 req/s and average latency is 200ms:
Concurrency = 100 × 0.2 = 20 simultaneous requests
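The same arithmetic as a tiny helper, including the two rearrangements you use most often (the figures are the ones from the example above):

```python
# Little's Law and its two common rearrangements.
def concurrency(throughput_rps, latency_s):
    return throughput_rps * latency_s

def max_throughput(concurrency_limit, latency_s):
    return concurrency_limit / latency_s

def expected_latency(concurrency_now, throughput_rps):
    return concurrency_now / throughput_rps

print(concurrency(100, 0.2))        # 20 requests in flight
print(max_throughput(20, 0.2))      # 100 req/s is the ceiling for 20 slots
print(expected_latency(20, 100))    # 0.2 s in steady state
```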
Practical implications
- If latency increases, concurrency increases (for the same throughput)
- If concurrency reaches a limit (connection pool), throughput drops
- High latency with high concurrency = saturated system
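One practical use of these implications is sizing limits before they bite. A hedged sketch with invented targets: if you expect 100 req/s and your database queries have a P99 around 80 ms, Little's Law gives the number of connections that will be busy at once, and you add headroom on top.

```python
# Sketch: sizing a connection pool from Little's Law. Targets are invented.
import math

target_rps = 100           # expected peak throughput
p99_query_latency = 0.08   # ~80 ms worst-case query time
headroom = 1.5             # spare capacity for bursts and retries

busy_connections = target_rps * p99_query_latency    # ~8 busy at once
pool_size = math.ceil(busy_connections * headroom)   # ~12 connections

print(pool_size)
```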
How the three relate
Imagine a restaurant:
- Latency: Time to prepare each dish
- Throughput: Dishes served per hour
- Concurrency: Dishes being prepared simultaneously
If the kitchen has 5 chefs (max concurrency = 5):
- Each dish takes 10 minutes (latency)
- Maximum throughput = 30 dishes/hour
If latency increases to 15 minutes (difficult ingredient):
- Throughput drops to 20 dishes/hour
- Or, to keep serving 30 dishes/hour, we need more chefs (increase concurrency)
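The kitchen numbers are just Little's Law rearranged; a quick check in code (the chef count and dish times come from the analogy above):

```python
# Throughput = concurrency / latency, in kitchen units.
chefs = 5                             # max concurrency

dishes_per_hour = chefs / (10 / 60)   # 10-minute dishes
print(dishes_per_hour)                # 30.0 dishes/hour

dishes_per_hour = chefs / (15 / 60)   # 15-minute dishes
print(dishes_per_hour)                # 20.0 dishes/hour

chefs_needed = 30 * (15 / 60)         # to keep 30 dishes/hour at 15 minutes
print(chefs_needed)                   # 7.5, so round up to 8 chefs
```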
Diagnosing problems
High latency, low throughput
Possible causes:
- Blocking operations (slow queries, synchronous I/O)
- Insufficient resources (CPU, memory)
- Lock contention
Latency OK, limited throughput
Possible causes:
- Small connection pool
- Thread limit
- Bottleneck in specific component
High concurrency, general degradation
Possible causes:
- Saturated system
- Resource contention
- Need to scale
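To tell which of these situations you are in, you need all three numbers from the same observation window. A sketch (the input format and sample values are invented) that derives them from request start/end timestamps:

```python
# Sketch: derive throughput, latency, and concurrency from (start, end)
# timestamps, e.g. parsed from access logs. The sample data is invented.
def analyze(requests):
    """requests: list of (start_s, end_s) tuples within one window."""
    latencies = sorted(end - start for start, end in requests)
    window = max(end for _, end in requests) - min(start for start, _ in requests)
    throughput = len(requests) / window
    p99 = latencies[min(int(len(latencies) * 0.99), len(latencies) - 1)]
    avg_latency = sum(latencies) / len(latencies)
    concurrency = throughput * avg_latency    # Little's Law, steady state
    return throughput, avg_latency, p99, concurrency

# Invented sample: 6 requests over roughly 1 second.
sample = [(0.0, 0.2), (0.1, 0.4), (0.2, 0.3), (0.5, 1.1), (0.6, 0.8), (0.9, 1.0)]
rps, avg, p99, conc = analyze(sample)
print(f"{rps:.1f} req/s, avg {avg*1000:.0f} ms, p99 {p99*1000:.0f} ms, "
      f"~{conc:.1f} in flight")
```

Read the three numbers together: a high P99 with low req/s points at the first case, a flat req/s ceiling with a healthy P50 at the second, and all three climbing together at the saturated third.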
Conclusion
Latency, throughput, and concurrency are the three pillars of performance analysis. Understanding how they relate allows you to:
- Diagnose problems correctly
- Predict behavior under load
- Make informed capacity decisions
You can't optimize what you don't measure. And you can't measure correctly without understanding these three concepts.