When a system can't scale anymore, the crucial question is: are we hitting a physical or architectural limit? Physical limits require more resources. Architectural limits require redesign. Confusing them leads to wrong investments.
You can't optimize your way through a physical limit. But you can redesign through an architectural limit.
## Physical Limits

### What they are

Limits imposed by physics and hardware. They cannot be overcome with software.

### Examples
**1. Speed of light (network latency)**

```
São Paulo → New York: ~7,700 km
Speed of light in fiber: ~200,000 km/s
Minimum theoretical latency: ~38.5 ms (one-way)
Minimum round-trip latency: ~77 ms
```

→ No software makes this faster.
**2. Disk bandwidth**

```
Typical NVMe SSD: 3,500 MB/s sequential read
Reading 10 GB: minimum ~2.9 seconds
```

→ The only way around it: read less data or use more disks.
**3. Storage IOPS**

```
Typical SSD: 100,000 IOPS
Needing 500,000 IOPS: at least 5 SSDs
```

→ No optimization changes this.
**4. Memory bandwidth**

```
DDR4: ~50 GB/s
Processing 100 GB: ~2 seconds minimum
```

→ Hardware limit.
**5. CPU cycles**

```
3 GHz CPU: 3 billion cycles/second
Complex operation: 1,000 cycles
Maximum: 3 million operations/second
```

→ Fundamental processor limit.
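The back-of-envelope arithmetic in the five examples above can be captured in a small sketch. The hardware figures are the same rough assumptions used in the examples, not measurements:

```python
# Rough physical-limit calculators; all default constants are illustrative assumptions.

def one_way_latency_ms(distance_km, fiber_speed_km_s=200_000):
    """Minimum theoretical one-way latency over fiber."""
    return distance_km / fiber_speed_km_s * 1000

def read_time_s(data_gb, bandwidth_mb_s=3500):
    """Minimum time to read data at a given disk bandwidth."""
    return data_gb * 1000 / bandwidth_mb_s

def ssds_needed(required_iops, iops_per_ssd=100_000):
    """How many SSDs a given IOPS demand requires (ceiling division)."""
    return -(-required_iops // iops_per_ssd)

def max_ops_per_s(clock_hz=3_000_000_000, cycles_per_op=1000):
    """Upper bound on operations/second for a fixed cycle cost."""
    return clock_hz // cycles_per_op
```

No optimization flag changes these outputs; only different hardware, or less data, does.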
### How to identify

Signs of a physical limit:

- ✓ A resource is at 100% utilization
- ✓ Software optimization doesn't help
- ✓ Only more or better hardware solves it
- ✓ Being faster is mathematically impossible
## Architectural Limits

### What they are

Limits imposed by design decisions. They can be overcome with redesign.

### Examples
**1. Lock contention**

```python
import threading

# Limiting architecture: a single global lock serializes all work
lock = threading.Lock()

def process(item):
    with lock:  # only one thread at a time, no matter how many CPUs
        return do_work(item)

# Redesign: partition the lock so independent items proceed in parallel
NUM_PARTITIONS = 16
partition_locks = [threading.Lock() for _ in range(NUM_PARTITIONS)]

def get_partition(item):
    return hash(item) % NUM_PARTITIONS

def process_partitioned(item):
    with partition_locks[get_partition(item)]:  # lock per partition
        return do_work(item)

# Capacity: up to NUM_PARTITIONS threads in parallel
```
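The effect of partitioned locks is easy to demonstrate. A minimal benchmark sketch, assuming the lock protects an I/O-like wait (simulated here with `time.sleep`, since in CPython the GIL would mask CPU parallelism anyway):

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

NUM_PARTITIONS = 8
global_lock = threading.Lock()
partition_locks = [threading.Lock() for _ in range(NUM_PARTITIONS)]

def work(lock):
    with lock:
        time.sleep(0.01)  # simulate I/O held under the lock

def run(items, pick_lock):
    """Process all items on a thread pool; return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=NUM_PARTITIONS) as pool:
        list(pool.map(lambda i: work(pick_lock(i)), items))
    return time.perf_counter() - start

items = range(32)
t_global = run(items, lambda i: global_lock)                        # fully serialized
t_partitioned = run(items, lambda i: partition_locks[i % NUM_PARTITIONS])
```

With one global lock the 32 items run one at a time; with 8 partitions, up to 8 run concurrently, so wall-clock time drops several-fold on the same hardware.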
**2. Single point of serialization**

Limiting architecture:

```
All requests → One database → Response
Capacity: limited by the single database
```

Redesign:

```
Requests → Load balancer → N databases (sharding)
Capacity: N × individual capacity
```
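The routing layer behind that diagram can be sketched with plain dictionaries standing in for the N databases (a hypothetical in-memory model, not a real sharding client):

```python
class ShardedStore:
    """Hash-routes each key to one of N shards; each dict plays one database."""

    def __init__(self, num_shards):
        self.shards = [dict() for _ in range(num_shards)]

    def _shard_for(self, key):
        # Hash routing: the same key always lands on the same shard
        return self.shards[hash(key) % len(self.shards)]

    def put(self, key, value):
        self._shard_for(key)[key] = value

    def get(self, key):
        return self._shard_for(key).get(key)

store = ShardedStore(num_shards=4)
store.put("user:123", {"name": "Ana"})
```

Each shard now serves only its share of the keyspace, so total capacity grows with the number of shards instead of being pinned to one node.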
**3. Inefficient algorithm**

```python
# O(n²): architectural limit
def find_duplicates(items):
    duplicates = []
    for i, item in enumerate(items):
        for other in items[i + 1:]:
            if item == other:
                duplicates.append(item)
    return duplicates

# O(n): redesign
def find_duplicates(items):
    seen = set()
    duplicates = []
    for item in items:
        if item in seen:
            duplicates.append(item)
        seen.add(item)
    return duplicates

# Both versions: find_duplicates([1, 2, 2, 3, 3]) → [2, 3]
```
**4. Synchronous architecture**

Limiting:

```
Request → API → DB → Cache → External API → Response
Latency: sum of all steps
```

Redesign:

```
Request → API → [DB + Cache + External] in parallel → Response
Latency: the slowest single step
```
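The sum-versus-maximum difference can be shown with `asyncio`. The backend names and delays are made up; `asyncio.sleep` stands in for real DB, cache, and external-API calls:

```python
import asyncio
import time

async def backend(name, delay):
    await asyncio.sleep(delay)  # stands in for a real downstream call
    return name

async def sequential():
    # Latency: sum of all steps
    return [await backend("db", 0.05),
            await backend("cache", 0.02),
            await backend("external", 0.08)]

async def parallel():
    # Latency: roughly the slowest single step
    return await asyncio.gather(backend("db", 0.05),
                                backend("cache", 0.02),
                                backend("external", 0.08))

start = time.perf_counter()
seq_result = asyncio.run(sequential())
t_seq = time.perf_counter() - start    # ~0.05 + 0.02 + 0.08 s

start = time.perf_counter()
par_result = asyncio.run(parallel())
t_par = time.perf_counter() - start    # ~max(0.05, 0.02, 0.08) s
```

Same work, same hardware; only the dependency structure changed.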
**5. Inadequate data model**

```sql
-- Limiting: JOIN against a billion-row table
SELECT * FROM orders
JOIN order_items ON orders.id = order_items.order_id
WHERE orders.user_id = 123;

-- Redesign: denormalization
SELECT * FROM orders_denormalized
WHERE user_id = 123;
```
### How to identify

Signs of an architectural limit:

- ✓ No resource is at 100%
- ✓ Adding hardware doesn't solve it
- ✓ There are queues or blocking
- ✓ Components wait on each other
- ✓ The design is justified by "it's always been this way"
## Comparison
| Aspect | Physical | Architectural |
|---|---|---|
| Cause | Hardware/Physics | Design decisions |
| Solution | More/better hardware | Redesign |
| Cost | $$ (infra) | Engineering time |
| Limit | Fundamental | Removable |
## Diagnosis

### Step 1: Measure utilization

```
CPU: 95%     ← could be physical or architectural
Memory: 40%  ← probably not the limit
Disk: 100%   ← could be physical
Network: 20% ← probably not the limit
```
### Step 2: Identify the bottleneck

Profiling reveals where the time is spent:

```
CPU:
- 80% in function X   ← architectural limit (algorithm)
- 80% in syscalls     ← could be I/O

I/O:
- Waiting for a lock  ← architectural
- Waiting for disk    ← could be physical
```
### Step 3: Test the hypothesis

```
If physical, adding resources should help:
  Add more CPU
  → Performance improves? Physical.
  → Doesn't improve? Architectural.

If architectural, redesign should help:
  Change the algorithm or design
  → Performance improves significantly? Architectural confirmed.
```
## Optimization Strategies

### For physical limits

1. Scale horizontally
   - More machines
   - More disks
   - More replicas
2. Upgrade hardware
   - Faster CPU
   - NVMe SSD
   - More memory
3. Bring data closer
   - CDN
   - Local cache
   - Edge computing
4. Reduce demand
   - Compression
   - Sampling
   - Aggregation
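"Reduce demand" is often the cheapest lever against a bandwidth limit. A sketch with the standard-library `zlib`; the payload is made-up repetitive telemetry:

```python
import zlib

# Made-up, highly repetitive telemetry payload
payload = b"ts=1700000000 level=INFO msg=request-ok path=/api/v1/orders\n" * 2000

compressed = zlib.compress(payload, level=6)
ratio = len(compressed) / len(payload)

# Fewer bytes on the wire: the physical bandwidth limit is unchanged,
# but the demand placed on it shrinks dramatically for data like this.
```

Compression trades CPU cycles for bandwidth, which is exactly the kind of exchange that makes sense when the scarce resource is physical.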
### For architectural limits

1. Eliminate contention
   - Granular locks
   - Lock-free structures
   - Partitioning
2. Parallelize
   - Independent operations in parallel
   - Processing pipelines
   - Async I/O
3. Optimize algorithms
   - Better big-O complexity
   - Appropriate data structures
   - Strategic caching
4. Redesign flows
   - Remove dependencies
   - Batch processing
   - Event-driven design
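Of the flow redesigns above, batching is the easiest to quantify: it amortizes a fixed per-call cost across many items. A toy model (the `FakeDB` class is invented for illustration; the counter stands in for network round trips):

```python
class FakeDB:
    """Toy database that counts round trips so the effect of batching is visible."""

    def __init__(self):
        self.round_trips = 0
        self.rows = []

    def insert_many(self, rows):
        self.round_trips += 1  # each call pays one fixed round-trip cost
        self.rows.extend(rows)

unbatched = FakeDB()
for i in range(1000):
    unbatched.insert_many([i])       # one row per call: 1,000 round trips

batched = FakeDB()
batched.insert_many(range(1000))     # same data, one call: 1 round trip
```

The data stored is identical; only the number of times the fixed cost is paid changes, which is why batching helps even when no single resource is saturated.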
## Real Cases

### Case 1: "The database is slow"

```
Observation: queries take 500 ms
Database CPU: 30%
Disk: 20%

Diagnosis: not a physical limit
Investigation: the query does a full table scan
Solution: add an index
Result: queries at 5 ms (100x improvement)
```

→ The limit was architectural (missing index).
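What that index changed can be mimicked with plain Python structures (the table shape and column names are illustrative, not from the case):

```python
from collections import defaultdict

# A "table" of 100,000 rows, 1,000 distinct users
orders = [{"id": i, "user_id": i % 1000} for i in range(100_000)]

def full_table_scan(user_id):
    # Without an index: examine every row, O(n) per query
    return [o for o in orders if o["user_id"] == user_id]

# Build the "index" once: user_id → matching rows
index = defaultdict(list)
for o in orders:
    index[o["user_id"]].append(o)

def indexed_lookup(user_id):
    # With the index: jump straight to the matching rows, O(1) on average
    return index[user_id]
```

Both functions return the same rows; only the amount of work per query differs, which is why the hardware utilization stayed low while queries were slow.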
### Case 2: "The API doesn't scale"

```
Observation: 1,000 req/s max, 4 CPUs at 100%
Adding more CPUs: doesn't improve
Diagnosis: looks physical, but...
Investigation: one goroutine holds a global lock
Solution: remove the unnecessary lock
Result: 10,000 req/s on the same 4 CPUs
```

→ The limit was architectural (a lock).
### Case 3: "Trans-oceanic latency"

```
Observation: 200 ms latency São Paulo → Europe
Network: 10% utilization
Diagnosis: speed of light
Possible solution: an edge location in Europe
Result: 20 ms latency
```

→ The limit was physical (circumvented with architecture).
## Conclusion

Before optimizing, diagnose:

- Measure the utilization of every resource
- Profile to find the bottleneck
- Test hypotheses before investing

Rules of thumb:

- Resource at 100% + adding more helps = physical
- Resource at 100% + adding more doesn't help = masked architectural limit
- Resource below 100% + slowness = architectural

Remember:

- Physical limits are democratic: they affect everyone equally.
- Architectural limits are tyrannies: created by past decisions.

Don't throw hardware at an architecture problem. Don't redesign when you need more disk.