When a system can't scale anymore, the crucial question is: are we hitting a physical or architectural limit? Physical limits require more resources. Architectural limits require redesign. Confusing them leads to wrong investments.
You can't optimize your way through a physical limit. But you can redesign through an architectural limit.
## Physical Limits

### What they are

Limits imposed by physics and hardware. They cannot be overcome with software.

### Examples
**1. Speed of light (network latency)**

```
São Paulo → New York: ~7,700 km
Speed of light in fiber: ~200,000 km/s
Minimum theoretical latency: ~38.5 ms (one-way)
Minimum round-trip latency: ~77 ms
```

→ No software makes this faster.
**2. Disk bandwidth**

```
Typical NVMe SSD: 3,500 MB/s sequential read
Reading 10 GB: minimum ~2.9 seconds
```

→ The only way around it: read less data or use more disks.
**3. Storage IOPS**

```
Typical SSD: 100,000 IOPS
Needing 500,000 IOPS: at least 5 SSDs
```

→ No optimization changes this.
**4. Memory bandwidth**

```
DDR4: ~50 GB/s
Processing 100 GB: ~2 seconds minimum
```

→ Hardware limit.
**5. CPU cycles**

```
3 GHz CPU: 3 billion cycles/second
Complex operation: 1,000 cycles
Maximum: 3 million operations/second
```

→ Fundamental processor limit.
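The back-of-envelope arithmetic in the five examples above can be captured in a small sketch. The hardware figures are the same rough assumptions used in the examples, not measurements:

```python
# Rough physical-limit calculators; all default constants are illustrative assumptions.

def one_way_latency_ms(distance_km, fiber_speed_km_s=200_000):
    """Minimum theoretical one-way latency over fiber."""
    return distance_km / fiber_speed_km_s * 1000

def read_time_s(data_gb, bandwidth_mb_s=3500):
    """Minimum time to read data at a given disk bandwidth."""
    return data_gb * 1000 / bandwidth_mb_s

def ssds_needed(required_iops, iops_per_ssd=100_000):
    """How many SSDs a given IOPS demand requires (ceiling division)."""
    return -(-required_iops // iops_per_ssd)

def max_ops_per_s(clock_hz=3_000_000_000, cycles_per_op=1000):
    """Upper bound on operations/second for a fixed cycle cost."""
    return clock_hz // cycles_per_op
```

No optimization flag changes these outputs; only different hardware, or less data, does.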
### How to identify

Signs of a physical limit:

- ✓ A resource is at 100% utilization
- ✓ Software optimization doesn't help
- ✓ Only more or better hardware solves it
- ✓ Being faster is mathematically impossible
## Architectural Limits

### What they are

Limits imposed by design decisions. They can be overcome with redesign.

### Examples
**1. Lock contention**

```python
import threading

# Limiting architecture: a single global lock serializes all work
lock = threading.Lock()

def process(item):
    with lock:  # only one thread at a time, no matter how many CPUs
        return do_work(item)

# Redesign: partition the lock so independent items proceed in parallel
NUM_PARTITIONS = 16
partition_locks = [threading.Lock() for _ in range(NUM_PARTITIONS)]

def get_partition(item):
    return hash(item) % NUM_PARTITIONS

def process_partitioned(item):
    with partition_locks[get_partition(item)]:  # lock per partition
        return do_work(item)

# Capacity: up to NUM_PARTITIONS threads in parallel
```
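The effect of partitioned locks is easy to demonstrate. A minimal benchmark sketch, assuming the lock protects an I/O-like wait (simulated here with `time.sleep`, since in CPython the GIL would mask CPU parallelism anyway):

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

NUM_PARTITIONS = 8
global_lock = threading.Lock()
partition_locks = [threading.Lock() for _ in range(NUM_PARTITIONS)]

def work(lock):
    with lock:
        time.sleep(0.01)  # simulate I/O held under the lock

def run(items, pick_lock):
    """Process all items on a thread pool; return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=NUM_PARTITIONS) as pool:
        list(pool.map(lambda i: work(pick_lock(i)), items))
    return time.perf_counter() - start

items = range(32)
t_global = run(items, lambda i: global_lock)                        # fully serialized
t_partitioned = run(items, lambda i: partition_locks[i % NUM_PARTITIONS])
```

With one global lock the 32 items run one at a time; with 8 partitions, up to 8 run concurrently, so wall-clock time drops several-fold on the same hardware.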
**2. Single point of serialization**

Limiting architecture:

```
All requests → One database → Response
Capacity: limited by the single database
```

Redesign:

```
Requests → Load balancer → N databases (sharding)
Capacity: N × individual capacity
```
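The routing layer behind that diagram can be sketched with plain dictionaries standing in for the N databases (a hypothetical in-memory model, not a real sharding client):

```python
class ShardedStore:
    """Hash-routes each key to one of N shards; each dict plays one database."""

    def __init__(self, num_shards):
        self.shards = [dict() for _ in range(num_shards)]

    def _shard_for(self, key):
        # Hash routing: the same key always lands on the same shard
        return self.shards[hash(key) % len(self.shards)]

    def put(self, key, value):
        self._shard_for(key)[key] = value

    def get(self, key):
        return self._shard_for(key).get(key)

store = ShardedStore(num_shards=4)
store.put("user:123", {"name": "Ana"})
```

Each shard now serves only its share of the keyspace, so total capacity grows with the number of shards instead of being pinned to one node.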
**3. Inefficient algorithm**

```python
# O(n²): architectural limit
def find_duplicates(items):
    duplicates = []
    for i, item in enumerate(items):
        for other in items[i + 1:]:
            if item == other:
                duplicates.append(item)
    return duplicates

# O(n): redesign
def find_duplicates(items):
    seen = set()
    duplicates = []
    for item in items:
        if item in seen:
            duplicates.append(item)
        seen.add(item)
    return duplicates

# Both versions: find_duplicates([1, 2, 2, 3, 3]) → [2, 3]
```
**4. Synchronous architecture**

Limiting:

```
Request → API → DB → Cache → External API → Response
Latency: sum of all steps
```

Redesign:

```
Request → API → [DB + Cache + External] in parallel → Response
Latency: the slowest single step
```
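The sum-versus-maximum difference can be shown with `asyncio`. The backend names and delays are made up; `asyncio.sleep` stands in for real DB, cache, and external-API calls:

```python
import asyncio
import time

async def backend(name, delay):
    await asyncio.sleep(delay)  # stands in for a real downstream call
    return name

async def sequential():
    # Latency: sum of all steps
    return [await backend("db", 0.05),
            await backend("cache", 0.02),
            await backend("external", 0.08)]

async def parallel():
    # Latency: roughly the slowest single step
    return await asyncio.gather(backend("db", 0.05),
                                backend("cache", 0.02),
                                backend("external", 0.08))

start = time.perf_counter()
seq_result = asyncio.run(sequential())
t_seq = time.perf_counter() - start    # ~0.05 + 0.02 + 0.08 s

start = time.perf_counter()
par_result = asyncio.run(parallel())
t_par = time.perf_counter() - start    # ~max(0.05, 0.02, 0.08) s
```

Same work, same hardware; only the dependency structure changed.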
**5. Inadequate data model**

```sql
-- Limiting: JOIN against a billion-row table
SELECT * FROM orders
JOIN order_items ON orders.id = order_items.order_id
WHERE orders.user_id = 123;

-- Redesign: denormalization
SELECT * FROM orders_denormalized
WHERE user_id = 123;
```
### How to identify

Signs of an architectural limit:

- ✓ No resource is at 100%
- ✓ Adding hardware doesn't solve it
- ✓ There are queues or blocking
- ✓ Components wait on each other
- ✓ The design is justified by "it's always been this way"
## Comparison
| Aspect | Physical | Architectural |
|---|---|---|
| Cause | Hardware/Physics | Design decisions |
| Solution | More/better hardware | Redesign |
| Cost | $$ (infra) | Engineering time |
| Limit | Fundamental | Removable |
## Diagnosis

### Step 1: Measure utilization

```
CPU: 95%     ← could be physical or architectural
Memory: 40%  ← probably not the limit
Disk: 100%   ← could be physical
Network: 20% ← probably not the limit
```
### Step 2: Identify the bottleneck

Profiling reveals where the time is spent:

```
CPU:
- 80% in function X   ← architectural limit (algorithm)
- 80% in syscalls     ← could be I/O

I/O:
- Waiting for a lock  ← architectural
- Waiting for disk    ← could be physical
```
### Step 3: Test the hypothesis

```
If physical, adding resources should help:
  Add more CPU
  → Performance improves? Physical.
  → Doesn't improve? Architectural.

If architectural, redesign should help:
  Change the algorithm or design
  → Performance improves significantly? Architectural confirmed.
```
## Optimization Strategies

### For physical limits

1. Scale horizontally
   - More machines
   - More disks
   - More replicas
2. Upgrade hardware
   - Faster CPU
   - NVMe SSD
   - More memory
3. Bring data closer
   - CDN
   - Local cache
   - Edge computing
4. Reduce demand
   - Compression
   - Sampling
   - Aggregation
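"Reduce demand" is often the cheapest lever against a bandwidth limit. A sketch with the standard-library `zlib`; the payload is made-up repetitive telemetry:

```python
import zlib

# Made-up, highly repetitive telemetry payload
payload = b"ts=1700000000 level=INFO msg=request-ok path=/api/v1/orders\n" * 2000

compressed = zlib.compress(payload, level=6)
ratio = len(compressed) / len(payload)

# Fewer bytes on the wire: the physical bandwidth limit is unchanged,
# but the demand placed on it shrinks dramatically for data like this.
```

Compression trades CPU cycles for bandwidth, which is exactly the kind of exchange that makes sense when the scarce resource is physical.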
### For architectural limits

1. Eliminate contention
   - Granular locks
   - Lock-free structures
   - Partitioning
2. Parallelize
   - Independent operations in parallel
   - Processing pipelines
   - Async I/O
3. Optimize algorithms
   - Better big-O complexity
   - Appropriate data structures
   - Strategic caching
4. Redesign flows
   - Remove dependencies
   - Batch processing
   - Event-driven design
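Of the flow redesigns above, batching is the easiest to quantify: it amortizes a fixed per-call cost across many items. A toy model (the `FakeDB` class is invented for illustration; the counter stands in for network round trips):

```python
class FakeDB:
    """Toy database that counts round trips so the effect of batching is visible."""

    def __init__(self):
        self.round_trips = 0
        self.rows = []

    def insert_many(self, rows):
        self.round_trips += 1  # each call pays one fixed round-trip cost
        self.rows.extend(rows)

unbatched = FakeDB()
for i in range(1000):
    unbatched.insert_many([i])       # one row per call: 1,000 round trips

batched = FakeDB()
batched.insert_many(range(1000))     # same data, one call: 1 round trip
```

The data stored is identical; only the number of times the fixed cost is paid changes, which is why batching helps even when no single resource is saturated.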
## Real Cases

### Case 1: "The database is slow"

```
Observation: queries take 500 ms
Database CPU: 30%
Disk: 20%

Diagnosis: not a physical limit
Investigation: the query does a full table scan
Solution: add an index
Result: queries at 5 ms (100x improvement)
```

→ The limit was architectural (missing index).
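What that index changed can be mimicked with plain Python structures (the table shape and column names are illustrative, not from the case):

```python
from collections import defaultdict

# A "table" of 100,000 rows, 1,000 distinct users
orders = [{"id": i, "user_id": i % 1000} for i in range(100_000)]

def full_table_scan(user_id):
    # Without an index: examine every row, O(n) per query
    return [o for o in orders if o["user_id"] == user_id]

# Build the "index" once: user_id → matching rows
index = defaultdict(list)
for o in orders:
    index[o["user_id"]].append(o)

def indexed_lookup(user_id):
    # With the index: jump straight to the matching rows, O(1) on average
    return index[user_id]
```

Both functions return the same rows; only the amount of work per query differs, which is why the hardware utilization stayed low while queries were slow.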
### Case 2: "The API doesn't scale"

```
Observation: 1,000 req/s max, 4 CPUs at 100%
Adding more CPUs: doesn't improve
Diagnosis: looks physical, but...
Investigation: one goroutine holds a global lock
Solution: remove the unnecessary lock
Result: 10,000 req/s on the same 4 CPUs
```

→ The limit was architectural (a lock).
### Case 3: "Trans-oceanic latency"

```
Observation: 200 ms latency São Paulo → Europe
Network: 10% utilization
Diagnosis: speed of light
Possible solution: an edge location in Europe
Result: 20 ms latency
```

→ The limit was physical (circumvented with architecture).
## Conclusion

Before optimizing, diagnose:

- Measure the utilization of every resource
- Profile to find the bottleneck
- Test hypotheses before investing

Rules of thumb:

- Resource at 100% + adding more helps = physical
- Resource at 100% + adding more doesn't help = masked architectural limit
- Resource below 100% + slowness = architectural

Remember:

- Physical limits are democratic: they affect everyone equally.
- Architectural limits are tyrannies: created by past decisions.

Don't throw hardware at an architecture problem. Don't redesign when you need more disk.