"The code is optimized, but the system is still slow." Or: "We refactored everything, but the performance is the same." These symptoms indicate confusion between code problems and architecture problems. This article teaches you to distinguish between the two and attack the right problem.
Optimizing code in a broken architecture is like polishing the Titanic while it sinks.
The Fundamental Difference
Code problems
Characteristics:
- Located in a function/class/module
- Solved with targeted refactoring
- Impact limited to the component
- Don't require design change
Examples:
- O(n²) algorithm that should be O(n)
- Unnecessary loop
- N+1 query
- Inefficient serialization
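As a minimal sketch of the first example, assume a hypothetical task: finding duplicate IDs in a list. Both function names are illustrative. The fix stays inside one function, which is what makes it a code problem.

```python
def find_duplicates_quadratic(ids):
    """O(n^2): compares every pair of elements."""
    duplicates = set()
    for i in range(len(ids)):
        for j in range(i + 1, len(ids)):
            if ids[i] == ids[j]:
                duplicates.add(ids[i])
    return duplicates


def find_duplicates_linear(ids):
    """O(n) on average: one pass, remembering seen values in a set."""
    seen = set()
    duplicates = set()
    for item in ids:
        if item in seen:
            duplicates.add(item)
        else:
            seen.add(item)
    return duplicates
```

Callers see the same result from both versions; only the cost changes.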
Architecture problems
Characteristics:
- Distributed throughout the system
- Require design change
- Systemic impact
- Not solved with local optimization
Examples:
- Synchronous communication where it should be asynchronous
- Excessive coupling between services
- Wrong database for the use case
- Missing cache in critical layer
Symptoms of Code Problems
1. Localized hotspot
Trace shows:
┌─ Service A: 50ms
│ └─ Function X: 45ms ← 90% of time here
├─ Service B: 30ms
└─ Service C: 20ms
Diagnosis:
Problem located in Function X
→ Solution: Optimize Function X
2. High CPU in one component
Metrics:
Service A: CPU 95%
Service B: CPU 15%
Service C: CPU 10%
Diagnosis:
Inefficient code in Service A
→ Profiler will identify the function
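One way to let a profiler point at the function, sketched with Python's standard `cProfile` and `pstats` modules. The functions `hot_function`, `cheap_function`, and `handle_request` are hypothetical stand-ins for the code running in Service A.

```python
import cProfile
import io
import pstats


def hot_function(n):
    # Hypothetical hotspot: a tight CPU-bound loop.
    total = 0
    for i in range(n):
        total += i * i
    return total


def cheap_function():
    return sum(range(100))


def handle_request():
    cheap_function()
    return hot_function(200_000)


def profile_top_functions(func, limit=5):
    """Run func under cProfile and return a report sorted by self time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("tottime").print_stats(limit)
    return buffer.getvalue()


report = profile_top_functions(handle_request)
print(report)
```

Sorting by `tottime` ranks functions by the time spent in their own body, so the hotspot should appear near the top of the report.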
3. Specific slow query
Slow query log:
SELECT * FROM orders
WHERE user_id = ?
AND status = 'pending'
Time: 2.5s
EXPLAIN:
Seq Scan on orders (no index)
Diagnosis:
Code problem (missing index)
→ Solution: CREATE INDEX
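The before/after effect of the index can be sketched with an in-memory SQLite database. This is a stand-in: the log above looks like it comes from a different database, and the table contents and the index name `idx_orders_user_status` are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, status TEXT)"
)
conn.executemany(
    "INSERT INTO orders (user_id, status) VALUES (?, ?)",
    [(i % 100, "pending" if i % 3 == 0 else "shipped") for i in range(10_000)],
)

QUERY = "SELECT * FROM orders WHERE user_id = ? AND status = 'pending'"


def query_plan(connection):
    """Return the plan SQLite would use for QUERY as a single string."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + QUERY, (42,)).fetchall()
    # The last column of each row is the human-readable plan step.
    return " | ".join(row[-1] for row in rows)


plan_before = query_plan(conn)
conn.execute("CREATE INDEX idx_orders_user_status ON orders (user_id, status)")
plan_after = query_plan(conn)

print("before:", plan_before)
print("after: ", plan_after)
```

Before the index, the plan reports a full table scan; afterwards it should reference the new index.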
Symptoms of Architecture Problems
1. Uniformly distributed latency
Trace shows:
┌─ Service A: 200ms
│  └─ Call to B: 180ms
├─ Service B: 180ms
│  └─ Call to C: 160ms
└─ Service C: 160ms
   └─ Call to D: 140ms
Diagnosis:
Chain of synchronous calls
→ Solution: Rethink the design (async? aggregation?)
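A rough sketch of what the async option can buy, assuming the downstream calls do not depend on each other's results (if they do, only a redesign of the chain helps). `call_service` is a hypothetical stand-in that simulates network latency with `asyncio.sleep`.

```python
import asyncio
import time

SERVICES = ("B", "C", "D")


async def call_service(name, latency_s=0.05):
    # Stand-in for a network call; the sleep simulates waiting on I/O.
    await asyncio.sleep(latency_s)
    return f"{name}: ok"


async def call_sequentially():
    # Each call waits for the previous one: latencies add up.
    return [await call_service(name) for name in SERVICES]


async def call_concurrently():
    # All calls are in flight at once: total is close to the slowest call.
    return list(await asyncio.gather(*(call_service(n) for n in SERVICES)))


def timed(coroutine_factory):
    start = time.perf_counter()
    result = asyncio.run(coroutine_factory())
    return result, time.perf_counter() - start


seq_result, seq_elapsed = timed(call_sequentially)
con_result, con_elapsed = timed(call_concurrently)
print(f"sequential: {seq_elapsed:.3f}s, concurrent: {con_elapsed:.3f}s")
```

No individual call got faster here. The gain comes entirely from changing how the calls are arranged, which is why this counts as a design change.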
2. All services slow
Metrics under load:
Service A: p95 = 2s (normal: 100ms)
Service B: p95 = 1.8s (normal: 80ms)
Service C: p95 = 1.5s (normal: 50ms)
Diagnosis:
Systemic saturation, not localized
→ Capacity or design problem
3. Bottleneck moves when you optimize
Before:
DB is bottleneck → Add cache
After:
Cache is bottleneck → Increase cluster
After:
Network is bottleneck → ???
Diagnosis:
Architecture doesn't scale
→ Need to redesign, not optimize points
Diagnostic Framework
Step 1: End-to-end trace
Collect complete trace:
Request → Gateway → Service A → DB
                  → Service B → Cache
                  → Service C → External API
Analyze:
- Where is the time?
- Is it localized or distributed?
- Is there a pattern (always the same component)?
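The "localized or distributed" question can be roughed out in code. The sketch below assumes you can export each span's self time (time not spent in child spans) from your tracing tool; the 50% dominance threshold and the span names are arbitrary assumptions, not a standard.

```python
def classify_trace(span_self_times_ms, dominance_threshold=0.5):
    """Classify a trace as localized or distributed.

    span_self_times_ms maps a span name to its self time in milliseconds.
    Returns ("localized", name) if one span holds at least the threshold
    share of the total, otherwise ("distributed", None).
    """
    total = sum(span_self_times_ms.values())
    if total == 0:
        return ("distributed", None)
    name, duration = max(span_self_times_ms.items(), key=lambda item: item[1])
    if duration / total >= dominance_threshold:
        return ("localized", name)
    return ("distributed", None)


# Hypothetical traces for illustration.
hot_trace = {"gateway": 5, "service_a": 180, "service_b": 10, "db": 5}
flat_trace = {"service_a": 40, "service_b": 45, "service_c": 35, "service_d": 40}

print(classify_trace(hot_trace))
print(classify_trace(flat_trace))
```

Running this over many traces, not just one, answers the third question: whether the same component dominates every time.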
Step 2: Profile individual components
For each slow component:
- Run profiler (CPU, memory, I/O)
- Identify top functions
- Check if it's code or waiting
Example:
Service A profile shows:
- 80% of time in http.call() ← Waiting (architecture)
- 15% in json.parse() ← Code
- 5% in business logic ← Code
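A quick way to tell code from waiting, without a full profiler, is to compare wall-clock time with CPU time for the same call. This is a minimal sketch: `waiting_work` and `cpu_work` are hypothetical workloads, and the 50% ratio is an assumed cutoff.

```python
import time


def measure(func):
    """Return (wall_seconds, cpu_seconds) for a single call of func."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


def waiting_work():
    time.sleep(0.1)  # stands in for a blocking network call


def cpu_work():
    sum(i * i for i in range(2_000_000))  # stands in for heavy computation


def classify_call(wall, cpu, waiting_ratio=0.5):
    # If less than half of the wall time was spent on the CPU,
    # the call was mostly waiting on something else.
    return "waiting" if cpu < wall * waiting_ratio else "code"


print(classify_call(*measure(waiting_work)))
print(classify_call(*measure(cpu_work)))
```

`time.process_time()` does not count time spent sleeping or blocked, which is what makes the comparison meaningful.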
Step 3: Isolation test
Test the component in isolation:
- Remove dependencies (mock/stub)
- Apply the same load
- Measure latency
If the isolated component is fast → Architecture problem (the time goes to its dependencies)
If the isolated component is slow → Code problem
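The isolation test is easiest when the dependency is injected. In this sketch, `build_listing` is a hypothetical component, `slow_dependency` simulates a remote call with a sleep, and `stub_dependency` replaces it.

```python
import time


def slow_dependency(product_id):
    time.sleep(0.02)  # hypothetical remote call
    return {"id": product_id, "price": 10.0}


def stub_dependency(product_id):
    # Same shape of answer, no waiting.
    return {"id": product_id, "price": 10.0}


def build_listing(product_ids, fetch_product):
    # Component under test: the dependency is passed in so it can be stubbed.
    products = [fetch_product(pid) for pid in product_ids]
    return sum(product["price"] for product in products)


def latency(func, *args):
    start = time.perf_counter()
    func(*args)
    return time.perf_counter() - start


ids = list(range(5))
with_dependency = latency(build_listing, ids, slow_dependency)
isolated = latency(build_listing, ids, stub_dependency)
print(f"with dependency: {with_dependency:.3f}s, isolated: {isolated:.3f}s")
```

Here the isolated run is fast, so the component's own code is not the problem; the time is spent in what it calls.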
Solutions by Problem Type
For code problems
Inefficient algorithm:
- Refactor to lower complexity
- Use appropriate data structure
Slow query:
- Add index
- Rewrite query
- Denormalize if necessary
Serialization:
- Change format (JSON → Protobuf)
- Reduce payload
Memory:
- Object pooling
- Stream processing
- Lazy loading
For architecture problems
Excessive synchronous calls:
- Introduce messaging (async)
- Aggregate calls (batch)
- Cache results
Coupling:
- Separate domains
- Event-driven architecture
- CQRS for read/write
Inadequate database:
- Polyglot persistence
- Read replicas
- Specialized database (time-series, graph)
Missing cache:
- Distributed cache
- Edge cache (CDN)
- Layered cache
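To make "cache results" concrete, here is a small in-process cache with expiry. It is only a sketch: the class name `TTLCache` and its interface are assumptions, and a distributed cache would replace the dictionary with a shared store while keeping the same read-through idea.

```python
import time


class TTLCache:
    """Read-through cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._entries = {}   # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh hit: the slow loader is skipped
        value = loader(key)  # miss or expired: call the slow path
        self._entries[key] = (now + self._ttl, value)
        return value


# Usage with a hypothetical slow lookup.
loader_calls = []


def load_shipping_zone(postcode):
    loader_calls.append(postcode)
    return f"zone-for-{postcode}"


cache = TTLCache(ttl_seconds=60)
cache.get_or_load("1000", load_shipping_zone)
cache.get_or_load("1000", load_shipping_zone)
print(len(loader_calls))  # the loader ran once for two reads
```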
Practical Examples
Example 1: Looks like code, is architecture
Symptom:
"Listing API slow (2s)"
Initial analysis:
Developer assumes: "Database query slow"
Investigation:
- Query takes 50ms ✓
- Service takes 2s total
- Trace shows: 20 calls to image service
Real diagnosis:
N+1 at service level
For each product, calls image service
Solution (architecture):
- Batch: fetch all images in one call
- Or: embed the image URL in the product record
- Or: CDN with predictable URL
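The batch option can be sketched as follows. `ImageService` is a hypothetical stand-in that only counts round trips, and the `img.example` URLs are placeholders.

```python
class ImageService:
    """Stand-in for the remote image service; counts round trips."""

    def __init__(self):
        self.calls = 0

    def get_image(self, product_id):
        self.calls += 1
        return f"https://img.example/{product_id}.jpg"

    def get_images(self, product_ids):
        self.calls += 1
        return {pid: f"https://img.example/{pid}.jpg" for pid in product_ids}


def listing_n_plus_one(product_ids, images):
    # One remote call per product: 20 products, 20 round trips.
    return [{"id": pid, "image": images.get_image(pid)} for pid in product_ids]


def listing_batched(product_ids, images):
    # One remote call for the whole page.
    urls = images.get_images(product_ids)
    return [{"id": pid, "image": urls[pid]} for pid in product_ids]


product_ids = list(range(20))
per_item_service = ImageService()
batched_service = ImageService()
per_item_listing = listing_n_plus_one(product_ids, per_item_service)
batched_listing = listing_batched(product_ids, batched_service)
print(per_item_service.calls, batched_service.calls)
```

The batched version requires the image service to expose a bulk endpoint, which is a contract change between services rather than a local refactor.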
Example 2: Looks like architecture, is code
Symptom:
"Entire system slow under load"
Initial analysis:
Developer assumes: "We need more servers"
Investigation:
- Scaling out doesn't help
- All pods show high CPU
- Profile shows: regex evaluated in a loop
Real diagnosis:
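Catastrophic backtracking in the email validation regex
Called thousands of times per request
Solution (code):
- Replace the regex with simple validation
- Or: rewrite the pattern so it cannot backtrack, compiled once and reused (compiling once alone does not fix backtracking)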
- Result: 10x more capacity with same hardware
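A minimal sketch of the "simple validation" option. The rules below (one `@`, a restricted local part, a dot inside the domain) are an assumption about what "plausible" means and are not a full RFC-compliant check. The one remaining regex is a single character class compiled once at module level, so it has no nested quantifiers to backtrack over.

```python
import re

# Compiled once at import time and reused on every call.
LOCAL_PART = re.compile(r"[A-Za-z0-9._%+-]+")


def is_plausible_email(value):
    """Cheap check that runs in time linear in the input length."""
    local, separator, domain = value.partition("@")
    if not separator or "@" in domain:
        return False  # zero or more than one "@"
    if LOCAL_PART.fullmatch(local) is None:
        return False  # empty local part or unexpected characters
    if any(ch.isspace() for ch in domain):
        return False
    name, dot, tld = domain.rpartition(".")
    return bool(dot) and bool(name) and bool(tld)


print(is_plausible_email("user@example.com"))
print(is_plausible_email("user@@example.com"))
```

In Python specifically, the `re` module already caches recently compiled patterns, so precompiling is mostly a readability and predictability gain; in other runtimes the saving can be larger.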
Example 3: Both problems
Symptom:
"Checkout slow and unstable"
Investigation:
Problem 1 (code):
- Shipping calculation runs 3x (duplicated)
- Inefficient JSON parsing
Problem 2 (architecture):
- 8 synchronous calls to complete one checkout
- No fallback when an external service is slow
- No cache for rarely changed data
Solution:
1. Code fixes (quick):
- Remove duplication
- Optimize parsing
2. Architecture refactor (planned):
- Aggregate calls
- Add circuit breaker
- Implement cache
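For the circuit breaker step, a deliberately small sketch follows. The class, its parameters, and the default thresholds are assumptions for illustration; production libraries add concurrency control, metrics, and a more careful half-open state.

```python
import time


class CircuitBreaker:
    """Stops calling a failing dependency and serves a fallback instead."""

    def __init__(self, max_failures=3, reset_after_s=30.0, clock=time.monotonic):
        self._max_failures = max_failures
        self._reset_after_s = reset_after_s
        self._clock = clock  # injectable so the timeout can be tested
        self._failures = 0
        self._opened_at = None

    def call(self, func, fallback):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_after_s:
                return fallback()   # open: fail fast, skip the dependency
            self._opened_at = None  # half-open: allow one trial call
        try:
            result = func()
        except Exception:
            self._failures += 1
            if self._failures >= self._max_failures:
                self._opened_at = self._clock()
            return fallback()
        self._failures = 0  # success closes the circuit again
        return result
```

A checkout would wrap each external call, for example `breaker.call(fetch_shipping_quote, use_default_quote)`, where both function names are hypothetical.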
Decision Tree
The system is slow
└─ Trace shows bottleneck in a single component?
   ├─ Yes → Profile the component → High CPU?
   │        ├─ Yes → Code (local)
   │        └─ No  → Architecture (I/O bound)
   └─ No → Is the latency in waiting (I/O) or in CPU?
            ├─ Waiting → Architecture (communication)
            └─ CPU     → Code (distributed)
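The decision tree above maps directly onto a small function. This is just a sketch of the same logic; the function name and the two boolean inputs are assumptions about how you would summarize your trace and profile findings.

```python
def diagnose(single_component_bottleneck, cpu_bound):
    """Return the likely problem type for a slow system.

    single_component_bottleneck: the trace shows one component dominating.
    cpu_bound: the time is spent on CPU rather than waiting on I/O.
    """
    if single_component_bottleneck:
        return "code (local)" if cpu_bound else "architecture (I/O bound)"
    return "code (distributed)" if cpu_bound else "architecture (communication)"


for single in (True, False):
    for cpu in (True, False):
        print(single, cpu, "->", diagnose(single, cpu))
```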
Conclusion
To distinguish between code and architecture problems:
- Trace first - understand the complete flow
- Profile the hotspots - CPU or waiting?
- Test in isolation - is the component fast on its own?
- Apply the correct solution - don't use a hammer on a screw
The golden rule:
- Localized problem → Code optimization
- Distributed problem → Architecture review
- Both → Code fixes first (they are faster to ship), architecture later
Having the most efficient code in the world doesn't matter if the architecture makes it wait.
This article is part of the series on the OCTOPUS Performance Engineering methodology.