Methodology · 9 min read

Performance in Containers: optimizing containerized applications

Containers add abstraction layers that impact performance. Learn how to configure and optimize applications in Docker and Kubernetes.

Containers revolutionized how we deploy applications, but the abstraction has a cost: a misconfigured container can turn a fast application into a bottleneck. This article explores how to optimize performance in containerized environments.

A container is not a virtual machine. Optimizing as if it were is the first mistake.

The Impact of Containers on Performance

Real overhead

Native application:     100 RPS baseline
Docker application:     95-98 RPS (2-5% overhead)
K8s application:        90-95 RPS (5-10% overhead)

The overhead is small, but configuration mistakes amplify it drastically:

Misconfigured container: 50-70 RPS
→ The problem isn't the container, it's the configuration
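
A quick way to measure your own gap is to run the same load test against the native process and the containerized deployment and compare RPS and latency. A sketch with wrk (flags and URLs are illustrative):

# Baseline: application running natively on the host
wrk -t4 -c64 -d30s http://localhost:8080/

# Same test against the containerized service
wrk -t4 -c64 -d30s http://<service-ip>:8080/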

Sources of overhead

  1. Network: overlay networks, NAT, iptables
  2. Storage: copy-on-write, volumes
  3. CPU: cgroups, scheduling
  4. Memory: limits, OOM killer
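
Each of these can be checked from inside a running pod. Quick inspections (paths assume cgroup v2; on cgroup v1 nodes the files live under /sys/fs/cgroup/cpu/ and /sys/fs/cgroup/memory/):

# CPU quota and period applied by cgroups
cat /sys/fs/cgroup/cpu.max

# Effective memory limit
cat /sys/fs/cgroup/memory.max

# Network path and copy-on-write filesystem in use
ip route
mount | grep overlay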

Configuring Resources Correctly

CPU: requests vs limits

resources:
  requests:
    cpu: "500m"      # Guaranteed by scheduler
  limits:
    cpu: "1000m"     # Maximum allowed

Common mistakes:

# ❌ Limit too low - constant throttling
limits:
  cpu: "100m"

# ❌ No limit - can monopolize node
limits:
  cpu: null

# ❌ Request = limit far above real usage - reserved capacity sits idle
requests:
  cpu: "2000m"
limits:
  cpu: "2000m"

Recommended configuration:

# ✅ Request based on average usage, limit for peaks
resources:
  requests:
    cpu: "250m"      # Observed average usage
  limits:
    cpu: "1000m"     # Headroom for peaks

CPU Throttling

When a container hits its CPU limit, the kernel's CFS scheduler throttles it:

No throttling: p99 latency = 50ms
With throttling: p99 latency = 500ms (10x worse)

How to detect:

# Container metrics
cat /sys/fs/cgroup/cpu/cpu.stat   # cgroup v1 (cgroup v2: /sys/fs/cgroup/cpu.stat)
# nr_throttled: number of times throttled
# throttled_time: total throttled time in nanoseconds (v2: throttled_usec)

Prometheus query:

rate(container_cpu_cfs_throttled_seconds_total[5m])
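
To catch throttling before users feel it, the fraction of throttled CFS periods is a more direct signal. A sketch of an alerting rule using cAdvisor metric names (the 25% threshold and 10m window are assumptions to tune):

# Alert when more than 25% of CFS periods were throttled
groups:
  - name: cpu-throttling
    rules:
      - alert: HighCPUThrottling
        expr: |
          sum(increase(container_cpu_cfs_throttled_periods_total[5m])) by (pod)
            / sum(increase(container_cpu_cfs_periods_total[5m])) by (pod)
            > 0.25
        for: 10m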

Memory: the delicate balance

resources:
  requests:
    memory: "512Mi"
  limits:
    memory: "1Gi"

The OOM Killer problem:

Memory used > Limit → OOM Kill → Pod restarts
→ Lost connections, in-flight requests fail
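
Before tuning limits, confirm the kernel is actually killing the container (pod name below is illustrative):

# Last State shows Reason: OOMKilled if the limit was hit
kubectl describe pod my-app-7d4b9cfd6-x2kqp | grep -A 3 'Last State'

# Or query the termination reason directly
kubectl get pod my-app-7d4b9cfd6-x2kqp \
  -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'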

Safe configuration:

# ✅ Leave headroom above normal usage (here +50%)
resources:
  requests:
    memory: "512Mi"   # Average usage
  limits:
    memory: "768Mi"   # +50% headroom

JVM in Containers

Older JVMs don't respect container limits:

# Container with 1GB limit
# Old JVM sees 64GB from host machine
# Allocates 16GB heap → Instant OOM Kill

Solution:

# Use JVM 11+ which respects cgroups
FROM eclipse-temurin:17-jre

# Or configure explicitly
ENV JAVA_OPTS="-XX:MaxRAMPercentage=75.0"

Recommended JVM configuration:

env:
  - name: JAVA_OPTS
    value: >-
      -XX:MaxRAMPercentage=75.0
      -XX:InitialRAMPercentage=50.0
      -XX:+UseG1GC
      -XX:+UseContainerSupport
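
To confirm the flags take effect, ask the JVM what it detects inside the container (JDK 11+; exact output varies by version):

# Show the container limits the JVM detected
java -Xlog:os+container=info -version

# Show the heap size computed from MaxRAMPercentage
java -XX:MaxRAMPercentage=75.0 -XX:+PrintFlagsFinal -version | grep -i maxheapsize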

Network Optimization

Service mesh overhead

Without service mesh:  latency = 5ms
With Istio sidecar:    latency = 8-12ms (+60-140%)

When it's worth it:

  • Distributed observability
  • Mandatory mTLS
  • Complex traffic management

When to avoid:

  • Ultra-low latency is critical
  • High volume of internal requests
  • Simplicity is a priority

DNS lookup

Each request can trigger a DNS lookup:

Request → DNS lookup (2-5ms) → Connection → Response

Optimization:

# Configure dnsPolicy
spec:
  dnsPolicy: ClusterFirst
  dnsConfig:
    options:
      - name: ndots
        value: "2"    # Reduces unnecessary lookups

Connection pooling

# ❌ New connection per request
# TCP handshake + TLS = 50-100ms per request

# ✅ Connection pool
# Reuses established connections

Pool configuration:

// Node.js with a connection pool (node-postgres shown)
const { Pool } = require('pg');

const pool = new Pool({
  max: 20,
  idleTimeoutMillis: 30000,
  connectionTimeoutMillis: 2000
});

Storage Optimization

Storage types and performance

emptyDir (memory): ~500MB/s
emptyDir (disk):   ~100MB/s
hostPath:          ~100MB/s
PersistentVolume:  ~50-100MB/s (depends on provider)
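
When an application needs fast scratch space (temp files, local caches), a RAM-backed emptyDir is the quickest option. A sketch (volume and mount names are illustrative); note that it counts against the container's memory limit:

volumes:
  - name: scratch
    emptyDir:
      medium: Memory
      sizeLimit: 256Mi
containers:
  - name: app
    volumeMounts:
      - name: scratch
        mountPath: /tmp/scratch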

Copy-on-Write overhead

Image layers use CoW:

# ❌ Many layers = lots of CoW
RUN apt-get update
RUN apt-get install -y package1
RUN apt-get install -y package2

# ✅ Fewer layers
RUN apt-get update && \
    apt-get install -y package1 package2 && \
    rm -rf /var/lib/apt/lists/*

Logs and performance

# ❌ Unbounded logs to stdout
# Disk I/O and disk usage grow without limit

# ✅ Rotate and cap container logs
# Log rotation is configured in the container runtime
# (Docker daemon or kubelet), not in the Pod spec

Docker daemon config:

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}
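
On clusters where the runtime is containerd or CRI-O (no Docker daemon), the equivalent knobs live in the kubelet configuration. A sketch with KubeletConfiguration fields:

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
containerLogMaxSize: "10Mi"
containerLogMaxFiles: 3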

Kubernetes-Specific Optimizations

Pod Disruption Budget

A PDB keeps a minimum number of replicas running during voluntary disruptions (node drains, upgrades), so capacity doesn't collapse mid-maintenance:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: app-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: my-app

Topology Spread

Spreading replicas across zones keeps load (and failures) from concentrating on a single zone:

spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: my-app

Readiness vs Liveness

# Liveness: is the app alive?
livenessProbe:
  httpGet:
    path: /health/live
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10

# Readiness: can the app receive traffic?
readinessProbe:
  httpGet:
    path: /health/ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5

Common mistake:

# ❌ Readiness probe too heavy
readinessProbe:
  httpGet:
    path: /health  # Checks DB, cache, external APIs
  periodSeconds: 1  # Every second!
# = DDoS yourself

# ✅ Light readiness with adequate frequency
readinessProbe:
  httpGet:
    path: /health/ready  # Basic check
  periodSeconds: 5

Graceful Shutdown

spec:
  terminationGracePeriodSeconds: 30
  containers:
  - name: app
    lifecycle:
      preStop:
        exec:
          command: ["/bin/sh", "-c", "sleep 5"]
The preStop sleep gives kube-proxy and load balancers time to remove the Pod from Service endpoints before SIGTERM arrives. The application itself must still handle SIGTERM:

// Stop accepting new traffic, then release resources
process.on('SIGTERM', async () => {
  console.log('Received SIGTERM, shutting down gracefully');
  await new Promise((resolve) => server.close(resolve)); // http.Server#close takes a callback
  await db.close();
  process.exit(0);
});

Optimized Docker Image

Multi-stage build

# Build stage
FROM node:18 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production

# Runtime stage
FROM node:18-slim
WORKDIR /app
COPY --from=builder /app/node_modules ./node_modules
COPY . .
USER node
CMD ["node", "server.js"]

Base image

# ❌ Heavy image
FROM ubuntu:22.04  # ~77MB

# ✅ Optimized image
FROM alpine:3.18   # ~7MB

# ✅ Distroless (no package manager, minimal attack surface)
FROM gcr.io/distroless/nodejs:18  # ~40MB, no shell

Startup time

Large image (500MB):      pull = 30-60s
Optimized image (50MB):   pull = 3-6s
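
To find out what is actually inflating an image, inspect it layer by layer (image name is illustrative):

# Size contributed by each layer
docker history my-app:latest

# Final image size
docker image ls my-app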

Performance Monitoring

Essential metrics

# Container-level
- container_cpu_usage_seconds_total
- container_memory_usage_bytes
- container_network_receive_bytes_total
- container_fs_reads_bytes_total

# Application-level
- http_request_duration_seconds
- http_requests_total
- process_resident_memory_bytes

Minimum dashboard

1. CPU Usage vs Request vs Limit (example queries below)
2. Memory Usage vs Request vs Limit
3. CPU Throttling
4. Pod Restarts
5. Network I/O
6. Disk I/O
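
Example queries for the first two panels, assuming cAdvisor metrics plus kube-state-metrics for the limit series:

# Panel 1: CPU usage as a fraction of the limit, per pod
sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (pod)
  / sum(kube_pod_container_resource_limits{resource="cpu"}) by (pod)

# Panel 2: memory working set as a fraction of the limit, per pod
sum(container_memory_working_set_bytes{container!=""}) by (pod)
  / sum(kube_pod_container_resource_limits{resource="memory"}) by (pod)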

Conclusion

Performance in containers depends on:

  1. Correct resources: requests and limits based on real data
  2. Avoid throttling: CPU throttling destroys latency
  3. Memory with headroom: OOM kills cause instability
  4. Optimized network: DNS, connection pools, conscious service mesh
  5. Lean images: fewer layers, smaller size, fast startup

Before blaming the container:

  1. Check for CPU throttling
  2. Confirm there are no OOM kills
  3. Analyze network metrics
  4. Compare with baseline outside the container

A container is a tool, not a villain. Poor container performance is usually poor performance amplified.

Tags: containers, docker, kubernetes, optimization

Want to understand your platform's limits?

Contact us for a performance assessment.
