
Cloud elasticity: scaling on demand

Elasticity allows systems to grow and shrink automatically. Understand how it works, its benefits, and the necessary precautions.

One of the biggest promises of the cloud is elasticity: the ability to automatically increase or decrease resources based on demand. In theory, you pay only for what you use and never run out of capacity.

In practice, effective elasticity requires much more than enabling autoscaling. This article explores what elasticity is, how it works, and the precautions needed to truly leverage it.

Elasticity isn't magic. It's well-applied engineering.

What is Elasticity

Elasticity is a system's ability to automatically adapt its resources in response to changes in demand.

Unlike scalability (which is the ability to grow), elasticity also includes the ability to shrink — releasing resources when they're no longer needed.

Elasticity vs Scalability

Concept     | Definition
------------|----------------------------------------
Scalability | Ability to grow to meet more demand
Elasticity  | Ability to grow AND shrink dynamically

A system can be scalable without being elastic (grows but doesn't shrink automatically). Elasticity implies scalability, but the reverse isn't true.

How Elasticity Works

Basic components

  1. Monitoring metrics — CPU, memory, requests, latency
  2. Scaling rules — conditions that trigger increase or reduction
  3. Orchestrator — component that adds/removes instances
  4. Load balancer — distributes traffic among instances

Typical flow

Demand increases
    ↓
Metric exceeds threshold (e.g., CPU > 70%)
    ↓
Orchestrator starts new instances
    ↓
Load balancer includes new instances
    ↓
Load is redistributed
    ↓
Metrics normalize

The reverse process happens when demand drops.
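The flow above can be sketched as a simple control loop. This is an illustrative sketch, not any provider's API; the thresholds, step size, and instance limits are assumed values.

```python
# Minimal sketch of a reactive scaling decision (illustrative thresholds).
def desired_instances(current: int, cpu_pct: float,
                      scale_up_at: float = 70.0,
                      scale_down_at: float = 30.0,
                      min_instances: int = 2,
                      max_instances: int = 20) -> int:
    """Return the instance count the orchestrator should converge to."""
    if cpu_pct > scale_up_at:
        target = current + 1    # metric exceeded threshold: add capacity
    elif cpu_pct < scale_down_at:
        target = current - 1    # demand dropped: release capacity
    else:
        target = current        # within the band: do nothing
    return max(min_instances, min(max_instances, target))

print(desired_instances(5, 85.0))   # high CPU → 6
print(desired_instances(5, 20.0))   # low CPU → 4
print(desired_instances(2, 10.0))   # floor at min_instances → 2
```

Real orchestrators evaluate this decision on a fixed interval (e.g., every minute) and clamp to the configured minimum and maximum, exactly as the last line does.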

Types of Elasticity

Reactive elasticity

Responds to changes after they happen. This is the most common model.

Advantages:

  • Simple to implement
  • Based on real metrics

Disadvantages:

  • Delay between demand and response
  • May not be fast enough for sudden spikes

Predictive elasticity

Uses historical data and machine learning to anticipate demand changes.

Advantages:

  • Prepares resources before the peak
  • Better user experience

Disadvantages:

  • More complex to implement
  • Depends on predictable patterns

Scheduled elasticity

Scales based on known schedules (e.g., more resources during business hours).

Advantages:

  • Predictable and controllable
  • Doesn't depend on real-time metrics

Disadvantages:

  • Doesn't respond to unexpected variations
  • May waste resources
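A scheduled policy is essentially a lookup from time of day to capacity. The sketch below assumes an illustrative schedule (the hours and counts are made up, and a real system would use the provider's scheduled-scaling feature):

```python
# Sketch of scheduled elasticity: desired capacity by hour of day.
# The schedule and instance counts are illustrative assumptions.
SCHEDULE = [
    (range(8, 19), 10),   # business hours: 08:00-18:59
    (range(19, 23), 5),   # evening wind-down
]
DEFAULT = 2               # night / early-morning baseline

def scheduled_capacity(hour: int) -> int:
    for hours, count in SCHEDULE:
        if hour in hours:
            return count
    return DEFAULT

print(scheduled_capacity(10))  # business hours → 10
print(scheduled_capacity(3))   # night → 2
```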

Elasticity Challenges

1. Startup time (cold start)

New instances need time to start. If this time is long, elasticity loses effectiveness.

Mitigations:

  • Optimize application boot time
  • Maintain minimum pool of warm instances
  • Use lightweight containers

2. Application state

Stateful applications are difficult to scale elastically. Sessions, local cache, and in-memory state complicate adding/removing instances.

Mitigations:

  • Externalize state (Redis, database)
  • Stateless design
  • Sticky sessions (with caution)
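Externalizing state is what makes instances interchangeable: if sessions live outside the process, any instance can serve any request. In this sketch a plain dict stands in for an external store like Redis; a real deployment would swap it for a client with the same get/set pattern.

```python
import json

# Stand-in for an external store (e.g., Redis) shared by all instances.
session_store = {}

def handle_request(session_id: str, instance: str) -> dict:
    """Stateless handler: reads and writes session state externally."""
    raw = session_store.get(session_id)
    session = json.loads(raw) if raw else {"visits": 0}
    session["visits"] += 1
    session["last_instance"] = instance   # any instance may handle the session
    session_store[session_id] = json.dumps(session)
    return session

# Requests for the same session can land on different instances:
handle_request("abc", "instance-1")
print(handle_request("abc", "instance-2"))
# → {'visits': 2, 'last_instance': 'instance-2'}
```

Because no instance keeps the session in memory, the orchestrator can add or remove instances at any time without losing user state.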

3. Persistent connections

Databases, queues, and external services have connection limits. Scaling instances can exhaust these limits.

Mitigations:

  • Connection pooling
  • Per-instance limits
  • Managed services with their own scaling
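The connection budget is simple arithmetic worth doing explicitly: per-instance pool size times the maximum instance count must stay below the database's limit. The numbers below are illustrative (100 is a common PostgreSQL default for max_connections):

```python
# Connection-budget arithmetic: the per-instance pool must leave headroom
# so that scaling to max_instances never exhausts the database's limit.
def pool_size_per_instance(db_max_connections: int,
                           max_instances: int,
                           reserved_for_admin: int = 10) -> int:
    usable = db_max_connections - reserved_for_admin
    return usable // max_instances

# 100-connection limit, scaling up to 20 instances:
print(pool_size_per_instance(100, 20))  # → 4 connections per instance
```

Note how small the per-instance pool becomes at scale; this is why managed pooling layers (e.g., a proxy in front of the database) are common in elastic setups.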

4. Unexpected costs

Poorly configured elasticity can generate very high costs — especially in scaling loop scenarios or attacks.

Mitigations:

  • Maximum instance limits
  • Cost alerts
  • Rate limiting in the application

5. Thrashing (oscillation)

The system alternates rapidly between scaling up and scaling down, causing instability.

Mitigations:

  • Cooldown periods between scaling actions
  • Thresholds with hysteresis (different values for up and down)
  • Scale in larger steps
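The first two mitigations can be combined in one decision function: a gap between the up and down thresholds (hysteresis) plus a cooldown after each action. The thresholds and cooldown below are assumed values for illustration.

```python
# Sketch of thrashing protection: hysteresis (separate up/down thresholds)
# plus a cooldown between scaling actions. Values are illustrative.
class Autoscaler:
    def __init__(self, up_at=70.0, down_at=30.0, cooldown_s=300):
        self.up_at, self.down_at = up_at, down_at
        self.cooldown_s = cooldown_s
        self.last_action = float("-inf")

    def decide(self, cpu_pct: float, now: float) -> int:
        """Return +1 (scale up), -1 (scale down), or 0 (hold)."""
        if now - self.last_action < self.cooldown_s:
            return 0                   # still cooling down: hold
        if cpu_pct > self.up_at:
            self.last_action = now
            return +1
        if cpu_pct < self.down_at:
            self.last_action = now
            return -1
        return 0                       # 30-70% band: the hysteresis gap

scaler = Autoscaler()
print(scaler.decide(85.0, now=0))    # +1: scale up
print(scaler.decide(10.0, now=60))   # 0: within cooldown, hold
print(scaler.decide(10.0, now=400))  # -1: cooldown elapsed, scale down
```

Without the gap and the cooldown, a metric hovering around a single threshold would trigger an action on almost every evaluation cycle.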

Metrics for Elasticity

Input metrics (when to scale up)

  • CPU utilization — simple but can be misleading
  • Request rate — more direct for web applications
  • Queue depth — excellent for workers
  • Response time — scales based on user experience
  • Custom metrics — business-specific

Output metrics (when to scale down)

Generally the same metrics, but with more conservative thresholds to avoid thrashing.

Elasticity in Practice

Example: E-commerce on Black Friday

Normal days: 10 instances
Pre-Black Friday: scheduled scaling to 50 instances
During: reactive elasticity allows reaching 200 instances
Post-peak: gradually returns to 10 instances

Example: B2B SaaS

Night/early morning: 2 instances (minimum)
Business hours: 5-10 instances (reactive elasticity)
End of month (closing): peaks of 15-20 instances

Best Practices

  1. Set maximum limits — protect yourself from uncontrolled costs
  2. Test elasticity — simulate peaks and validate behavior
  3. Monitor scaling time — know how long it takes to react
  4. Use multiple metrics — CPU alone rarely tells the whole story
  5. Plan the minimum — how many instances do you need even without load?
  6. Consider reservations — reserved instances for baseline, on-demand for peaks
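Practice 6 is easy to sanity-check with back-of-envelope math: pay the (cheaper) reserved rate for the baseline that runs all month, and the on-demand rate only for peak hours. The prices and counts below are made-up assumptions, not real quotes.

```python
# Back-of-envelope cost for "reserved baseline + on-demand peaks".
# Hourly prices and instance counts are illustrative assumptions.
def monthly_cost(baseline: int, peak_extra: int, peak_hours: float,
                 reserved_hourly=0.06, on_demand_hourly=0.10,
                 hours_in_month=730):
    reserved = baseline * reserved_hourly * hours_in_month   # runs all month
    burst = peak_extra * on_demand_hourly * peak_hours       # only at peaks
    return reserved + burst

# 10 reserved instances all month, plus 40 on-demand for 50 peak hours:
print(round(monthly_cost(10, 40, 50), 2))  # → 638.0
```

Running the same arithmetic with everything on-demand shows why the split matters for workloads with a stable floor and occasional peaks.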

Conclusion

Elasticity is a powerful cloud capability, but it's neither automatic nor free. It requires:

  • Prepared applications (stateless, fast boot)
  • Careful configuration of thresholds and limits
  • Continuous monitoring of behavior
  • Regular testing of scaling scenarios

When well implemented, elasticity transforms fixed costs into variable ones and ensures your system responds to real demand — without waste and without surprises.

The cloud promises elasticity. It's up to you to ensure your system is ready to take advantage of it.


Want to understand your platform's limits?

Contact us for a performance assessment.
