Auto-Scaling
Automatically scale agents based on traffic and performance metrics.
Overview
Auto-scaling automatically adjusts the number of agent instances based on demand. Scale up during traffic spikes and down during quiet periods to optimize cost and performance.
Hyperfold uses predictive scaling combined with reactive metrics to minimize latency spikes during traffic increases.
Configuration
# Configure auto-scaling for an agent
$ hyperfold agent scale sales-negotiator \
--min=2 \
--max=50 \
--target-concurrency=100
Scaling configuration updated:
Agent: sales-negotiator
Min Instances: 2
Max Instances: 50
Target Concurrency: 100 requests/instance
# View current scaling status
$ hyperfold agent scale sales-negotiator --status
SCALING STATUS: sales-negotiator
CONFIGURATION
Min Instances: 2
Max Instances: 50
Target Concurrency: 100
CURRENT STATE
Active Instances: 8
Current Load: 720 req/min
Avg Concurrency: 90 req/instance
CPU Utilization: 65%
Memory Usage: 412MB/512MB
Advanced Configuration
# Advanced scaling configuration
$ hyperfold agent scale sales-negotiator \
--min=2 \
--max=100 \
--target-concurrency=80 \
--scale-up-threshold=0.7 \
--scale-down-threshold=0.3 \
--cooldown=60
# Scale based on multiple metrics
$ hyperfold agent scale sales-negotiator \
--metrics='[
{"metric": "concurrency", "target": 80},
{"metric": "cpu", "target": 70},
{"metric": "latency_p95", "max": 500}
]'
# Scheduled scaling for predictable traffic
$ hyperfold agent scale sales-negotiator \
--schedule='[
{"cron": "0 9 * * 1-5", "min": 10, "max": 100},
{"cron": "0 18 * * 1-5", "min": 5, "max": 50},
{"cron": "0 0 * * 6-7", "min": 2, "max": 20}
]'
Scheduled scaling configured:
Weekdays 9 AM: 10-100 instances (business hours)
Weekdays 6 PM: 5-50 instances (evening)
Weekends: 2-20 instances (low traffic)
Configuration Options
| Option | Description |
|---|---|
--min | Minimum instances always running |
--max | Maximum instances allowed |
--target-concurrency | Target requests per instance |
--cooldown | Seconds between scaling decisions |
--schedule | Time-based scaling overrides |
Scaling Strategies
# TARGET CONCURRENCY (recommended) - Scales based on requests per instance.
# Best for: API-heavy workloads, negotiation agents
$ hyperfold agent scale myagent \
--strategy=target_concurrency \
--target-concurrency=100
# CPU UTILIZATION - Scales based on CPU usage percentage.
# Best for: Compute-intensive workloads
$ hyperfold agent scale myagent \
--strategy=cpu \
--target-cpu=70
# QUEUE DEPTH - Scales based on pending requests.
# Best for: Batch processing, async workflows
$ hyperfold agent scale myagent \
--strategy=queue_depth \
--target-queue=10
# HYBRID - Combines multiple metrics with weights.
# Best for: Complex workloads
$ hyperfold agent scale myagent \
--strategy=hybrid \
--metrics='[
{"metric": "concurrency", "target": 80, "weight": 0.5},
{"metric": "cpu", "target": 70, "weight": 0.3},
{"metric": "memory", "target": 80, "weight": 0.2}
]'
Monitoring
# Monitor scaling activity
$ hyperfold agent scale-history sales-negotiator --since=24h
# View real-time scaling metrics
$ hyperfold metrics agents sales-negotiator --live
Optimization
Analyze scaling patterns to optimize configuration:
# Optimize scaling configuration
$ hyperfold agent scale-analyze sales-negotiator --since=7d
SCALING ANALYSIS: sales-negotiator (7 days)
CURRENT CONFIGURATION
Min: 2, Max: 50, Target Concurrency: 100
OBSERVATIONS
⚠ Over-provisioned during off-hours
⚠ Scale-up lag during morning spike
✓ Target concurrency well-tuned
⚠ Frequent scale oscillation 2-4 PM
RECOMMENDED CONFIGURATION
--min=1
--max=50
--target-concurrency=100
--cooldown=90
--schedule='[
{"cron": "0 8 * * 1-5", "min": 4},
{"cron": "0 20 * * 1-5", "min": 2},
{"cron": "0 2 * * *", "min": 1}
]'
ESTIMATED SAVINGS
Current monthly cost: $2,450
Optimized monthly cost: $1,890
Savings: $560 (23%)
Apply recommendations? [Y/n]
Track scaling costs with Billing & Cost Analysis.