Auto-Scaling
Automatically scale agents based on traffic and performance metrics.
Overview
Auto-scaling automatically adjusts the number of agent instances based on demand. Scale up during traffic spikes and down during quiet periods to optimize cost and performance.
Hyperfold uses predictive scaling combined with reactive metrics to minimize latency spikes during traffic increases.
Configuration
bash
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
# Configure auto-scaling for an agent$ hyperfold agent scale sales-negotiator \ --min=2 \ --max=50 \ --target-concurrency=100 Scaling configuration updated: Agent: sales-negotiator Min Instances: 2 Max Instances: 50 Target Concurrency: 100 requests/instance # View current scaling status$ hyperfold agent scale sales-negotiator --status SCALING STATUS: sales-negotiator━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ CONFIGURATION Min Instances: 2 Max Instances: 50 Target Concurrency: 100 CURRENT STATE Active Instances: 8 Current Load: 720 req/min Avg Concurrency: 90 req/instance CPU Utilization: 65% Memory Usage: 412MB/512MB SCALING ACTIVITY (last hour) 10:15 Scale up 6 → 8 instances (traffic spike) 09:45 Scale up 4 → 6 instances (approaching target) 09:00 Stable 4 instances 08:30 Scale down 6 → 4 instances (traffic decrease)Advanced Configuration
bash
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
# Advanced scaling configuration$ hyperfold agent scale sales-negotiator \ --min=2 \ --max=100 \ --target-concurrency=80 \ --scale-up-threshold=0.7 \ --scale-down-threshold=0.3 \ --cooldown=60 # Scale based on multiple metrics$ hyperfold agent scale sales-negotiator \ --metrics='[ {"metric": "concurrency", "target": 80}, {"metric": "cpu", "target": 70}, {"metric": "latency_p95", "max": 500} ]' # Scheduled scaling for predictable traffic$ hyperfold agent scale sales-negotiator \ --schedule='[ {"cron": "0 9 * * 1-5", "min": 10, "max": 100}, {"cron": "0 18 * * 1-5", "min": 5, "max": 50}, {"cron": "0 0 * * 6-7", "min": 2, "max": 20} ]' Scheduled scaling configured: Weekdays 9 AM: 10-100 instances (business hours) Weekdays 6 PM: 5-50 instances (evening) Weekends: 2-20 instances (low traffic) # Configure scaling per agent type$ hyperfold agent scale fulfillment-agent \ --min=1 \ --max=20 \ --target-concurrency=50 \ --scale-up-rate=2 \ --scale-down-rate=1Configuration Options
| Option | Description |
|---|---|
--min | Minimum instances always running |
--max | Maximum instances allowed |
--target-concurrency | Target requests per instance |
--cooldown | Seconds between scaling decisions |
--schedule | Time-based scaling overrides |
Scaling Strategies
bash
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
# Scaling strategies comparison TARGET CONCURRENCY (recommended)━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━Scales based on requests per instance.Best for: API-heavy workloads, negotiation agents $ hyperfold agent scale myagent \ --strategy=target_concurrency \ --target-concurrency=100 How it works: desired = ceil(current_requests / target_concurrency) If 800 requests and target=100: need 8 instances CPU UTILIZATION━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━Scales based on CPU usage percentage.Best for: Compute-intensive workloads $ hyperfold agent scale myagent \ --strategy=cpu \ --target-cpu=70 How it works: Scale up when avg CPU > 70% Scale down when avg CPU < 40% QUEUE DEPTH━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━Scales based on pending requests.Best for: Batch processing, async workflows $ hyperfold agent scale myagent \ --strategy=queue_depth \ --target-queue=10 How it works: desired = ceil(queue_size / target_queue) Scale up quickly when queue grows HYBRID━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━Combines multiple metrics with weights.Best for: Complex workloads $ hyperfold agent scale myagent \ --strategy=hybrid \ --metrics='[ {"metric": "concurrency", "target": 80, "weight": 0.5}, {"metric": "cpu", "target": 70, "weight": 0.3}, {"metric": "memory", "target": 80, "weight": 0.2} ]'Monitoring
bash
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
# Monitor scaling activity$ hyperfold agent scale-history sales-negotiator --since=24h SCALING HISTORY: sales-negotiator (24 hours)━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ TIME ACTION FROM TO REASONJan 20 10:15 Scale Up 6 8 Concurrency at 95%Jan 20 09:45 Scale Up 4 6 Concurrency at 92%Jan 20 08:30 Scale Down 6 4 Concurrency at 25%Jan 20 02:00 Scale Down 8 6 Scheduled (night)Jan 19 22:00 Scale Down 10 8 Traffic decreaseJan 19 18:30 Scale Up 8 10 Traffic spikeJan 19 14:00 Scale Up 6 8 Concurrency at 88% SUMMARY Scale Up Events: 4 Scale Down Events: 3 Avg Instances: 6.2 Peak Instances: 10 Min Instances: 4 # View real-time scaling metrics$ hyperfold metrics agents sales-negotiator --live LIVE METRICS: sales-negotiator━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ [▓▓▓▓▓▓▓▓░░] 8/10 instances Requests: 842/min [▓▓▓▓▓▓▓▓▓░]Concurrency: 95/100 [▓▓▓▓▓▓▓▓▓▓]CPU: 68% [▓▓▓▓▓▓▓░░░]Memory: 420MB [▓▓▓▓▓▓▓▓░░]Latency p95: 340ms [▓▓▓▓░░░░░░] Scaling: ⚡ Scale up likely in ~2 minutes (Press q to quit, r to refresh)Optimization
Analyze scaling patterns to optimize configuration:
bash
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
# Optimize scaling configuration$ hyperfold agent scale-analyze sales-negotiator --since=7d SCALING ANALYSIS: sales-negotiator (7 days)━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ CURRENT CONFIGURATION Min: 2, Max: 50, Target Concurrency: 100 OBSERVATIONS ⚠ Over-provisioned during off-hours Avg utilization 2-6 AM: 15% Recommendation: Reduce min to 1 during night ⚠ Scale-up lag during morning spike Avg scale-up time: 3.2 minutes Peak missed capacity: 12% Recommendation: Increase min to 4 at 8 AM ✓ Target concurrency well-tuned Avg utilization: 72% Recommendation: Keep at 100 ⚠ Frequent scale oscillation 2-4 PM Scale events: 8 in 2 hours Recommendation: Increase cooldown to 90s RECOMMENDED CONFIGURATION --min=1 --max=50 --target-concurrency=100 --cooldown=90 --schedule='[ {"cron": "0 8 * * 1-5", "min": 4}, {"cron": "0 20 * * 1-5", "min": 2}, {"cron": "0 2 * * *", "min": 1} ]' ESTIMATED SAVINGS Current monthly cost: $2,450 Optimized monthly cost: $1,890 Savings: $560 (23%) Apply recommendations? [Y/n]Track scaling costs with Billing & Cost Analysis.