# Billing & Cost Analysis

Track spending, set budgets, and optimize your Hyperfold costs.
## Overview
The billing dashboard provides complete visibility into your Hyperfold spending. Track costs by component, set budget alerts, and get AI-powered optimization recommendations.
Costs are updated hourly. For real-time spend tracking, set up budget alerts with low thresholds.
## Cost Breakdown
```bash
# View current billing summary
$ hyperfold billing summary

BILLING SUMMARY: January 2025
PERIOD: Jan 1 - Jan 20, 2025

TOTAL SPEND              $4,892.50
├─ LLM Inference         $2,450.00  (50%)
├─ Agent Compute         $1,200.00  (25%)
├─ Storage & Database      $480.00  (10%)
├─ Network & Bandwidth     $320.00   (7%)
├─ Integrations            $242.50   (5%)
└─ Support & Services      $200.00   (4%)

PROJECTED MONTH-END      $7,645.00
BUDGET                   $8,000.00
STATUS                   ✓ On track
```
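The month-end projection can be approximated by extrapolating the average daily run rate. The sketch below is a straight-line estimate; the dashboard's actual projection may weight recent days or known commitments, which is why the figure it reports can differ slightly:

```python
from datetime import date
import calendar

def project_month_end(spend_to_date: float, as_of: date) -> float:
    """Straight-line projection: average daily spend times days in the month."""
    days_in_month = calendar.monthrange(as_of.year, as_of.month)[1]
    return spend_to_date / as_of.day * days_in_month

# $4,892.50 spent through Jan 20 extrapolates to roughly $7,583 for January
projection = project_month_end(4892.50, date(2025, 1, 20))
```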
```bash
# Detailed breakdown
$ hyperfold billing breakdown --period=mtd

COST BREAKDOWN (Month to Date)

LLM INFERENCE                             $2,450.00
  OpenAI GPT-4-Turbo      4.2M tokens     $1,680.00
  OpenAI GPT-4o           1.8M tokens       $540.00
  Embeddings               12M tokens       $230.00

AGENT COMPUTE                             $1,200.00
  sales-negotiator        142 hrs           $710.00
  fulfillment-agent        68 hrs           $340.00
  recommender-agent        30 hrs           $150.00

STORAGE                                     $480.00
  Vector Database          45 GB            $225.00
  Document Storage        120 GB            $180.00
  Logs & Analytics         50 GB             $75.00

INTEGRATIONS                                $242.50
  Shopify API calls       45,000             $90.00
  Stripe API calls        12,000             $72.00
  ShipStation              8,000             $80.50
```
### Cost Components
| Component | What's Included |
|---|---|
| LLM Inference | GPT-4 tokens, embeddings, reasoning |
| Agent Compute | Container runtime, CPU, memory |
| Storage | Vector DB, documents, logs |
| Integrations | External API calls, webhooks |
## LLM Costs
LLM inference is typically the largest cost component. Analyze token usage to optimize spending:
```bash
# Detailed LLM usage analysis
$ hyperfold billing llm --since=7d

LLM USAGE (7 days)
TOTAL TOKENS: 8.4M    COST: $1,120.00

BY MODEL
  GPT-4-Turbo              5.2M tokens    $832.00  (74%)
  GPT-4o                   2.4M tokens    $216.00  (19%)
  text-embedding-3-large   0.8M tokens     $72.00   (6%)

BY AGENT
  sales-negotiator         6.1M tokens    $890.00  (79%)
  recommender-agent        1.8M tokens    $180.00  (16%)
  fulfillment-agent        0.5M tokens     $50.00   (4%)

BY OPERATION
  Negotiation reasoning    4.2M tokens    $672.00
  Product search           1.5M tokens    $135.00
  Quote generation         1.2M tokens    $168.00

EFFICIENCY METRICS
  Avg tokens/session:      847
  Avg tokens/conversion:   2,541
  Cost/conversion:         $0.34
  Sessions/dollar:         3.8
```
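The efficiency metrics are simple ratios over the aggregate figures. For example, cost per conversion follows from the blended per-token rate; the sketch below uses the numbers above (the CLI may compute it differently, e.g. per model):

```python
total_tokens = 8_400_000          # 8.4M tokens over 7 days
total_cost = 1120.00              # dollars over the same window
tokens_per_conversion = 2541      # from the EFFICIENCY METRICS block

blended_rate = total_cost / total_tokens            # dollars per token
cost_per_conversion = tokens_per_conversion * blended_rate
# lands near the $0.34 shown above
```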
```bash
# Per-session LLM costs
$ hyperfold billing llm --session=sess_abc123

SESSION: sess_abc123
  Duration:  32.5s
  Tokens:    1,247
  Cost:      $0.20
  Outcome:   conversion ($155.00)
  ROI:       775x
```
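The per-session ROI is simply the conversion value divided by the session's LLM cost:

```python
session_cost = 0.20        # LLM spend for sess_abc123
conversion_value = 155.00  # order value attributed to the session

roi = conversion_value / session_cost   # reported above as "775x"
```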
## Budget Alerts
```bash
# Configure budget alerts
$ hyperfold billing budget set \
    --monthly=8000 \
    --alert-threshold=80

Budget configured:
  Monthly limit:  $8,000
  Alert at:       80% ($6,400)
  Current spend:  $4,892.50 (61%)

# Set component-specific budgets
$ hyperfold billing budget set \
    --component=llm \
    --monthly=3000 \
    --alert-threshold=90

# View budget status
$ hyperfold billing budget status

BUDGET STATUS
COMPONENT   BUDGET    SPENT     REMAINING   STATUS
Overall     $8,000    $4,893    $3,107      ✓ 61%
LLM         $3,000    $2,450      $550      ⚠ 82%
Compute     $2,000    $1,200      $800      ✓ 60%
```
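The STATUS column reduces to spend divided by budget, with a warning marker once a threshold is crossed. A sketch assuming the marker appears at an 80% default (the exact threshold logic the dashboard uses is an assumption here):

```python
def budget_status(budget: float, spent: float, warn_at: float = 0.80):
    """Return (marker, percent_used) for one budget row."""
    pct_used = spent / budget
    marker = "⚠" if pct_used >= warn_at else "✓"
    return marker, round(pct_used * 100)

# Rows from the budget status table above
rows = {name: budget_status(b, s) for name, (b, s) in {
    "Overall": (8000, 4893),
    "LLM":     (3000, 2450),
    "Compute": (2000, 1200),
}.items()}
```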
```bash
# Budget alert notification settings
$ hyperfold billing budget alerts \
    --channels="slack:#finance,email:billing@company.com" \
    --frequency=daily
```
## Cost Optimization
```bash
# Get cost optimization recommendations
$ hyperfold billing optimize

COST OPTIMIZATION RECOMMENDATIONS

1. SWITCH TO GPT-4O-MINI FOR SIMPLE TASKS
   Estimated savings: $420/month (17%)

2. ENABLE RESPONSE CACHING
   Estimated savings: $180/month (7%)

3. OPTIMIZE AGENT SCALING
   Estimated savings: $150/month (6%)

4. REDUCE EMBEDDING DIMENSIONS
   Estimated savings: $60/month (2%)

TOTAL POTENTIAL SAVINGS: $810/month (33%)
```
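The total is the sum of the individual estimates:

```python
estimated_savings = {
    "gpt-4o-mini for simple tasks": 420,
    "response caching": 180,
    "agent scaling": 150,
    "reduced embedding dimensions": 60,
}
total_savings = sum(estimated_savings.values())   # $810/month
```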
```bash
# Compare costs across periods
$ hyperfold billing compare --period1=dec --period2=jan

COST COMPARISON: December vs January

COMPONENT   DECEMBER   JANUARY   CHANGE
LLM         $2,100     $2,450    +$350 (+17%)
Compute     $980       $1,200    +$220 (+22%)
Total       $3,500     $4,130    +$630 (+18%)
```
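Each CHANGE entry is the absolute delta plus its percentage relative to the earlier period; a quick check of the table above:

```python
def month_over_month(old: float, new: float):
    """Absolute change and percent change relative to the earlier period."""
    delta = new - old
    return delta, round(delta / old * 100)

llm_change = month_over_month(2100, 2450)      # +$350, +17%
total_change = month_over_month(3500, 4130)    # +$630, +18%
```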
## Optimization Strategies
- **Model Selection** — Use smaller, faster models for simple tasks. Route complex reasoning to GPT-4 only when needed.
- **Response Caching** — Cache responses for semantically similar queries. Reduces token usage without affecting quality.
- **Prompt Optimization** — Shorter, more focused prompts use fewer tokens. Review verbose system prompts for trimming opportunities.
- **Smart Scaling** — Reduce minimum instances during off-peak hours. Use scheduled scaling for predictable traffic patterns.
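The model-selection strategy can be sketched as a small router. The task categories, routing rule, and model names below are illustrative assumptions, not part of the Hyperfold API:

```python
# Tasks that smaller models typically handle well; everything else goes to
# the larger model. These categories are hypothetical examples.
SIMPLE_TASKS = {"classification", "extraction", "summarization", "formatting"}

def pick_model(task_type: str, needs_multistep_reasoning: bool) -> str:
    """Route cheap, well-bounded tasks to a smaller model;
    reserve the large model for multi-step reasoning."""
    if task_type in SIMPLE_TASKS and not needs_multistep_reasoning:
        return "gpt-4o-mini"     # substantially cheaper per token
    return "gpt-4-turbo"

model = pick_model("classification", needs_multistep_reasoning=False)
```

A router like this is usually paired with an escalation path: if the small model's answer fails validation, retry the same request on the larger model.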
For infrastructure scaling configuration, see Auto-Scaling.