Dec 14, 2025
A/B Testing Agent Prompts
Hyperfold Team | Agents, Testing
Experiment Design
A/B testing agent prompts helps you optimize for conversion, satisfaction, and revenue. Before running experiments, define clear hypotheses and metrics.
Key considerations:
- Test one variable at a time for clear causation
- Define primary metric before starting
- Run until results reach statistical significance (typically 1,000+ sessions)
- Consider guardrail metrics to catch negative effects
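As a rough illustration of why a session threshold matters, the sketch below estimates how many sessions each variant needs using the textbook two-proportion sample-size formula. The function name, the 5% significance level, and the 80% power setting are assumptions made for this example; they are not Hyperfold defaults or APIs.

```typescript
// Sketch: approximate sessions needed per variant for a two-sided
// two-proportion test. All names here are illustrative, not Hyperfold APIs.

// Standard normal quantiles for common settings (assumed for this example).
const Z_ALPHA_95 = 1.96; // two-sided 5% significance level
const Z_POWER_80 = 0.8416; // 80% power

function sessionsPerVariant(
  baselineRate: number, // e.g. 0.32 for a 32% conversion rate
  expectedRate: number, // the rate you hope the treatment reaches
  zAlpha: number = Z_ALPHA_95,
  zPower: number = Z_POWER_80,
): number {
  const delta = expectedRate - baselineRate;
  if (delta === 0) {
    throw new Error("expectedRate must differ from baselineRate");
  }
  // Sum of the two Bernoulli variances, p * (1 - p), one per variant.
  const variance =
    baselineRate * (1 - baselineRate) + expectedRate * (1 - expectedRate);
  return Math.ceil(((zAlpha + zPower) ** 2 * variance) / delta ** 2);
}

// Detecting a move from 32% to 35% conversion:
const needed = sessionsPerVariant(0.32, 0.35);
console.log(`about ${needed} sessions per variant`);
```

Smaller expected effects need many more sessions, so it can help to run this kind of estimate before choosing a duration.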
Creating Variants
Define your experiment variants in configuration:
```yaml
# experiment-config.yaml
experiment:
  name: negotiation-tone-test
  description: Test friendly vs professional negotiation tone

  variants:
    - id: control
      weight: 50
      prompt: |
        You are a professional sales agent.
        Be courteous and business-like.
        Focus on value and product benefits.

    - id: treatment
      weight: 50
      prompt: |
        You are a friendly shopping assistant.
        Be warm and conversational.
        Build rapport while helping customers find what they need.

  metrics:
    primary: conversion_rate
    secondary:
      - average_order_value
      - customer_satisfaction
      - session_duration

  duration: 14d
  min_sessions: 1000
```
Traffic Splitting
Implement traffic splitting in your agent:
```typescript
// Traffic splitting implementation
import { getExperimentVariant } from '@hyperfold/experiments';

@OnACPEvent('session.start')
async handleSessionStart(session: Session) {
  // Assign variant based on experiment config
  const variant = await getExperimentVariant(
    'negotiation-tone-test',
    session.id
  );

  // Store variant assignment for consistent experience
  await this.state.set(`session:${session.id}:variant`, variant.id);

  // Use variant's system prompt
  this.systemPrompt = variant.prompt;

  // Track assignment
  await trackEvent('experiment.assigned', {
    experiment: 'negotiation-tone-test',
    variant: variant.id,
    session_id: session.id,
  });
}
```
Ensure consistent variant assignment per session. A customer should see the same variant throughout their entire session.
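One common way to get consistent assignment without a lookup is to hash the session ID into a weighted bucket, so the same session always maps to the same variant. The sketch below shows that idea only; `assignVariant`, `VariantConfig`, and the use of an FNV-1a hash are hypothetical choices for illustration and are not a description of how `getExperimentVariant` works internally.

```typescript
// Sketch: deterministic weighted bucketing. Hypothetical helper, not the
// implementation behind getExperimentVariant.

interface VariantConfig {
  id: string;
  weight: number; // relative integer weight, e.g. 50
}

// FNV-1a 32-bit hash over UTF-16 code units (same as bytes for ASCII IDs).
// It is small, dependency-free, and stable across runs.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep as unsigned 32-bit
  }
  return hash;
}

function assignVariant(
  experiment: string,
  sessionId: string,
  variants: VariantConfig[],
): string {
  const totalWeight = variants.reduce((sum, v) => sum + v.weight, 0);
  if (variants.length === 0 || totalWeight <= 0) {
    throw new Error("variants must have a positive total weight");
  }
  // Salt with the experiment name so separate experiments bucket independently.
  const bucket = fnv1a(`${experiment}:${sessionId}`) % totalWeight;
  let cumulative = 0;
  for (const v of variants) {
    cumulative += v.weight;
    if (bucket < cumulative) return v.id;
  }
  return variants[variants.length - 1].id; // fallback; not reached for integer weights
}

const variants: VariantConfig[] = [
  { id: "control", weight: 50 },
  { id: "treatment", weight: 50 },
];
const first = assignVariant("negotiation-tone-test", "session-123", variants);
const second = assignVariant("negotiation-tone-test", "session-123", variants);
console.log(first === second); // true: same session, same variant
```

Because the result depends only on the experiment name and session ID, a restart or a second agent instance would reach the same assignment without reading stored state.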
Measuring Results
Track experiment metrics:
```bash
# View experiment metrics
$ hyperfold experiments metrics negotiation-tone-test

EXPERIMENT: negotiation-tone-test
Status: Running (Day 7 of 14)

VARIANT     SESSIONS  CONVERSIONS  CONV RATE  AOV      SIGNIFICANCE
control     2,847     912          32.0%      $156.20  -
treatment   2,891     1,012        35.0%      $148.40  94.2%

Primary Metric: conversion_rate
  Treatment shows +3.0% lift (94.2% confidence)
  Need 95% confidence to declare winner
```
Statistical Analysis
Hyperfold uses Bayesian analysis to determine statistical significance:
```bash
# Get detailed analysis
$ hyperfold experiments analyze negotiation-tone-test --detailed

STATISTICAL ANALYSIS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Conversion Rate:
  Control:   32.0% (95% CI: 30.3% - 33.7%)
  Treatment: 35.0% (95% CI: 33.3% - 36.7%)
  Lift:      +9.4% relative improvement
  P-value:   0.058

Recommendation: Continue running - approaching significance

# When ready to conclude
$ hyperfold experiments conclude negotiation-tone-test --winner=treatment
```
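To make the reported figures easier to interpret, the sketch below recomputes conversion rates, intervals, and lift from the session and conversion counts in the metrics output. It uses a plain normal-approximation (Wald) interval, which is an assumption chosen for illustration; it is not a claim about how Hyperfold computes its analysis. The helper names are hypothetical.

```typescript
// Sketch: recompute rates, 95% intervals, and lift from raw counts.
// Uses a normal-approximation (Wald) interval as an illustrative assumption.

interface VariantResult {
  sessions: number;
  conversions: number;
}

function conversionRate(r: VariantResult): number {
  return r.conversions / r.sessions;
}

// Returns [lower, upper] bounds; z = 1.96 corresponds to a 95% interval.
function waldInterval(r: VariantResult, z: number = 1.96): [number, number] {
  const p = conversionRate(r);
  const halfWidth = z * Math.sqrt((p * (1 - p)) / r.sessions);
  return [p - halfWidth, p + halfWidth];
}

// Counts taken from the metrics output above.
const control: VariantResult = { sessions: 2847, conversions: 912 };
const treatment: VariantResult = { sessions: 2891, conversions: 1012 };

const controlRate = conversionRate(control);
const treatmentRate = conversionRate(treatment);

// Absolute lift is a difference in percentage points;
// relative lift expresses that difference as a fraction of the control rate.
const absoluteLift = treatmentRate - controlRate;
const relativeLift = absoluteLift / controlRate;

const pct = (x: number): string => `${(x * 100).toFixed(1)}%`;
const [cLow, cHigh] = waldInterval(control);
const [tLow, tHigh] = waldInterval(treatment);

console.log(`control:   ${pct(controlRate)} (95% CI: ${pct(cLow)} - ${pct(cHigh)})`);
console.log(`treatment: ${pct(treatmentRate)} (95% CI: ${pct(tLow)} - ${pct(tHigh)})`);
console.log(`absolute lift: ${pct(absoluteLift)} points, relative lift: ${pct(relativeLift)}`);
```

This distinguishes the two lift figures in the output: roughly 3 percentage points is the absolute difference between the rates, while the relative improvement of about 9% divides that difference by the control rate. Small differences from the displayed values can come from rounding.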