Dec 14, 2025

A/B Testing Agent Prompts

Hyperfold Team
Agents, Testing

Experiment Design

A/B testing agent prompts helps you optimize for conversion, satisfaction, and revenue. Before running experiments, define clear hypotheses and metrics.

Key considerations:

  • Test one variable at a time so any change in outcome can be attributed to it
  • Define the primary metric before starting
  • Run until the result reaches statistical significance (typically 1000+ sessions)
  • Add guardrail metrics to catch negative side effects
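
The 1000+ session figure is better read as a floor than a target: the number of sessions you need depends on the baseline rate and the smallest lift you want to detect. Here is a minimal sketch of the standard two-proportion sample-size approximation, assuming a two-sided 5% significance level and 80% power. The function name and the example rates are illustrative and not part of Hyperfold.

```typescript
// Hypothetical helper: approximate sessions needed per variant to detect
// a change from baselineRate to expectedRate (two-proportion approximation).
const Z_ALPHA = 1.96;   // assumption: two-sided 5% significance level
const Z_BETA = 0.8416;  // assumption: 80% statistical power

function sessionsPerVariant(baselineRate: number, expectedRate: number): number {
  for (const rate of [baselineRate, expectedRate]) {
    if (rate <= 0 || rate >= 1) {
      throw new Error("rates must be strictly between 0 and 1");
    }
  }
  const delta = expectedRate - baselineRate;
  if (delta === 0) {
    throw new Error("expectedRate must differ from baselineRate");
  }
  // Sum of the two binomial variances, one per variant
  const variance =
    baselineRate * (1 - baselineRate) + expectedRate * (1 - expectedRate);
  return Math.ceil(((Z_ALPHA + Z_BETA) ** 2 * variance) / (delta * delta));
}

// Example: detecting a move from 32% to 35% conversion
console.log(sessionsPerVariant(0.32, 0.35));
```

Under these assumptions, detecting a three-point change from a 32% baseline takes roughly 3,900 sessions per variant, and smaller lifts need more.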

Creating Variants

Define your experiment variants in configuration:

# experiment-config.yaml
experiment:
  name: negotiation-tone-test
  description: Test friendly vs professional negotiation tone

  variants:
    - id: control
      weight: 50
      prompt: |
        You are a professional sales agent. Be courteous and business-like.
        Focus on value and product benefits.

    - id: treatment
      weight: 50
      prompt: |
        You are a friendly shopping assistant. Be warm and conversational.
        Build rapport while helping customers find what they need.

  metrics:
    primary: conversion_rate
    secondary:
      - average_order_value
      - customer_satisfaction
      - session_duration

  duration: 14d
  min_sessions: 1000
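
Before launching, it can help to sanity-check a config like this one. The sketch below assumes the YAML has already been parsed into a plain object. The `ExperimentConfig` shape and the `validateExperiment` function are hypothetical: they mirror only the fields in the example config and are not a Hyperfold API. Treating weights as percentages that sum to 100 is also an assumption.

```typescript
// Hypothetical types mirroring the fields in the example config
interface Variant {
  id: string;
  weight: number;
  prompt: string;
}

interface ExperimentConfig {
  name: string;
  variants: Variant[];
  metrics: { primary: string; secondary?: string[] };
  min_sessions: number;
}

// Returns a list of human-readable problems; an empty list means the config passed.
function validateExperiment(config: ExperimentConfig): string[] {
  const problems: string[] = [];
  if (config.variants.length < 2) {
    problems.push("need at least two variants");
  }
  const ids = new Set(config.variants.map((v) => v.id));
  if (ids.size !== config.variants.length) {
    problems.push("variant ids must be unique");
  }
  if (config.variants.some((v) => v.prompt.trim() === "")) {
    problems.push("every variant needs a non-empty prompt");
  }
  // Assumption: weights are percentages of traffic
  const totalWeight = config.variants.reduce((sum, v) => sum + v.weight, 0);
  if (totalWeight !== 100) {
    problems.push(`weights sum to ${totalWeight}, expected 100`);
  }
  if (config.metrics.primary.trim() === "") {
    problems.push("a primary metric is required");
  }
  if (config.min_sessions <= 0) {
    problems.push("min_sessions must be positive");
  }
  return problems;
}
```

Returning a list of problems instead of throwing on the first one lets you surface every issue in a single pass.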

Traffic Splitting

Implement traffic splitting in your agent:

// Traffic splitting implementation
import { getExperimentVariant } from '@hyperfold/experiments';

// This handler is a method on your agent class; imports for OnACPEvent,
// Session, and trackEvent are omitted here.
@OnACPEvent('session.start')
async handleSessionStart(session: Session) {
  // Assign variant based on experiment config
  const variant = await getExperimentVariant(
    'negotiation-tone-test',
    session.id
  );

  // Store variant assignment for consistent experience
  await this.state.set(`session:${session.id}:variant`, variant.id);

  // Use variant's system prompt
  this.systemPrompt = variant.prompt;

  // Track assignment
  await trackEvent('experiment.assigned', {
    experiment: 'negotiation-tone-test',
    variant: variant.id,
    session_id: session.id,
  });
}
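
To see why keying the assignment on the session ID gives a consistent experience, here is a sketch of one common approach, deterministic hash bucketing. This illustrates the general technique and is not a description of how `getExperimentVariant` works internally. The `assignVariant` and `fnv1a` functions are hypothetical names.

```typescript
interface WeightedVariant {
  id: string;
  weight: number; // assumption: positive integer weights, e.g. 50 / 50
}

// 32-bit FNV-1a hash of a string, returned as an unsigned integer
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0;
}

// Maps (experiment, session) to a variant. The same inputs always give the
// same variant, so no lookup is needed to keep a session in its group.
function assignVariant(
  experiment: string,
  sessionId: string,
  variants: WeightedVariant[]
): string {
  const totalWeight = variants.reduce((sum, v) => sum + v.weight, 0);
  if (variants.length === 0 || totalWeight <= 0) {
    throw new Error("variants need a positive total weight");
  }
  // Including the experiment name keeps buckets independent across experiments
  const bucket = fnv1a(`${experiment}:${sessionId}`) % totalWeight;
  let cumulative = 0;
  for (const variant of variants) {
    cumulative += variant.weight;
    if (bucket < cumulative) {
      return variant.id;
    }
  }
  // Only reachable with fractional weights; fall back to the last variant
  return variants[variants.length - 1].id;
}

const split: WeightedVariant[] = [
  { id: "control", weight: 50 },
  { id: "treatment", weight: 50 },
];
console.log(assignVariant("negotiation-tone-test", "session-123", split));
```

Because the assignment is a pure function of its inputs, storing the variant in state (as the handler does) becomes a convenience for later lookups, not the only source of truth.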

Measuring Results

Track experiment metrics:

# View experiment metrics
$ hyperfold experiments metrics negotiation-tone-test

EXPERIMENT: negotiation-tone-test
Status: Running (Day 7 of 14)

VARIANT         SESSIONS  CONVERSIONS  CONV RATE  AOV      SIGNIFICANCE
control         2,847     912          32.0%      $156.20  -
treatment       2,891     1,012        35.0%      $148.40  94.2%

Primary Metric: conversion_rate
Treatment shows +3.0% lift (94.2% confidence)
Need 95% confidence to declare winner
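
The table mixes several quantities that are easy to confuse, so it is worth recomputing them from the raw counts. The sketch below derives the conversion rates, the absolute lift (in percentage points), the relative lift, and revenue per session. It assumes AOV is the average value of converted orders. The type and function names are illustrative.

```typescript
interface VariantStats {
  sessions: number;
  conversions: number;
  averageOrderValue: number; // assumption: average over converted orders only
}

// Figures taken from the metrics table
const control: VariantStats = { sessions: 2847, conversions: 912, averageOrderValue: 156.2 };
const treatment: VariantStats = { sessions: 2891, conversions: 1012, averageOrderValue: 148.4 };

function conversionRate(stats: VariantStats): number {
  return stats.conversions / stats.sessions;
}

// Difference in conversion rate, as a fraction (0.03 = 3 percentage points)
function absoluteLift(base: VariantStats, variant: VariantStats): number {
  return conversionRate(variant) - conversionRate(base);
}

// Difference relative to the baseline rate
function relativeLift(base: VariantStats, variant: VariantStats): number {
  return absoluteLift(base, variant) / conversionRate(base);
}

// Combines conversion and order value into a single per-session figure
function revenuePerSession(stats: VariantStats): number {
  return conversionRate(stats) * stats.averageOrderValue;
}

console.log("absolute lift (points):", (absoluteLift(control, treatment) * 100).toFixed(1));
console.log("relative lift (%):", (relativeLift(control, treatment) * 100).toFixed(1));
console.log("revenue/session control:", revenuePerSession(control).toFixed(2));
console.log("revenue/session treatment:", revenuePerSession(treatment).toFixed(2));
```

On these figures the absolute lift is about 3 percentage points, which corresponds to a relative lift of roughly 9 percent. Treatment's lower AOV is the kind of side effect a guardrail metric is meant to surface, although here revenue per session still comes out slightly higher for treatment (roughly $52 versus $50).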

Statistical Analysis

Hyperfold uses Bayesian analysis to determine statistical significance:

# Get detailed analysis
$ hyperfold experiments analyze negotiation-tone-test --detailed

STATISTICAL ANALYSIS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Conversion Rate:
Control:     32.0% (95% CI: 30.3% - 33.7%)
Treatment:   35.0% (95% CI: 33.3% - 36.7%)

Lift:        +9.4% relative improvement
P-value:     0.058

Recommendation: Continue running - approaching significance

# When ready to conclude
$ hyperfold experiments conclude negotiation-tone-test --winner=treatment
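
The confidence intervals in the analysis output are consistent with a normal-approximation (Wald) interval for a proportion. Here is a minimal sketch of that calculation, offered as an illustration and not as a description of how Hyperfold computes its figures. The function names are hypothetical.

```typescript
// Normal-approximation (Wald) confidence interval for a conversion rate.
// z = 1.96 corresponds to a 95% interval.
function waldInterval(
  conversions: number,
  sessions: number,
  z: number = 1.96
): [number, number] {
  if (sessions <= 0 || conversions < 0 || conversions > sessions) {
    throw new Error("need 0 <= conversions <= sessions and sessions > 0");
  }
  const rate = conversions / sessions;
  const margin = z * Math.sqrt((rate * (1 - rate)) / sessions);
  return [rate - margin, rate + margin];
}

function formatPercent(fraction: number): string {
  return `${(fraction * 100).toFixed(1)}%`;
}

// Counts from the metrics table
const [controlLow, controlHigh] = waldInterval(912, 2847);
const [treatmentLow, treatmentHigh] = waldInterval(1012, 2891);

console.log(`Control:   ${formatPercent(controlLow)} - ${formatPercent(controlHigh)}`);
// Control:   30.3% - 33.7%
console.log(`Treatment: ${formatPercent(treatmentLow)} - ${formatPercent(treatmentHigh)}`);
// Treatment: 33.3% - 36.7%
```

The Wald interval is a reasonable approximation at these sample sizes. With small samples or rates near 0% or 100% it can be badly off, and an interval such as Wilson's is usually preferred.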