Independent buyer reference. Not affiliated with Gong, Clari, ZoomInfo, 11x, Artisan, Regie.ai, Vapi, Retell, Bland, or any AI sales vendor. Prices verified May 2026; confirm before purchase.
Voice AI Infrastructure | Public pricing; stacked components

Vapi Pricing in 2026: Why $0.05 Per Minute Is Never What You Pay

Vapi advertises a $0.05 per minute platform rate. That number covers the orchestration layer alone. Once speech-to-text (STT), LLM tokens, text-to-speech (TTS), and telephony are stacked on top, a typical sales call lands at $0.25 to $0.33 per minute all-in. Here is the honest per-component math.


Vapi platform layer: $0.05/min
Real all-in (typical): $0.25-$0.33/min
5-min call at the typical all-in rate: $1.25-$1.65
Platform rate at 100K-min/mo volume: negotiable down to $0.03/min

§The Five-Layer Voice AI Cost Stack

Every Vapi minute draws on five distinct layers, each priced independently. Add them to get your real per-minute cost. Vapi publishes the platform layer in isolation because it is the only layer Vapi owns; the rest is pass-through to providers whose pricing Vapi cannot guarantee.

LAYER 1

Vapi platform (orchestration)

$0.05/minute

Vapi's owned layer: audio routing, conversation state machine, function-calling infrastructure, real-time interrupt handling, voice activity detection (VAD) tuning, and delivery of recordings and transcripts to your application. This is what makes Vapi worth using over building your own pipeline. Charged at $0.05 per minute on the standard tier, negotiable down to about $0.03 per minute at around 100,000 minutes per month.

LAYER 2

Speech-to-Text (STT)

$0.004-$0.012/minute

Deepgram Nova-2 at $0.0043 per minute is the modal choice for sales; AssemblyAI at $0.009 per minute and hosted OpenAI Whisper at $0.006 per minute are alternatives. For sales use cases, transcription quality (latency, accent handling, accuracy on industry terms) matters more than the absolute price.

LAYER 3

LLM (the brain)

$0.04-$0.18/minute

One of the two largest line items, alongside TTS. GPT-4o-mini at typical sales-conversation token volume costs $0.04 to $0.08 per minute; full GPT-4o costs $0.10 to $0.18 per minute. Claude 3.5 Sonnet lands in a similar range to GPT-4o. Gemini 1.5 Flash is the budget option at $0.03 to $0.06 per minute, with quality trade-offs on objection handling. A minute of voice conversation averages 700 to 1,400 input tokens plus 300 to 600 output tokens.

LAYER 4

Text-to-Speech (TTS)

$0.04-$0.18/minute

The second-biggest line item, and the one with the most quality-versus-cost variability. ElevenLabs Multilingual v2 at $0.12 to $0.18 per minute is the realism leader; ElevenLabs Turbo v2.5 at $0.06 to $0.09 is the latency-optimised mid-tier; Cartesia Sonic at $0.04 to $0.07 per minute is the latency leader; PlayHT at $0.05 to $0.08 per minute is a workhorse mid-tier; OpenAI TTS at $0.03 to $0.06 per minute is the budget choice at lower realism.

LAYER 5

Telephony (calling infrastructure)

$0.01-$0.03/minute

Twilio Programmable Voice at $0.014 per minute (inbound and outbound, US, domestic numbers) is the modal choice. Telnyx at $0.0035 to $0.005 per minute is the cost leader. Plivo at $0.0080 per minute is mid-tier. SIP trunking pricing applies for high-volume operators on Telnyx or Twilio Elastic SIP. International numbers carry significantly higher per-minute pass-through.

§Three Reference Stacks: Budget, Mid, Premium

The same Vapi-built sales agent can run at very different costs depending on component choice. These three reference stacks bracket the realistic 2026 sales-call pricing band:

| Component | Budget | Mid | Premium |
| --- | --- | --- | --- |
| Platform (Vapi) | $0.05 | $0.05 | $0.05 |
| STT | Deepgram $0.0043 | Deepgram $0.0043 | Deepgram $0.0043 |
| LLM | Gemini Flash $0.04 | GPT-4o-mini $0.06 | GPT-4o full $0.15 |
| TTS | OpenAI $0.04 | ElevenLabs Turbo $0.08 | ElevenLabs Multi v2 $0.15 |
| Telephony | Telnyx $0.005 | Twilio $0.014 | Twilio $0.014 |
| Per-minute total | $0.139 | $0.208 | $0.368 |
| Per 5-min call | $0.70 | $1.04 | $1.84 |

The honest reading: The headline $0.25-$0.33 per minute Vapi all-in figure sits between the mid and premium stacks. The mid stack as tabulated totals about $0.21 per minute, and stepping up either the LLM or the TTS tier moves it into the headline band. Pushing TTS down to Cartesia or OpenAI compresses the stack below $0.20 per minute at a moderate loss of sales-call quality. Pairing ElevenLabs Multilingual v2 (which is what most marketing demos use) with full GPT-4o lands above $0.35 per minute. Latency budget and conversational-realism budget often pull in opposite directions.
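The stacking arithmetic is easy to reproduce. The following is a minimal sketch, assuming the component rates from the three reference stacks; the names `STACKS`, `per_minute_cost`, and `per_call_cost` are illustrative and not part of any Vapi API.

```python
# Per-minute component rates in USD, copied from the reference stacks.
# The dictionary layout and function names are illustrative assumptions.
STACKS = {
    "budget": {"platform": 0.05, "stt": 0.0043, "llm": 0.04, "tts": 0.04, "telephony": 0.005},
    "mid": {"platform": 0.05, "stt": 0.0043, "llm": 0.06, "tts": 0.08, "telephony": 0.014},
    "premium": {"platform": 0.05, "stt": 0.0043, "llm": 0.15, "tts": 0.15, "telephony": 0.014},
}


def per_minute_cost(stack):
    """Sum the five layer rates into one all-in per-minute figure."""
    return sum(stack.values())


def per_call_cost(stack, minutes=5.0):
    """All-in cost of a single call of the given length."""
    return per_minute_cost(stack) * minutes


for name, stack in STACKS.items():
    print(f"{name}: ${per_minute_cost(stack):.3f}/min, "
          f"${per_call_cost(stack):.2f} per 5-min call")
```

Swapping a single entry, for example the `tts` rate in the mid stack, shows how far one component choice moves the total.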

§Monthly Volume Math for a Sales Team

For a sales team running Vapi as an outbound AI SDR or inbound qualifier, the monthly cost depends on minutes-per-call times calls-per-day times working-days. A typical mid-volume deployment looks like:

| Scenario | Calls/day | Min/call | Monthly min | Monthly cost (mid) |
| --- | --- | --- | --- | --- |
| Low: 1 outbound rep test | 50 | 3 | 3,000 | $625 |
| Mid: 5-rep team outbound | 250 | 3 | 15,000 | $3,125 |
| High: inbound qualifier 24x7 | 400 | 4 | 32,000 | $6,666 |
| Scale: 50K calls/mo, 5-min avg | 2,500 | 5 | 250,000 | $45K-$55K |
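The monthly figures follow from calls per day times minutes per call times working days times the per-minute rate. Below is a rough sketch of that calculation. It assumes the mid-stack rate of $0.2083 per minute and 20 working days per month, which is inferred from the scenario figures (3,000 minutes at 50 calls of 3 minutes each) rather than stated outright. Function and variable names are illustrative.

```python
MID_RATE = 0.2083   # USD per minute, mid reference stack
WORKING_DAYS = 20   # assumption inferred from the scenario figures


def monthly_minutes(calls_per_day, minutes_per_call, days=WORKING_DAYS):
    """Total billable minutes in a month."""
    return calls_per_day * minutes_per_call * days


def monthly_cost(calls_per_day, minutes_per_call, rate=MID_RATE, days=WORKING_DAYS):
    """Monthly spend at a flat per-minute rate, before any volume discount."""
    return monthly_minutes(calls_per_day, minutes_per_call, days) * rate


# (calls per day, minutes per call) for each scenario
scenarios = {"low": (50, 3), "mid": (250, 3), "high": (400, 4), "scale": (2500, 5)}

for name, (calls, minutes) in scenarios.items():
    print(f"{name}: {monthly_minutes(calls, minutes):,} min, "
          f"${monthly_cost(calls, minutes):,.0f}")
```

The scale scenario comes out at list rate here; the $45K-$55K range reflects that pricing at that volume is negotiated rather than fixed.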

For comparison, an 11x Alice subscription delivering similar outbound qualification volume runs $5,000 per month flat. The Vapi mid-volume 5-rep deployment runs $3,125 per month. At the mid-stack rate of about $0.208 per minute, $5,000 buys roughly 24,000 minutes a month, so Vapi is cheaper below that volume per agent and 11x is cheaper above it, where Alice's flat pricing structurally undercuts variable-cost Vapi.
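The break-even volume can be worked out directly. This is a minimal sketch, assuming the $5,000 flat fee and the mid-stack rate of $0.2083 per minute quoted on this page; `break_even_minutes` is an illustrative helper, and real contracts on either side may include discounts that shift the result.

```python
def break_even_minutes(flat_monthly_fee, per_minute_rate):
    """Monthly minutes at which variable-cost pricing equals the flat fee."""
    if per_minute_rate <= 0:
        raise ValueError("per_minute_rate must be positive")
    return flat_monthly_fee / per_minute_rate


FLAT_FEE = 5000.0   # USD per month, flat subscription
MID_RATE = 0.2083   # USD per minute, mid reference stack

minutes = break_even_minutes(FLAT_FEE, MID_RATE)
calls_at_3_min = minutes / 3  # equivalent number of 3-minute calls

print(f"Break-even: {minutes:,.0f} minutes/month "
      f"(about {calls_at_3_min:,.0f} three-minute calls)")
```

Below that monthly volume the per-minute stack is cheaper; above it the flat fee wins.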

§Hidden Costs Most Vapi Buyers Miss

Engineering time (the dominant hidden cost)

A production-grade Vapi sales agent typically takes 4 to 12 engineering weeks to build, tune, and harden against edge cases. At a fully loaded engineer cost of $200,000 per year (about $3,850 per week), that is $15,000 to $46,000 of build cost. This is the line most build-vs-buy analyses underestimate, often by a factor of three.
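The build-cost range is plain arithmetic on the loaded annual cost. The sketch below reproduces it and adds an amortised monthly view; the 12-month amortisation window is a hypothetical choice for illustration, not a figure from this page.

```python
WEEKS_PER_YEAR = 52


def build_cost(engineering_weeks, loaded_annual_cost=200_000):
    """One-off engineering cost for the given number of weeks."""
    return engineering_weeks * loaded_annual_cost / WEEKS_PER_YEAR


def amortised_monthly(cost, months=12):
    """Spread a one-off cost over a period (12 months is a hypothetical default)."""
    return cost / months


low, high = build_cost(4), build_cost(12)
print(f"Build cost: ${low:,.0f} to ${high:,.0f}")
print(f"Amortised over 12 months: ${amortised_monthly(low):,.0f} "
      f"to ${amortised_monthly(high):,.0f} per month")
```

Adding the amortised figure to the monthly usage bill gives a fairer comparison against flat-fee alternatives during the first year.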

Voice cloning costs

If you use ElevenLabs Voice Lab for a custom voice clone (a common branding choice for higher-end deployments), expect $99 to $1,320 per month subscription on top of per-minute TTS. Voice Clone API access is gated behind the Enterprise tier for some commercial use cases.

Phone number registration (TCPA, STIR/SHAKEN)

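US outbound numbers need registering to avoid being flagged as spam. For voice, that means STIR/SHAKEN attestation and caller-ID reputation registration through your carrier. Twilio's A2P 10DLC registration ($4 setup plus $1.50/month, plus throughput tier fees) is a messaging programme, so budget for it only if the agent also sends SMS follow-ups. Without registration, AI voice calls are heavily filtered by the spam-filtering algorithms of Verizon, AT&T, and T-Mobile.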

Call recording storage

Vapi delivers recordings to your application; storing recordings beyond 30 days at scale (compliance retention, training data) typically lands at $0.02 to $0.04 per call on S3 Standard or equivalent. At 50,000 calls per month this is $1,000 to $2,000 per month.
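The recurring hidden costs can be rolled up the same way. This is a rough sketch using the per-call storage range above and the voice-clone subscription range from the previous subsection; `recurring_hidden_cost` is an illustrative name, and the function deliberately ignores one-off engineering and registration fees.

```python
def recurring_hidden_cost(calls_per_month, storage_per_call, voice_subscription=0.0):
    """Monthly recording storage plus an optional voice-clone subscription, in USD."""
    return calls_per_month * storage_per_call + voice_subscription


CALLS = 50_000

storage_low = recurring_hidden_cost(CALLS, 0.02)
storage_high = recurring_hidden_cost(CALLS, 0.04)
with_voice_low = recurring_hidden_cost(CALLS, 0.02, voice_subscription=99)
with_voice_high = recurring_hidden_cost(CALLS, 0.04, voice_subscription=1320)

print(f"Storage only: ${storage_low:,.0f} to ${storage_high:,.0f} per month")
print(f"With voice clone: ${with_voice_low:,.0f} to ${with_voice_high:,.0f} per month")
```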

§FAQ

Can I get Vapi for less than $0.20 per minute all-in?
Yes, with budget choices: Gemini 1.5 Flash plus Cartesia Sonic TTS plus Telnyx telephony plus Deepgram STT lands at roughly $0.14 per minute all-in. The trade-off is sales-call quality: Gemini Flash handles objections with less nuance, and Cartesia Sonic sounds slightly more synthetic than ElevenLabs. For inbound qualification where speed and cost outweigh charm, the budget stack is fine; for outbound cold calls where charm matters, the mid stack is more defensible.
Is there a Vapi free tier?
Vapi offers a small free credit (typically $10 to $25 on signup) for development and testing. Production traffic moves to pay-as-you-go immediately. There is no enduring free monthly allowance for production usage.
Does Vapi charge for inbound versus outbound differently?
Vapi's platform fee is identical at $0.05 per minute for inbound and outbound. The telephony pass-through differs: outbound calls incur Twilio or Telnyx per-minute outbound charges; inbound calls incur per-number monthly fees plus inbound per-minute. Inbound is typically slightly cheaper per minute but adds the monthly per-number fixed cost.
What is the latency I should budget for?
Realistic 2026 latency for the standard Vapi stack (Deepgram STT plus GPT-4o-mini plus ElevenLabs Turbo TTS) is 800 ms to 1.2 seconds for the first audio of the model's reply to start playing after the caller stops speaking, dropping to 500-800 ms with Cartesia Sonic on TTS. Sub-500 ms is achievable with self-hosted models on dedicated hardware, at the price of meaningfully more build complexity.
Can I use Vapi for HIPAA-compliant healthcare outbound?
Vapi offers a HIPAA-compliant tier with BAA execution and PHI-handling controls; you also need to ensure the LLM, STT, and TTS providers in your stack are HIPAA-eligible. OpenAI, Anthropic, Google, Deepgram, and ElevenLabs all offer HIPAA-eligible enterprise tiers; default pay-as-you-go API endpoints typically are not HIPAA-compliant. See the full HIPAA implications on the dedicated page.
Does Vapi support warm transfer to a human?
Yes. Vapi supports SIP transfer, REFER, and bridging patterns for handoff to a human AE when the AI cannot handle the conversation. Implementation involves wiring Vapi's function-calling to your CRM or call-routing system. Most production deployments include warm-transfer logic for buying-signal moments.

Updated 2026-05-11