Why is Vapi's $0.05 per minute not the real price?

Vapi's $0.05 per minute covers the platform fee only: the bare orchestration layer that routes audio between STT, LLM, and TTS providers. You also pay for STT (Deepgram, AssemblyAI), the LLM (OpenAI, Anthropic, Google), the TTS voice (ElevenLabs, PlayHT, Cartesia), and telephony minutes (Twilio, Telnyx). All-in for a typical sales call: $0.25 to $0.33 per minute.

What does a 5-minute Vapi sales call cost?

A typical 5-minute Vapi outbound sales call on a mid-range stack (Deepgram Nova STT, GPT-4o-mini LLM, ElevenLabs Turbo TTS, Twilio telephony) costs roughly $1.40 to $1.65 all-in. On a premium stack (GPT-4o, ElevenLabs Multilingual v2) the same call lands at $2.10 to $2.50; on a budget stack (locally hosted Whisper, GPT-4o-mini, Cartesia Sonic, Telnyx) it lands at $0.95 to $1.15.

Are there monthly subscription fees for Vapi?

Vapi's standard pricing is fully pay-as-you-go with no monthly subscription. The Enterprise tier (custom pricing) adds reserved capacity, volume discounts, and SLA commitments for high-volume operators (typically 100,000+ minutes per month). There is no entry-level monthly minimum.

Can I bring my own LLM API key?

Yes. Vapi supports bring-your-own-key (BYOK) for OpenAI, Anthropic, Google Gemini, Groq, and most major LLM providers. BYOK means you pay LLM costs directly to the provider rather than through Vapi's pass-through, which can save 5 to 15 percent on token cost depending on negotiated rates. STT and TTS also support BYOK for major providers.

What is Vapi's volume discount?

Vapi's published volume tiers reduce the platform fee at higher monthly minute totals: 0 to 100,000 minutes at $0.05/min, 100,001 to 500,000 minutes negotiated, 500,001+ minutes enterprise pricing typically landing at $0.03 to $0.04/min platform-only. Underlying provider costs (STT, LLM, TTS, telephony) do not get Vapi volume discounts; those are billed at provider rates.

Is Vapi cheaper than Retell or Bland?

Not on standard sales deployments. Vapi all-in at $0.25 to $0.33/min is typically more expensive than Retell's $0.07 to $0.20/min bundled pricing because Retell includes STT, LLM, and TTS in the per-minute rate. Vapi is the better choice when you want maximum component flexibility (BYO models, custom telephony pipelines), and it can be the cheaper one at very high volume, where unbundled negotiation beats bundled retail.

Vapi Pricing in 2026: Why $0.05 Per Minute Is Never What You Pay

Vapi advertises a $0.05 per minute platform rate. That number covers the orchestration layer alone. Once speech-to-text, LLM tokens, text-to-speech, and telephony are stacked on top, a typical sales call lands at $0.25 to $0.33 per minute all-in. Here is the honest per-component math.

Last verified June 2026. Provider pricing changes frequently; always confirm before purchase.

Vapi platform layer: $0.05/min
Real all-in (typical): $0.25-$0.33/min
5-min call (mid-stack): $1.40-$1.65
100K+ min/mo volume: platform fee negotiable down to $0.03/min

§The Five-Layer Voice AI Cost Stack

Every Vapi minute draws on five distinct layers, each priced independently. Add them to get your real per-minute cost. Vapi publishes the platform layer in isolation because it is the only layer Vapi owns; the rest is pass-through to providers whose pricing Vapi cannot guarantee.

LAYER 1: Vapi platform (orchestration), $0.05/minute

Vapi's owned layer: audio routing, conversation state machine, function-calling infrastructure, real-time interrupt handling, voice activity detection (VAD) tuning, recording and transcription delivery to your application. This is what makes Vapi worth using over building your own pipeline. Charged at $0.05 per minute on standard tier, dropping at high volume.

LAYER 2: Speech-to-Text (STT), $0.004-$0.012/minute

Deepgram Nova-2 at $0.0043 per minute is the most common choice for sales; AssemblyAI at $0.009 per minute and hosted OpenAI Whisper at $0.006 per minute are alternatives. Transcription quality (latency, accent handling, accuracy on industry terms) matters more than the absolute price for sales use cases.

LAYER 3: LLM (the brain), $0.03-$0.18/minute

Together with TTS, one of the two largest line items. GPT-4o-mini at typical sales conversation token volume costs $0.04 to $0.08 per minute. GPT-4o costs $0.10 to $0.18 per minute. Claude 3.5 Sonnet lands in a similar range to GPT-4o. Gemini 1.5 Flash is the budget option at $0.03 to $0.06 per minute, with quality trade-offs on objection handling. A minute of voice conversation averages 700 to 1,400 input tokens plus 300 to 600 output tokens.

LAYER 4: Text-to-Speech (TTS), $0.03-$0.18/minute

The other major line item, and the one with the most quality-versus-cost variability. ElevenLabs Multilingual v2 at $0.12 to $0.18 per minute is the realism leader; ElevenLabs Turbo v2.5 at $0.06 to $0.09 per minute is the latency-optimised mid-tier; Cartesia Sonic at $0.04 to $0.07 per minute is the latency leader; PlayHT at $0.05 to $0.08 per minute is a workhorse mid-tier; OpenAI TTS at $0.03 to $0.06 per minute is the budget choice at lower realism.

LAYER 5: Telephony (calling infrastructure), $0.0035-$0.03/minute

Twilio Programmable Voice at $0.014 per minute (US outbound, domestic numbers; inbound is typically slightly cheaper) is the most common choice. Telnyx at $0.0035 to $0.005 per minute is the cost leader. Plivo at $0.008 per minute is mid-tier. SIP trunking pricing applies for high-volume operators on Telnyx or Twilio Elastic SIP. International numbers carry significantly higher per-minute pass-through.

§Three Reference Stacks: Budget, Mid, Premium

The same Vapi-built sales agent can run at very different costs depending on component choice. These three reference stacks bracket the realistic 2026 sales-call pricing band:

Component	Budget	Mid	Premium
Platform (Vapi)	$0.05	$0.05	$0.05
STT	Deepgram $0.0043	Deepgram $0.0043	Deepgram $0.0043
LLM	Gemini Flash $0.04	GPT-4o-mini $0.06	GPT-4o full $0.15
TTS	OpenAI $0.04	ElevenLabs Turbo $0.08	ElevenLabs Multi v2 $0.15
Telephony	Telnyx $0.005	Twilio $0.014	Twilio $0.014
Per-minute total	$0.139	$0.208	$0.368
Per 5-min call	$0.70	$1.04	$1.84

The honest reading: at the point prices above, the mid stack totals about $0.21 per minute, so the headline $0.25-$0.33 per minute all-in figure sits between the mid and premium stacks. Swapping a single premium component (GPT-4o or ElevenLabs Multilingual v2) into the mid stack is enough to land in that band. Pushing TTS down to Cartesia or OpenAI compresses the mid stack below $0.20 per minute at a moderate loss of sales-call quality. Running both GPT-4o and ElevenLabs Multilingual v2 (the voice most marketing demos use) lands above $0.35 per minute. Latency and conversational realism often pull in opposite directions.
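The per-minute totals are plain sums of the five component rows, so they are easy to recompute when a provider changes its price. A minimal Python sketch, using the table's point prices as inputs (the dictionary layout and function names are illustrative, not part of any Vapi API):

```python
# Per-minute component prices in USD, copied from the reference-stack table.
STACKS = {
    "budget":  {"platform": 0.05, "stt": 0.0043, "llm": 0.04, "tts": 0.04, "telephony": 0.005},
    "mid":     {"platform": 0.05, "stt": 0.0043, "llm": 0.06, "tts": 0.08, "telephony": 0.014},
    "premium": {"platform": 0.05, "stt": 0.0043, "llm": 0.15, "tts": 0.15, "telephony": 0.014},
}


def per_minute(stack: dict[str, float]) -> float:
    """All-in cost per minute: the sum of the five layers."""
    return sum(stack.values())


def call_cost(stack: dict[str, float], minutes: float = 5.0) -> float:
    """All-in cost of one call of the given length."""
    return per_minute(stack) * minutes


for name, stack in STACKS.items():
    print(f"{name:8s} ${per_minute(stack):.3f}/min  ${call_cost(stack):.2f} per 5-min call")
```

Swapping one entry, for example the mid stack's `tts` value, shows how much a single component moves the all-in rate.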

§Monthly Volume Math for a Sales Team

For a sales team running Vapi as an outbound AI SDR or inbound qualifier, monthly cost is minutes per call times calls per day times working days times the per-minute rate. The scenarios below assume 20 working days per month and the mid stack's roughly $0.208 per minute:

Scenario	Calls/day	Min/call	Monthly min	Monthly cost (mid)
Low: 1 outbound rep test	50	3	3,000	$625
Mid: 5-rep team outbound	250	3	15,000	$3,125
High: inbound qualifier 24x7	400	4	32,000	$6,650
Scale: 50K calls/mo, 5-min avg	2,500	5	250,000	$45K-$55K

For comparison, an 11x Alice subscription delivering similar outbound qualification volume runs $5,000 per month flat, while the mid-volume 5-rep Vapi deployment runs $3,125 per month. Vapi is cheaper at lower volume per agent; 11x is cheaper at very high volume per Alice instance, where flat pricing structurally undercuts Vapi's variable cost.
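The monthly figures and the break-even against a flat subscription can be sketched in a few lines of Python. The 20-working-day month is an assumption inferred from the table's monthly-minute column, the $0.208 rate is the mid stack's per-minute total, and the function names are illustrative:

```python
MID_RATE = 0.208            # USD per minute, mid stack total
WORKING_DAYS = 20           # assumption: implied by the table's monthly minutes
FLAT_SUBSCRIPTION = 5000.0  # USD per month, the flat 11x Alice figure quoted above


def monthly_minutes(calls_per_day: int, minutes_per_call: float,
                    days: int = WORKING_DAYS) -> float:
    """Billable minutes per month for one deployment."""
    return calls_per_day * minutes_per_call * days


def monthly_cost(minutes: float, rate: float = MID_RATE) -> float:
    """Variable monthly cost at a given all-in per-minute rate."""
    return minutes * rate


def break_even_minutes(flat_fee: float = FLAT_SUBSCRIPTION,
                       rate: float = MID_RATE) -> float:
    """Monthly minutes at which variable cost equals the flat fee."""
    return flat_fee / rate


team_minutes = monthly_minutes(calls_per_day=250, minutes_per_call=3)
print(f"5-rep team: {team_minutes:,.0f} min, ${monthly_cost(team_minutes):,.0f}/month")
print(f"Break-even vs flat fee: {break_even_minutes():,.0f} min/month")
```

Under these assumptions the break-even sits at about 24,000 minutes per month: below that the variable-cost stack is cheaper, above it the flat fee is.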

§Hidden Costs Most Vapi Buyers Miss

Engineering time (the dominant hidden cost)

A production-grade Vapi sales agent typically takes 4 to 12 engineering weeks to build, tune, and harden against edge cases. At a fully loaded engineer cost of $200,000 per year, that is roughly $15,000 to $46,000 of build cost. Most build-vs-buy analyses underestimate this line by about 3x.
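The build-cost range follows from prorating the annual cost by week. The sketch below reproduces it and, as an extra illustration, spreads the build cost over usage to express it per minute; the 52-week year and the amortisation horizon are assumptions, and the function names are hypothetical:

```python
LOADED_ANNUAL_COST = 200_000  # USD per year, fully loaded engineer cost
WEEKS_PER_YEAR = 52           # assumption: simple weekly proration


def build_cost(weeks: float, annual: float = LOADED_ANNUAL_COST) -> float:
    """Engineering cost of a build that takes the given number of weeks."""
    return weeks * annual / WEEKS_PER_YEAR


def amortised_per_minute(build_cost_usd: float, monthly_minutes: float,
                         months: int) -> float:
    """Build cost spread evenly over all minutes served in the horizon."""
    return build_cost_usd / (monthly_minutes * months)


low, high = build_cost(4), build_cost(12)
print(f"Build cost: ${low:,.0f} to ${high:,.0f}")

# Hypothetical horizon: 15,000 minutes per month for 12 months.
print(f"Amortised: ${amortised_per_minute(low, 15_000, 12):.3f} "
      f"to ${amortised_per_minute(high, 15_000, 12):.3f} per minute")
```

Comparing the amortised figure with the per-minute stack cost shows how much of the real cost at a given volume is engineering rather than provider fees.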

Voice cloning costs

If you use ElevenLabs Voice Lab for a custom voice clone (a common branding choice for higher-end deployments), expect $99 to $1,320 per month subscription on top of per-minute TTS. Voice Clone API access is gated behind the Enterprise tier for some commercial use cases.

Phone number registration (TCPA, STIR/SHAKEN)

US outbound numbers need carrier-facing registration to avoid being flagged as spam. Twilio's A2P 10DLC registration, which covers messaging on 10-digit numbers, costs $4 setup plus $1.50/month per phone number, plus throughput tier fees; voice calls additionally depend on STIR/SHAKEN attestation. Without registration, AI voice calls are heavily filtered by the spam filters of Verizon, AT&T, and T-Mobile.

Call recording storage

Vapi delivers recordings to your application; storing recordings beyond 30 days at scale (compliance retention, training data) typically lands at $0.02 to $0.04 per call on S3 Standard or equivalent. At 50,000 calls per month this is $1,000 to $2,000 per month.

§FAQ

Can I get Vapi for less than $0.20 per minute all-in?

Yes, with budget choices: Gemini Flash 1.5 plus Cartesia Sonic TTS plus Telnyx telephony plus Deepgram STT lands at roughly $0.14 per minute all-in. The trade-off is sales-call quality: Gemini Flash handles less objection nuance, Cartesia Sonic sounds slightly more synthetic than ElevenLabs. For inbound qualification where speed and cost outweigh charm, the budget stack is fine; for outbound cold calls where charm matters, the mid stack is more defensible.
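To check that this stack stays under $0.20 per minute across the quoted price ranges, not just at the low end, the ranges can be summed directly. A short Python sketch, taking each component's low and high price from the figures quoted in this article (the names are illustrative):

```python
# (low, high) USD per minute for each component of the budget stack.
BUDGET_RANGES = {
    "platform": (0.05, 0.05),       # Vapi standard tier
    "stt": (0.0043, 0.0043),        # Deepgram Nova-2
    "llm": (0.03, 0.06),            # Gemini Flash 1.5
    "tts": (0.04, 0.07),            # Cartesia Sonic
    "telephony": (0.0035, 0.005),   # Telnyx
}


def total_range(stack: dict[str, tuple[float, float]]) -> tuple[float, float]:
    """Sum the low ends and the high ends separately."""
    low = sum(lo for lo, _ in stack.values())
    high = sum(hi for _, hi in stack.values())
    return low, high


low, high = total_range(BUDGET_RANGES)
print(f"Budget stack: ${low:.3f} to ${high:.3f} per minute")
print("Under $0.20 at the high end:", high < 0.20)
```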

Is there a Vapi free tier?

Vapi offers a small free credit (typically $10 to $25 on signup) for development and testing. Production traffic moves to pay-as-you-go immediately. There is no enduring free monthly allowance for production usage.

Does Vapi charge for inbound versus outbound differently?

Vapi's platform fee is identical at $0.05 per minute for inbound and outbound. The telephony pass-through differs: outbound calls incur Twilio or Telnyx per-minute outbound charges; inbound calls incur per-number monthly fees plus inbound per-minute. Inbound is typically slightly cheaper per minute but adds the monthly per-number fixed cost.

What is the latency I should budget for?

Realistic 2026 latency for the Vapi standard stack (Deepgram STT plus GPT-4o-mini plus ElevenLabs Turbo TTS) is 800 ms to 1.2 seconds from the model's first token to first audio out, dropping to 500-800 ms with Cartesia Sonic as the TTS. Sub-500 ms is achievable with locally hosted models on dedicated hardware, but it raises build complexity meaningfully.

Can I use Vapi for HIPAA-compliant healthcare outbound?

Vapi offers a HIPAA-compliant tier with BAA execution and PHI-handling controls; you also need to ensure the LLM, STT, and TTS providers in your stack are HIPAA-eligible. OpenAI, Anthropic, Google, Deepgram, and ElevenLabs all offer HIPAA-eligible enterprise tiers; default pay-as-you-go API endpoints typically are not HIPAA-compliant. See the full HIPAA implications on the dedicated page.

Does Vapi support warm transfer to a human?

Yes. Vapi supports SIP transfer, REFER, and bridging patterns for handoff to a human AE when the AI cannot handle the conversation. Implementation involves wiring Vapi's function-calling to your CRM or call-routing system. Most production deployments include warm-transfer logic for buying-signal moments.
