
It is not your prompt. It is not your LLM. It is your endpointing configuration, and Vapi just shipped the fix. Here is what changed, how it works under the hood, and how Codoro deploys it so your AI intake desk finally sounds human.
Your AI voice agent cost you a lead today. Not because it gave a wrong answer. Because it cut the caller off mid-sentence while they were saying their phone number.
Or it went completely silent for three seconds after they asked a question, and they hung up assuming the line had dropped.
This is not a prompt engineering problem. It is an architecture problem. And as of May 2026, Vapi shipped the fix.
BENCHMARK · VAPI DEVELOPER DOCS, MAY 2026
~260ms
End-of-turn detection latency for Deepgram Flux, released May 18, 2026. It replaces silence heuristics in the 100 to 500ms range, which were behind most of the interruption-loop complaints developers were reporting on forums.
Before Smart Endpointing, Vapi and most voice AI platforms used a simple heuristic: detect silence for X milliseconds, then assume the caller is done speaking and fire the response.
It sounds reasonable. It breaks constantly in practice.
THE TWO FAILURE MODES KILLING CONVERSION RATES
The Interruption Loop: A caller says "Uhm, let me check..." and pauses for half a second. The silence threshold trips. The agent fires its response mid-thought, cutting the customer off. The caller says "sorry, what?" The agent interrupts again. The call sounds broken and amateurish.
Dead-Air Lockout: To fix interruptions, developers cranked up the wait parameters. Now the agent waits 3 seconds after every sentence before responding. Callers think the line dropped. They hang up. The business loses the lead.
Both came from the same root cause: the system had no understanding of conversational context. It only knew silence versus sound.
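To see why no single silence threshold can win, here is a minimal Python sketch of the old heuristic. It is an illustration, not Vapi's code: the `Frame` type, the 100ms frame size, and the timings of the example call are all assumptions chosen to mirror the two failure modes above.

```python
from dataclasses import dataclass

FRAME_MS = 100  # assumed frame size, for illustration only


@dataclass
class Frame:
    """One slice of call audio: when it starts and whether speech was detected."""
    start_ms: int
    is_speech: bool


def build_frames(segments):
    """Expand (is_speech, duration_ms) segments into fixed-size frames."""
    frames, t = [], 0
    for is_speech, duration_ms in segments:
        for _ in range(duration_ms // FRAME_MS):
            frames.append(Frame(t, is_speech))
            t += FRAME_MS
    return frames


def silence_timeout_endpoint(frames, silence_ms):
    """Return the time (ms) at which a silence-only endpointer ends the turn.

    The turn ends once `silence_ms` of continuous non-speech has accumulated
    after the caller has spoken at least once. Returns None if that never
    happens within the given frames.
    """
    heard_speech = False
    silence_started = None
    for frame in frames:
        if frame.is_speech:
            heard_speech = True
            silence_started = None  # any speech resets the silence clock
        elif heard_speech:
            if silence_started is None:
                silence_started = frame.start_ms
            if frame.start_ms + FRAME_MS - silence_started >= silence_ms:
                return frame.start_ms + FRAME_MS
    return None


# Hypothetical call: "Uhm, let me check..." (800ms), a 500ms thinking pause,
# the rest of the sentence (1200ms), then the caller waits for a reply.
call = build_frames([(True, 800), (False, 500), (True, 1200), (False, 4000)])
CALLER_RESUMES_MS = 1300
CALLER_DONE_MS = 2500

interrupting = silence_timeout_endpoint(call, silence_ms=400)
patient = silence_timeout_endpoint(call, silence_ms=3000)

print(f"400ms threshold fires at {interrupting}ms, "
      f"before the caller resumes at {CALLER_RESUMES_MS}ms")
print(f"3000ms threshold fires at {patient}ms, "
      f"{patient - CALLER_DONE_MS}ms of dead air after the caller finished")
```

With these assumed timings, the 400ms threshold fires inside the thinking pause (the interruption loop), while the 3000ms threshold leaves three full seconds of silence after the caller finishes (the dead-air lockout). Tuning the number only trades one failure for the other.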
Vapi's Smart Endpointing, rolled out through Spring 2026, replaces the silence-timeout model entirely. Instead of listening for dead air, the system now runs a multimodal dialogue processor that combines two inputs simultaneously.
When a caller pauses partway through reading out a phone number, an invoice ID, or an address, the system detects that pattern and delays its response window automatically. It understands the caller is not done yet because the linguistic structure is incomplete, even if there is a pause in the audio.
Legacy parameters such as wordFinalizationMaxWaitTime, along with the standalone timeout endpoints, have been deprecated. All speech timing is now managed through the transcriptionEndpointingPlan inside the assistant builder.
Two parameters control the entire system. Understanding them is the difference between a voice agent that sounds human and one that sounds like a broken phone tree.
Parameter 01
eotThreshold
Default: 0.7
The confidence level required before the system declares a turn end. Lower values produce faster, more aggressive responses. Higher values make the agent more patient. Configure this based on your industry and caller profile.
Parameter 02
eotTimeoutMs
Default: 5000ms
The absolute maximum wait window before a hard-stop turn termination occurs regardless of confidence score. Acts as a failsafe so the agent never hangs indefinitely on a dropped or silent line.
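How the two parameters interact can be sketched in a few lines of Python. This is a simplified model based only on the descriptions above, not Vapi's or Deepgram's implementation: the `turn_ended` function, its return values, and the choice to measure the timeout from the caller's last speech are all assumptions made for illustration.

```python
def turn_ended(eot_confidence, silence_elapsed_ms,
               eot_threshold=0.7, eot_timeout_ms=5000):
    """Decide whether to end the caller's turn (hypothetical helper).

    eot_confidence: the model's current end-of-turn confidence, 0.0 to 1.0.
    silence_elapsed_ms: assumed to be the time since the caller last spoke.

    Returns "confident" when the threshold is met, "timeout" when the
    failsafe window has expired, or None to keep listening.
    """
    if not 0.0 <= eot_confidence <= 1.0:
        raise ValueError("eot_confidence must be between 0 and 1")
    if eot_confidence >= eot_threshold:
        return "confident"
    if silence_elapsed_ms >= eot_timeout_ms:
        return "timeout"  # hard stop, regardless of confidence
    return None


# Caller pauses mid phone number: low confidence, short silence.
print(turn_ended(0.35, 600))                      # keep listening
# Caller finishes a complete question.
print(turn_ended(0.82, 300))                      # end the turn
# Line goes quiet and confidence never rises: the failsafe fires.
print(turn_ended(0.35, 5000))
# A more patient agent ignores a confidence that the default would accept.
print(turn_ended(0.75, 300, eot_threshold=0.8))
```

The point of the split is that eotThreshold shapes how the agent behaves in normal conversation, while eotTimeoutMs only matters when confidence never arrives, such as on a dropped or silent line.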
How to configure eotThreshold by industry:
LATENCY BENCHMARK · VAPI DOCS
700ms to 1,500ms
Real-world end-to-end response latency for unoptimized Vapi deployments with heavy system prompts. A properly configured stack using Deepgram Flux hits 500 to 700ms total. That difference is audible. Callers notice it.
Smart Endpointing is only as good as the transcription model feeding it. Vapi now offers two specialized options that go well beyond standard ASR.
Deepgram Flux Multilingual
Released May 18, 2026, Flux operates as Conversational Speech Recognition rather than standard Automatic Speech Recognition. Instead of transcribing words and passing them upstream, Flux embeds turn-taking mechanics directly inside the live streaming connection.
It handles barge-in events and multi-language code-switching natively, with end-of-turn detection latency of approximately 260ms. Lindy, an AI Employee platform, said publicly after migrating to Flux on Vapi that it had eliminated its fragile client-side tracking logic and achieved smooth, interruption-free conversations.
Soniox
Soniox operates as a specialized alternative for use cases requiring high precision on complex alphanumeric data. If your agent needs to capture email addresses, hardware serial numbers, medical record IDs, or any structured identifier where a single misheard character creates a downstream failure, Soniox is the right choice.
It runs a single unified API combining STT and TTS to minimize round-trip pipeline penalties, and it maintains sub-500ms regional streaming processing across more than 60 languages.
The choice of voice infrastructure is not just a latency decision. It is a cost and architecture decision that compounds over every minute your agent runs.
| Platform | Engineering Approach | Production Cost | Latency | Best For |
|---|---|---|---|---|
| Vapi | Developer-centric API. Extreme modularity, hot-swap any model layer. | $0.20 to $0.33 per min | 500ms to 700ms optimized | Complex builds, compliance, agencies needing full control |
| Retell AI | Full telephony infrastructure. Natural multi-turn models out of the box. | $0.13 to $0.31 per min | 600ms to 800ms | Customer support, high-intent sales qualification |
| Bland AI | Scale-centric outbound platform for immense concurrent call loops. | ~$0.09 per min | 700ms to 900ms | Mass outbound, appointment reminders, list-dialing |
Vapi costs more than the alternatives. That premium buys you modularity. You can swap Deepgram Flux for Soniox, swap GPT-4o for Claude, swap ElevenLabs for a different TTS provider, all within the same pipeline without rebuilding from scratch. For businesses where the conversation quality is the product, that flexibility is worth the cost delta.
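To put the cost delta in concrete terms, the per-minute rates from the table can be turned into a monthly range. The sketch below is plain arithmetic on those published figures; the 10,000 minutes per month volume is a hypothetical example, and the Bland AI rate is approximate in the table, so treat its result as a rough figure.

```python
# Per-minute price ranges (USD) copied from the comparison table above.
PER_MINUTE_USD = {
    "Vapi": (0.20, 0.33),
    "Retell AI": (0.13, 0.31),
    "Bland AI": (0.09, 0.09),  # listed as "~$0.09", so approximate
}


def monthly_cost_range(platform, minutes_per_month):
    """Return the (low, high) monthly cost in USD for a platform."""
    low, high = PER_MINUTE_USD[platform]
    return (round(low * minutes_per_month, 2),
            round(high * minutes_per_month, 2))


MINUTES = 10_000  # hypothetical monthly call volume

for platform in PER_MINUTE_USD:
    low, high = monthly_cost_range(platform, MINUTES)
    print(f"{platform}: ${low:,.0f} to ${high:,.0f} per month")
```

At that assumed volume, Vapi's range works out to $2,000 to $3,300 a month, against $1,300 to $3,100 for Retell AI and roughly $900 for Bland AI.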
CODORO · VOICE AI SYSTEMS · US AND UK
A great voice agent is not just a good prompt. It is a correctly configured infrastructure stack.
Most businesses deploying Vapi today are running default configurations. Default eotThreshold. Default transcribers. No calibration for their specific industry, caller profile, or use case. The interruption loop and dead-air problems persist because nobody tuned the system.
Codoro builds Vapi deployments from scratch, configured specifically for your business. We handle every layer of the stack:
If your voice agent is losing leads because it sounds robotic, interrupts callers, or drops into silence mid-conversation, that is a solvable configuration problem. Get in touch and we will fix it.
Vapi's Smart Endpointing update resolves the two most complained-about failure modes in AI voice agent deployments: the interruption loop and the dead-air lockout. Both were symptoms of the same architectural flaw, a system that understood silence but not conversation.
The fix is now available. Deepgram Flux at 260ms. Neural network turn detection. Configurable confidence thresholds tuned to your industry. The technology is proven and deployed in production.
The question, as always, is whether your deployment is configured to take advantage of it.