Why outbound AI voice agents don’t scale (6 failure modes)
Whether an outbound voice agent succeeds depends partly on what it says to the prospect, but just as much on how it behaves technically: latency, concurrency, voicemail detection, and dialling contact lists efficiently.
I know because we had these exact problems, and that was the motivation behind building Dialshark.
So today I want to walk you through six big failure modes we’ve overcome, and how real systems should be engineered to avoid them.
1. Low latency in outbound AI voice agents is table stakes
If your agent is even 300-400ms late to respond, the prospect feels the drag, interrupts, or hangs up. That destroys trust before the conversation even starts.
Most people treat latency as a model problem. It’s not. It’s an audio–relay–LLM pipeline problem.
What breaks at scale:
- round-trip delays between telephony, relay layer, and model
- jitter from unstable audio servers
- streaming tokens arriving too slowly
- vendors adding hidden buffering
What reliable systems do instead:
- keep the relay layer extremely close to the model
- run real-time audio on dedicated, low-variance infra
- treat <200ms latency as a hard requirement
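One way to make the 200ms requirement enforceable is to time each stage of a conversational turn and check the sum against the budget. The sketch below is illustrative only: the stage names (`telephony_ms`, `relay_ms`, `model_first_token_ms`, `tts_first_audio_ms`) are assumptions about how a pipeline might be split, not a description of any particular vendor’s stack.

```python
from dataclasses import dataclass

LATENCY_BUDGET_MS = 200  # the hard requirement from the list above


@dataclass
class TurnTimings:
    """Per-turn stage timings in milliseconds (stage names are hypothetical)."""
    telephony_ms: float
    relay_ms: float
    model_first_token_ms: float
    tts_first_audio_ms: float

    def total_ms(self) -> float:
        # End-to-end latency is the sum of every hop in the pipeline.
        return (self.telephony_ms + self.relay_ms
                + self.model_first_token_ms + self.tts_first_audio_ms)

    def slowest_stage(self) -> str:
        # Name of the stage that contributes the most latency.
        stages = vars(self)
        return max(stages, key=stages.get)


def within_budget(t: TurnTimings, budget_ms: float = LATENCY_BUDGET_MS) -> bool:
    """True when the whole turn fits strictly under the budget."""
    return t.total_ms() < budget_ms


if __name__ == "__main__":
    turn = TurnTimings(telephony_ms=30, relay_ms=15,
                       model_first_token_ms=110, tts_first_audio_ms=60)
    print(turn.total_ms(), within_budget(turn), turn.slowest_stage())
```

Tracking the slowest stage per turn is what shows whether the problem sits in the model or somewhere else in the pipeline.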

2. Concurrency – The AI voice agent’s Achilles heel
Inbound is one agent per call.
Outbound is exposure therapy for your infrastructure: you will have 50, 200, or 500 concurrent dials hitting the system at once.
This is what breaks at scale:
- model rate limiting
- audio server memory spikes
- call sessions dropping when concurrency bursts
- background jobs queuing faster than they clear
What reliable systems do instead:
- run concurrency-aware scheduling
- maintain isolated sessions with predictable teardown
- keep warm instances of the voice stack
- push calls through a relay layer that doesn’t fall over during bursts
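As a rough illustration of concurrency-aware scheduling, the sketch below caps the number of in-flight calls with an `asyncio.Semaphore` and guarantees each session is released in a `finally` block. `place_call` is a hypothetical stand-in for a real dial, and the cap of 50 is an arbitrary example value.

```python
import asyncio


async def place_call(number: str) -> str:
    """Hypothetical placeholder for a real outbound dial."""
    await asyncio.sleep(0.01)  # simulate the call taking some time
    return f"done:{number}"


async def run_campaign(numbers, max_concurrent: int = 50):
    """Dial every number while never exceeding max_concurrent live calls."""
    sem = asyncio.Semaphore(max_concurrent)
    in_flight = 0
    peak = 0

    async def one_call(number: str) -> str:
        nonlocal in_flight, peak
        async with sem:  # waits here when the cap is reached
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await place_call(number)
            finally:
                in_flight -= 1  # teardown runs even if the call raises

    results = await asyncio.gather(*(one_call(n) for n in numbers))
    return results, peak


if __name__ == "__main__":
    results, peak = asyncio.run(
        run_campaign([str(i) for i in range(200)], max_concurrent=50))
    print(len(results), "calls completed, peak concurrency", peak)
```

The cap turns a burst into a queue, so the model, audio servers, and relay see a load ceiling they can be sized for.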
3. Poor voicemail detection (AMD) wastes so much budget, you’ll want to cry
Most failed outbound voice agents waste thousands of minutes talking to empty air or, worse, chatting to an answering machine’s preamble message.
Everyone thinks voicemail detection is binary. It’s not. When you are running a large-scale marketing campaign, you will almost certainly run into things like:
- custom carrier voicemail tones and preambles
- IVR systems: “press 5 to leave a message”
- different lengths of answerphone playback message
A good outbound agent must decide in under a second whether it’s a human or a machine it is talking to so it can correctly disposition the call as early as possible.
The longer it takes your agent to decide this, the more of your budget gets unnecessarily drained.
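To put a rough number on that drain, the arithmetic can be sketched as below. The figures in the usage example (10,000 calls, half answered by machines) are made-up inputs for illustration, not measured data.

```python
def wasted_minutes(total_calls: int, machine_rate: float,
                   seconds_to_decide: float) -> float:
    """Minutes spent on machine-answered calls before they are dispositioned.

    machine_rate is the fraction of calls answered by voicemail or an IVR.
    """
    if not 0.0 <= machine_rate <= 1.0:
        raise ValueError("machine_rate must be between 0 and 1")
    return total_calls * machine_rate * seconds_to_decide / 60.0


if __name__ == "__main__":
    # Illustrative inputs only: compare a 1-second decision with a 6-second one.
    for seconds in (1, 6):
        print(seconds, "s to decide:",
              round(wasted_minutes(10_000, 0.5, seconds), 1), "minutes")
```

Under those assumed inputs, a six-second decision costs six times the connected minutes of a one-second decision, and the gap grows linearly with list size.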
4. AI voice agents can’t dial huge lists unless you’re making parallel calls
So you want your AI voice agent to dial a list of 50,000 hot new contacts. The problem is that if your agent dials them one at a time, it will take about a year to complete!
Furthermore, when scaling agents for outbound sales, all the previously quiet operational problems now become much louder.
What will break on you most often isn’t what you’d think:
- rate limits from carriers
- burst dialling that gets you flagged
- uneven pacing that leaves 20 agents idle and 20 overloaded
- no de-dupe, no retries, no prioritisation
A proper outbound engine needs:
- pacing logic
- safe throttling
- retry logic for busy and unreachable numbers
- real-time progress and disposition tracking
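The pacing, throttling, de-dupe, and retry pieces can be sketched in a few lines. This is a minimal illustration, assuming a token-bucket pacer and a fixed retry cap; the class and function names are hypothetical, and real carrier limits would set the actual rates.

```python
import time
from collections import deque


class Pacer:
    """Token bucket: roughly rate_per_sec dials per second, with a small burst."""

    def __init__(self, rate_per_sec: float, burst: int, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = float(burst)
        self.clock = clock  # injectable so the pacer can be tested without waiting
        self.last = clock()

    def try_acquire(self) -> bool:
        """Return True if a dial may start now, False if the caller should wait."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above the burst capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


RETRYABLE = {"busy", "unreachable"}


def build_queue(numbers) -> deque:
    """De-duplicate the contact list while preserving its order."""
    return deque(dict.fromkeys(numbers))


def handle_outcome(queue: deque, attempts: dict, number: str,
                   outcome: str, max_attempts: int = 3) -> bool:
    """Requeue busy or unreachable numbers until max_attempts is reached."""
    attempts[number] = attempts.get(number, 0) + 1
    if outcome in RETRYABLE and attempts[number] < max_attempts:
        queue.append(number)  # goes to the back so fresh contacts are dialled first
        return True
    return False
```

A dial loop would pop from the queue only when `try_acquire()` returns True, which keeps the outbound rate smooth instead of bursty.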
This is where Dialshark’s custom relay architecture is best in class – every client gets their own dedicated server so calls keep moving smoothly even under heavy load.
5. Call dispositions (the feedback loop most teams forget)
A voice agent that can’t update call outcomes effectively just creates a massive headache – one that someone on your team will have to deal with sooner or later.
Updating a Google sheet might work at 250 calls per week, but it doesn’t work when you’re dialling 250 calls an hour.
As a minimum you need reliable dispositions for things like:
- no answer
- voicemail
- interested
- follow-up
- booked
- DNC
Without clean dispositions, it is hard for operators to effectively optimise campaigns, refine messaging, or automate workflows.
Ultimately, the integrity of your call data depends on accurate dispositioning from the get-go.
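One simple way to keep dispositions clean is to restrict them to a closed set and reject anything else at write time. The sketch below uses the six outcomes listed above; the `CallOutcome` record and its field names are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum


class Disposition(Enum):
    """The minimum set of outcomes listed above."""
    NO_ANSWER = "no answer"
    VOICEMAIL = "voicemail"
    INTERESTED = "interested"
    FOLLOW_UP = "follow-up"
    BOOKED = "booked"
    DNC = "DNC"


@dataclass(frozen=True)
class CallOutcome:
    """Immutable record of how one call ended (field names are hypothetical)."""
    call_id: str
    number: str
    disposition: Disposition
    recorded_at: datetime


def record_outcome(call_id: str, number: str, label: str) -> CallOutcome:
    """Build an outcome record, rejecting labels outside the closed set."""
    disposition = Disposition(label)  # raises ValueError for unknown labels
    return CallOutcome(call_id, number, disposition, datetime.now(timezone.utc))


def should_suppress(outcome: CallOutcome) -> bool:
    """DNC contacts must never be queued for another dial."""
    return outcome.disposition is Disposition.DNC
```

Rejecting free-text labels at the source is what keeps campaign reports and downstream automations from drifting apart.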
6. Third-party integrations suck and data is messy
This is the most underrated failure mode.
AI voice agents often lack the ability to actually take action, which is ironic given that action is the very definition of agency.
This can mean:
- the CRM API is mediocre
- the webhook is down
- the authentication token expired
When something in your data pipeline breaks, it just looks like contact details don’t appear, activities go missing, and follow-ups don’t fire when they should.
Underneath, it’s almost always an integration problem. When systems don’t play well with each other, revenue leakage becomes invisible (and very expensive!).
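A common defence is to make failures loud: refresh expired credentials, back off when the endpoint is down, and park anything undeliverable where a human can see it. The sketch below assumes hypothetical `send` and `refresh_token` callables supplied by the integration, and a `TokenExpired` exception that the `send` function is assumed to raise; none of these correspond to a specific CRM’s API.

```python
import time


class TokenExpired(Exception):
    """Raised by the (hypothetical) send function when auth has lapsed."""


class DeliveryFailed(Exception):
    """Raised when an event could not be delivered after all attempts."""


def deliver(event, send, refresh_token, max_attempts: int = 3,
            base_delay: float = 0.5, sleep=time.sleep, dead_letter=None):
    """Try to push one event to a third-party system without losing it silently."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send(event)
        except TokenExpired:
            refresh_token()  # retry straight away with fresh credentials
        except ConnectionError:
            if attempt < max_attempts:
                # Exponential backoff: 0.5s, 1s, 2s, ... between attempts.
                sleep(base_delay * 2 ** (attempt - 1))
    if dead_letter is not None:
        dead_letter.append(event)  # keep the event so it can be replayed later
    raise DeliveryFailed(f"gave up after {max_attempts} attempts")
```

The dead-letter list is the important part: a missing CRM update becomes a countable item in a queue instead of a follow-up that quietly never fires.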
If there is one theme that runs through all six of these failure modes, it’s that outbound AI only works when conversation quality and system reliability move together.
Outbound sales is unforgiving because it’s real-time, high-volume, and tied directly to revenue.
You’re not dealing with a chatbot answering the odd inbound query. You’re running something much closer to a call centre, where both agent behaviour and infrastructure matter.
This is why so many outbound agent pilots go down the swanny: everything feels fine at 50 calls a day, but at 5,000 calls a day even the smallest technical weakness can drag results down badly.
If you’re running outbound campaigns with low volume, you can get away with shortcuts. At scale, you can’t.
If you want to run call-centre scale outbound without adding headcount, join the Dialshark waitlist.