Deepgram Alternative with Australian Data Residency
Deepgram is fast and well-engineered. But if you're building for Australian clients, there's a problem you need to know about.
The data residency problem with Deepgram in Australia
Deepgram is one of the better transcription APIs on the market. It's fast, the accuracy is solid, and the streaming API is genuinely useful for real-time use cases. If your product operates entirely within the United States, it's a reasonable choice.
But Deepgram's infrastructure is US-based. Every audio file you submit crosses the border. For teams building products that process audio from Australian individuals — call recordings, interview transcripts, patient consultations, financial advice sessions — that creates a legal exposure under the Australian Privacy Act 1988 (Cth).
The specific obligation is APP 8, the cross-border disclosure principle. Before you send personal information (including voice recordings) to an overseas service, you must take reasonable steps to ensure the overseas recipient won't breach the Australian Privacy Principles. In practice, this means formal due diligence, potentially notifying affected individuals, and accepting accountability for any breach by the overseas provider. Most engineering teams skip this because they don't know it's required.
For teams building in regulated industries — fintech (APRA-regulated entities), healthtech (My Health Records Act), legaltech — the exposure is higher still. APRA's CPS 234 requires regulated entities to assess information security arrangements for third-party providers. A US-hosted transcription service that processes sensitive member or customer audio is exactly the kind of arrangement that APRA wants documented and justified.
Australian Transcription runs entirely on AWS infrastructure in Sydney (ap-southeast-2). Your audio is processed in Australia, stored in Australia, and never crosses the border. APP 8 obligations are never triggered, because there is no cross-border disclosure. For APRA-adjacent teams, the data residency answer is clean and documentable.
Side-by-side comparison
| Feature | Australian Transcription | Deepgram |
|---|---|---|
| Data residency | Australia (AWS Sydney) | United States |
| APP 8 compliance | Obligation never triggered | Triggered by cross-border disclosure |
| APRA suitability | Built to support APRA-regulated customers | Requires offshore data transfer assessment |
| Pricing model | $0.02 AUD/min flat; speaker diarization included | USD $0.0043/min (Nova-3); varies by model tier |
| Speaker diarization | Included | Available |
| Streaming / real-time | File upload (async) only | Streaming WebSocket supported |
| Free tier | 90 min free, no credit card | USD $200 credit (requires card) |
| Custom vocabulary | Via prompt parameter | Keywords parameter supported |
Deepgram pricing based on publicly listed rates (USD). Australian Transcription pricing in AUD. Rates last verified July 2026. Verify current rates at each provider before making purchasing decisions.
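Because the two rates are quoted in different currencies, a like-for-like estimate needs an exchange rate. The sketch below works through the arithmetic for a given monthly volume using the per-minute rates from the table. The `AUD_PER_USD` value and the 10,000-minute volume are placeholders chosen for illustration, not quoted figures; substitute the current rate and your own volume.

```python
def monthly_cost(minutes: float, rate_per_minute: float) -> float:
    """Return the cost of `minutes` of audio at a flat per-minute rate."""
    return round(minutes * rate_per_minute, 2)


# Assumption: a placeholder exchange rate, not a real quote.
AUD_PER_USD = 1.5

# Assumption: an example monthly volume.
minutes = 10_000

# Rates taken from the comparison table above.
australian_aud = monthly_cost(minutes, 0.02)     # AUD per month
deepgram_usd = monthly_cost(minutes, 0.0043)     # USD per month (Nova-3)
deepgram_aud = round(deepgram_usd * AUD_PER_USD, 2)

print(f"Australian Transcription: AUD {australian_aud:.2f}")
print(f"Deepgram: USD {deepgram_usd:.2f} (about AUD {deepgram_aud:.2f} at the assumed rate)")
```

The per-minute figure is only one input to the decision. The cost of an APP 8 due-diligence process is not captured by this calculation.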
Where Deepgram has an edge
Deepgram's streaming WebSocket API is genuinely good. If you need live transcription (voice assistants, real-time captioning, live call monitoring), Deepgram handles this well, and Australian Transcription currently doesn't offer streaming. Deepgram also has a broader model selection and some enterprise features we don't yet match.
For teams whose use case is batch transcription (recorded calls, uploaded audio, asynchronous processing) and whose clients are in Australia, Australian Transcription is the cleaner choice: simpler pricing, no cross-border exposure, and no APP 8 due-diligence overhead.
Switching from Deepgram
Both APIs take a pre-recorded file and return a transcript. The main differences are authentication style, whether the result comes back in the same call or through a polled job, and how results are structured. Here's a side-by-side of the common file transcription pattern:
Deepgram (Python SDK):

```python
from deepgram import DeepgramClient, PrerecordedOptions

dg = DeepgramClient("YOUR_API_KEY")

# Read the audio into memory and wrap it in the payload the SDK expects
with open("audio.mp3", "rb") as f:
    buffer_data = f.read()
payload = {"buffer": buffer_data}

options = PrerecordedOptions(
    model="nova-3",
    diarize=True,
    smart_format=True,
)

# The transcript comes back in the response to this call
response = dg.listen.rest.v("1").transcribe_file(payload, options)

# Result
result = response.results
transcript = result.channels[0].alternatives[0].transcript
print(transcript)

# Speaker diarization: each word carries its own speaker label
for word in result.channels[0].alternatives[0].words:
    if word.speaker is not None:
        print(f"Speaker {word.speaker}: {word.word}")
```

Australian Transcription (REST API):

```python
import time

import requests

HEADERS = {"X-API-Key": "YOUR_API_KEY"}
BASE = "https://api.icana.ai/api/v1"

# Submit
with open("audio.mp3", "rb") as f:
    r = requests.post(
        f"{BASE}/transcribe",
        headers=HEADERS,
        files={"file": f},
        data={"num_speakers": 2},
    )
r.raise_for_status()
job_id = r.json()["job_id"]

# Poll every 5 seconds, for up to 5 minutes
for _ in range(60):
    r = requests.get(f"{BASE}/jobs/{job_id}", headers=HEADERS)
    r.raise_for_status()
    d = r.json()
    if d["status"] == "complete":
        print(d["transcription"])
        # Speaker diarization: one entry per speaker segment
        for seg in d.get("diarization", []):
            print(f"{seg['speaker']}: {seg['text']}")
        break
    elif d["status"] == "failed":
        raise RuntimeError(d.get("error"))
    time.sleep(5)
else:
    # The loop ended without a break, so the job never reached a final state
    raise TimeoutError(f"Job {job_id} did not finish within 5 minutes")
```
Key differences when migrating from Deepgram:
- Authentication is an `X-API-Key` header rather than `Authorization: Token`
- The API is async by design: submit a file, get a `job_id`, poll until complete
- Diarization output is a `diarization` array of `{speaker, text, start, end}` segments rather than per-word speaker labels
- Custom vocabulary uses the `prompt` field (comma-separated terms passed at submission time)
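The diarization difference is usually the one that touches downstream code. One way to handle it during a migration is to convert per-word speaker labels into the `{speaker, text, start, end}` segment shape, so the rest of your pipeline only ever sees one format. The sketch below is a minimal version of that idea. It assumes each word is a plain dict with `word`, `start`, `end`, and `speaker` keys; the helper name `words_to_segments` is hypothetical, and speaker labels are passed through unchanged because the label format on either side is not specified here.

```python
from typing import Iterable


def words_to_segments(words: Iterable[dict]) -> list[dict]:
    """Merge consecutive same-speaker words into {speaker, text, start, end} segments."""
    segments: list[dict] = []
    for w in words:
        speaker = w.get("speaker")
        if segments and segments[-1]["speaker"] == speaker:
            # Same speaker is still talking: extend the current segment.
            segments[-1]["text"] += " " + w["word"]
            segments[-1]["end"] = w["end"]
        else:
            # Speaker changed (or this is the first word): start a new segment.
            segments.append(
                {"speaker": speaker, "text": w["word"], "start": w["start"], "end": w["end"]}
            )
    return segments


# Made-up sample data in the assumed per-word shape.
sample_words = [
    {"word": "hello", "start": 0.0, "end": 0.4, "speaker": 0},
    {"word": "there", "start": 0.5, "end": 0.9, "speaker": 0},
    {"word": "hi", "start": 1.2, "end": 1.4, "speaker": 1},
]
segments = words_to_segments(sample_words)
for seg in segments:
    print(seg)
```

Code that previously grouped words by speaker for display can then be replaced by a loop over segments, whichever provider produced them.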
For pre-recorded audio workflows, the migration is straightforward. Most teams complete it in a few hours. The polling loop shown above can be used as-is, and most error handling written around an existing Deepgram call carries over with little change.
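If you would rather keep polling out of your business logic, it can be wrapped in a small helper. The sketch below is one possible shape, not part of either provider's SDK: `wait_for_job` is a hypothetical name, and it takes a `fetch_status` callable so the HTTP call stays outside the helper and can be stubbed in tests. Only the `complete` and `failed` statuses come from the example above; any other status is treated as still in progress.

```python
import time
from typing import Callable


def wait_for_job(
    fetch_status: Callable[[], dict],
    interval: float = 5.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status until the job completes, fails, or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_status()
        if job["status"] == "complete":
            return job
        if job["status"] == "failed":
            raise RuntimeError(job.get("error") or "transcription failed")
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(interval)


# Stubbed responses stand in for the job status request. "processing" is a
# made-up in-progress status; any non-final value is handled the same way.
fake_responses = iter([
    {"status": "processing"},
    {"status": "complete", "transcription": "hello there"},
])
job = wait_for_job(lambda: next(fake_responses), sleep=lambda _seconds: None)
print(job["transcription"])
```

In real use, `fetch_status` would be a function that performs the `GET` request against the job URL and returns the parsed JSON.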
Try it before you commit
Sign up and get 90 minutes of free transcription. No credit card required. Test on your own audio before making a decision.