Developer Guide

Hosted Whisper API for Australia

OpenAI Whisper is excellent. Running it yourself is a headache. Sending audio to OpenAI's API sends it overseas. Here's the alternative.

2 July 2026 · Developer Guide

The three options for Whisper in Australia — and the problem with two of them

When you need speech-to-text in an Australian product, Whisper is usually the starting point. The model is open-source, the accuracy is good, and it handles Australian accents reasonably well. But there are three ways to use it, and two of them have serious problems for Australian teams.

Option 1: Run Whisper yourself

Self-hosting Whisper keeps data in Australia and gives you full control. It's also a significant operational burden. You need GPU instances (Whisper is slow on CPU for anything but the smallest model), infrastructure to queue and process jobs, monitoring, scaling, model updates, and someone to be on-call when it breaks. For teams whose core product is not transcription infrastructure, this is usually not the right trade.

Option 2: OpenAI's Whisper API

OpenAI offers a hosted Whisper endpoint at api.openai.com/v1/audio/transcriptions. It's convenient, fast, and cheap. It's also hosted in the United States.

Every audio file you send to OpenAI's API crosses the Australian border. If that audio contains personal information about Australian individuals, which most business recordings do, the disclosure falls under APP 8 of the Privacy Act 1988 (Cth). APP 8 requires you to take reasonable steps to ensure the overseas recipient does not breach the Australian Privacy Principles. OpenAI's terms and data processing agreements are designed for a US-centric market. Many Australian businesses haven't done the due diligence APP 8 requires, which leaves every API call as a potential privacy breach waiting to surface.

Option 3: A managed Whisper API in Australia

Australian Transcription is built on the same Whisper model family, hosted entirely on AWS ap-southeast-2 (Sydney). Your audio is processed and stored in Australia. It never leaves the country, so APP 8 is never triggered. You get a simple REST API without the operational overhead of running the infrastructure yourself.

Quickstart: transcribe audio in Python

The API follows a submit-then-poll pattern. You post a file, get back a job ID, and poll until the transcription is complete. Here's the minimal working example:

Python: submit and poll (australian-whisper-quickstart.py)
import requests
import time

API_KEY = "your-api-key"
BASE_URL = "https://api.icana.ai/api/v1"
HEADERS = {"X-API-Key": API_KEY}

# Step 1: Submit audio file
with open("meeting.mp3", "rb") as f:
    response = requests.post(
        f"{BASE_URL}/transcribe",
        headers=HEADERS,
        files={"file": f},
        data={
            "num_speakers": 2,       # optional: enables speaker diarization
            "prompt": "APRA, CPS 234, fintech",  # optional: vocabulary hints
        }
    )
response.raise_for_status()
job_id = response.json()["job_id"]
print(f"Job submitted: {job_id}")

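# Step 2: Poll for completion. Processing typically takes 20-60% of the
# audio duration, so raise the attempt count below for recordings longer
# than about 8 minutes (60 polls x 5 s gives a 5-minute ceiling).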
for attempt in range(60):
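    poll = requests.get(
        f"{BASE_URL}/jobs/{job_id}",
        headers=HEADERS,
        timeout=30,
    )
    poll.raise_for_status()  # surface HTTP errors instead of a KeyError on "status"
    result = poll.json()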

    status = result["status"]
    if status == "complete":
        print("\nTranscription:")
        print(result["transcription"])

        if result.get("diarization"):
            print("\nSpeaker breakdown:")
            for segment in result["diarization"]:
                start = segment["start"]
                speaker = segment["speaker"]
                text = segment["text"]
                print(f"  [{start:.1f}s] {speaker}: {text}")
        break
    elif status == "failed":
        raise RuntimeError(f"Job failed: {result.get('error')}")
    else:
        print(f"  [{attempt+1}/60] Status: {status} — waiting...")
        time.sleep(5)
else:
    raise TimeoutError("Job did not complete within 5 minutes")
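
The quickstart prints diarization segments one at a time. For a readable transcript, consecutive segments from the same speaker can be merged into turns. The following is a minimal sketch that assumes only the three fields used above (start, speaker, text); the sample data and the SPEAKER_00 style labels are hypothetical, not taken from the API reference.

```python
from itertools import groupby


def merge_speaker_turns(segments):
    """Collapse consecutive segments from the same speaker into one turn."""
    turns = []
    for speaker, group in groupby(segments, key=lambda s: s["speaker"]):
        group = list(group)
        turns.append({
            "speaker": speaker,
            "start": group[0]["start"],  # a turn starts where its first segment starts
            "text": " ".join(s["text"].strip() for s in group),
        })
    return turns


# Hypothetical sample shaped like result["diarization"] in the quickstart.
sample = [
    {"start": 0.0, "speaker": "SPEAKER_00", "text": "Good morning."},
    {"start": 1.4, "speaker": "SPEAKER_00", "text": "Shall we start?"},
    {"start": 3.2, "speaker": "SPEAKER_01", "text": "Yes, go ahead."},
]

turns = merge_speaker_turns(sample)
for turn in turns:
    print(f"[{turn['start']:.1f}s] {turn['speaker']}: {turn['text']}")
```

groupby only merges adjacent items, so a speaker who talks again later gets a new turn, which is usually what a transcript needs.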

Comparison with OpenAI's Whisper API

If you're migrating from OpenAI's Whisper endpoint, here's the key difference in the call pattern:

OpenAI Whisper API (US-hosted)
from openai import OpenAI

client = OpenAI(api_key="sk-...")

with open("audio.mp3", "rb") as f:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=f,
        response_format="text"
    )

print(transcript)

# Note: audio crosses the Australian border.
# APP 8 obligations are triggered for
# recordings containing personal information.

Australian Transcription (AWS Sydney)
import time
import requests

HEADERS = {"X-API-Key": "your-key"}
BASE = "https://api.icana.ai/api/v1"

with open("audio.mp3", "rb") as f:
    r = requests.post(
        f"{BASE}/transcribe",
        headers=HEADERS,
        files={"file": f}
    )
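r.raise_for_status()
job_id = r.json()["job_id"]

for _ in range(60):
    d = requests.get(
        f"{BASE}/jobs/{job_id}",
        headers=HEADERS
    ).json()
    if d["status"] == "complete":
        print(d["transcription"])
        break
    if d["status"] == "failed":
        # Stop early instead of polling a dead job for five minutes
        raise RuntimeError(d.get("error"))
    time.sleep(5)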

# Audio stays in AWS Sydney.
# APP 8 cross-border obligations are
# never triggered.

The main differences:

  • The API is async — you get a job_id and poll, rather than waiting for an immediate response. This makes it more suitable for large files and batch processing.
  • Speaker diarization is included — pass num_speakers and you get a structured diarization array in the result.
  • Custom vocabulary uses the prompt field — comma-separated terms to bias recognition toward your domain.
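
If you are migrating code that expects a blocking call, the polling loop can be wrapped in a small helper. This is a sketch rather than part of any official client: wait_for_job and its parameters are hypothetical names, and it assumes only the "complete" and "failed" statuses shown in the examples above, treating every other status as still pending. The status fetcher and the sleep function are passed in so the helper can be exercised without network access.

```python
import time


def wait_for_job(fetch_status, interval=5.0, max_attempts=60, sleep=time.sleep):
    """Call fetch_status() until the job completes, fails, or polling runs out."""
    for _ in range(max_attempts):
        result = fetch_status()
        if result["status"] == "complete":
            return result
        if result["status"] == "failed":
            raise RuntimeError(f"Job failed: {result.get('error')}")
        sleep(interval)  # any other status is treated as still in progress
    raise TimeoutError(f"Job did not complete after {max_attempts} polls")


# Simulated status responses; "processing" is an assumed placeholder status.
responses = iter([
    {"status": "processing"},
    {"status": "processing"},
    {"status": "complete", "transcription": "hello"},
])
result = wait_for_job(lambda: next(responses), sleep=lambda seconds: None)
print(result["transcription"])
```

In real use, fetch_status would wrap the GET request to /jobs/{job_id} shown in the comparison above and return its parsed JSON.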

Cost comparison

Australian Transcription charges AUD $0.02 per minute of audio for standard transcription, or AUD $0.03 per minute with tone/sentiment analysis. There's no separate charge for speaker diarization.

OpenAI's Whisper API charges USD $0.006 per minute (approximately AUD $0.009 at current exchange rates). It's cheaper per minute, but doesn't include speaker diarization. For pure transcription at scale without diarization, OpenAI is currently cheaper — but for teams with Australian data residency requirements, the choice isn't really about price.
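
To see what the per-minute rates mean at volume, the arithmetic can be sketched as below. The rates are the ones quoted in this section; the exchange rate of 1.50 AUD per USD is an assumption, taken from the implied conversion of USD $0.006 to roughly AUD $0.009, and the 500 hours per month is a hypothetical workload.

```python
# Per-minute rates quoted in this article.
AT_STANDARD_AUD = 0.02  # Australian Transcription, standard transcription
AT_TONE_AUD = 0.03      # Australian Transcription, with tone/sentiment analysis
OPENAI_USD = 0.006      # OpenAI Whisper API

# Assumption: 1 USD = 1.50 AUD, implied by USD 0.006 being about AUD 0.009.
AUD_PER_USD = 1.5


def monthly_cost_aud(hours, rate_per_minute, fx=1.0):
    """Cost in AUD for `hours` of audio at a per-minute rate, converted by fx."""
    return round(hours * 60 * rate_per_minute * fx, 2)


hours = 500  # hypothetical monthly volume
costs = {
    "Australian Transcription (standard)": monthly_cost_aud(hours, AT_STANDARD_AUD),
    "Australian Transcription (tone/sentiment)": monthly_cost_aud(hours, AT_TONE_AUD),
    "OpenAI Whisper API": monthly_cost_aud(hours, OPENAI_USD, fx=AUD_PER_USD),
}
for name, cost in costs.items():
    print(f"{name}: AUD ${cost:,.2f}")
```

At this volume the standard tier comes to about AUD $600 a month against roughly AUD $270 for OpenAI, before accounting for any diarization tooling or compliance work on the OpenAI side.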

When self-hosting still makes sense

If you're processing very high volumes (tens of thousands of hours per month), or need a custom fine-tuned model, or have specific latency requirements that don't work with an async API, self-hosting Whisper on your own AWS ap-southeast-2 infrastructure may still be the right call. Australian Transcription is aimed at teams that don't want to run that infrastructure themselves.

Getting started

Sign up at australiantranscription.com.au/register — you get 90 minutes free with no credit card required. The full API reference is at australiantranscription.com.au/docs.
