Getting Started with icana-whisper
Transcribe your first audio file in under 5 minutes. Python and Node.js quickstart with speaker diarization, vocabulary hints, and Australian data residency built in.
What is icana-whisper?
icana-whisper is the transcription API behind Australian Transcription. It's built on OpenAI's Whisper model, hosted entirely on AWS infrastructure in Sydney, and wrapped in a clean REST API designed to be straightforward to integrate.
Because all processing happens in Australia, your audio never leaves the country — so APP 8 cross-border disclosure obligations under the Privacy Act are never triggered. That matters for any Australian business handling recordings that contain personal information.
- Speaker diarization — labels each speaker in the transcript automatically
- Multi-format support — MP3, WAV, OGG, FLAC, M4A
- Vocabulary hints — prompt the model with domain-specific terms, names, or abbreviations
- 90 minutes free — no credit card required to get started
Prerequisites
- Python 3.8+ or Node.js 18+
- An API key — sign up free, no credit card required
- An audio file (MP3, WAV, OGG, FLAC, or M4A)
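Before uploading, it can help to check locally that a file has one of the listed extensions. The following is a minimal sketch: the helper name `is_supported_audio` is hypothetical, and it only inspects the file extension, on the assumption that the API accepts exactly the five formats listed above.

```python
from pathlib import Path

# Extensions taken from the supported-format list above.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".ogg", ".flac", ".m4a"}


def is_supported_audio(path: str) -> bool:
    """Return True if the file extension matches a listed format (case-insensitive)."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

This does not inspect the file contents, so a mislabelled file would still pass the check.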
How the API works
The API is asynchronous. You submit a file to POST /api/v1/transcribe and get back a job_id. You then poll GET /api/v1/jobs/{job_id} until the status is completed or failed.
Python quickstart
Install the only dependency you need:
pip install requests
Then submit a file and poll for the result:
import time
import requests
API_KEY = "sk_your_api_key_here"
BASE_URL = "https://api.icana.ai/api/v1"
HEADERS = {"X-API-Key": API_KEY}
# 1. Submit the audio file
with open("audio.mp3", "rb") as f:
    resp = requests.post(
        f"{BASE_URL}/transcribe",
        headers=HEADERS,
        files={"file": f},
        data={"language": "en"},
    )
resp.raise_for_status()  # stop early on a 4xx/5xx response
data = resp.json()
job_id = data["job_id"]
print(f"Job submitted: {job_id}")
print(f"Status: {data['status']}") # "processing" or "pending"
# 2. Poll until complete
while True:
    result = requests.get(f"{BASE_URL}/jobs/{job_id}", headers=HEADERS).json()
    status = result["status"]
    print(f"Status: {status}")
    if status == "completed":
        print("\nTranscription:")
        print(result["transcription"])
        print("\nSpeaker diarization:")
        print(result["diarization"])
        break
    elif status == "failed":
        print("Job failed.")
        break
    time.sleep(3)
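The loop above polls forever if a job never reaches a final state. One way to bound it is sketched below. The helper `wait_for_job` and its parameters are hypothetical rather than part of the API. It assumes "completed" and "failed" are the only terminal statuses, as described earlier.

```python
import time

# Statuses after which polling should stop (assumed from the quickstart above).
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(fetch_job, interval=3.0, max_polls=200, sleep=time.sleep):
    """Call fetch_job() until it reports a terminal status or max_polls is reached.

    fetch_job is any zero-argument callable returning the job as a dict.
    """
    for _ in range(max_polls):
        job = fetch_job()
        if job.get("status") in TERMINAL_STATUSES:
            return job
        sleep(interval)
    raise TimeoutError(f"Job still running after {max_polls} polls")
```

With the quickstart variables in scope, it could be called as `wait_for_job(lambda: requests.get(f"{BASE_URL}/jobs/{job_id}", headers=HEADERS).json())`. Taking the fetch step as a callable keeps the helper independent of the HTTP client and easy to test.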
Node.js quickstart
Install the dependencies. node-fetch is pinned to v2 because v3 is ESM-only and cannot be loaded with require:
npm install form-data node-fetch@2
const fs = require("fs");
const FormData = require("form-data");
const fetch = require("node-fetch");
const API_KEY = "sk_your_api_key_here";
const BASE_URL = "https://api.icana.ai/api/v1";
async function transcribe(filePath) {
  const form = new FormData();
  form.append("file", fs.createReadStream(filePath));
  form.append("language", "en");
  const res = await fetch(`${BASE_URL}/transcribe`, {
    method: "POST",
    headers: { "X-API-Key": API_KEY, ...form.getHeaders() },
    body: form,
  });
  const { job_id, status } = await res.json();
  console.log(`Job submitted: ${job_id} (${status})`);
  // Poll for result
  while (true) {
    await new Promise((r) => setTimeout(r, 3000));
    const result = await fetch(`${BASE_URL}/jobs/${job_id}`, {
      headers: { "X-API-Key": API_KEY },
    }).then((r) => r.json());
    if (result.status === "completed") {
      console.log("Transcription:", result.transcription);
      console.log("Diarization:", result.diarization);
      break;
    } else if (result.status === "failed") {
      console.log("Job failed.");
      break;
    }
    console.log("Status:", result.status);
  }
}
transcribe("audio.mp3");
Speaker diarization
The API labels each turn of the conversation automatically. By default it assumes 2 speakers. Pass num_speakers for better accuracy when you know the exact count (1-10):
with open("meeting.mp3", "rb") as f:  # close the file once the upload finishes
    resp = requests.post(
        f"{BASE_URL}/transcribe",
        headers=HEADERS,
        files={"file": f},
        data={"language": "en", "num_speakers": 3},
    )
The diarization field in a completed job looks like:
[Speaker 1]: Hello, thanks for joining the call.
[Speaker 2]: Happy to be here.
[Speaker 1]: Let's get started.
[Speaker 3]: Quick question before we do...
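If you need the turns as structured data, the text above can be split into (speaker, text) pairs. This is a minimal sketch that assumes the diarization field is a single string with one `[Speaker N]: text` turn per line, as in the sample. The function name `parse_diarization` is hypothetical, and the real field may differ in details not shown here.

```python
import re
from typing import List, Tuple

# Matches lines such as "[Speaker 1]: Hello" (format assumed from the sample output).
TURN_RE = re.compile(r"^\[(?P<speaker>[^\]]+)\]:\s*(?P<text>.*)$")


def parse_diarization(diarization: str) -> List[Tuple[str, str]]:
    """Split diarization text into a list of (speaker, text) turns."""
    turns: List[Tuple[str, str]] = []
    for line in diarization.splitlines():
        line = line.strip()
        if not line:
            continue
        match = TURN_RE.match(line)
        if match:
            turns.append((match["speaker"], match["text"]))
        elif turns:
            # A line without a speaker label is treated as a continuation
            # of the previous turn.
            speaker, text = turns[-1]
            turns[-1] = (speaker, f"{text} {line}")
    return turns
```

From the parsed turns it is straightforward to group text by speaker or count how often each speaker talks.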
Improving accuracy with prompts
The prompt parameter lets you hint at domain-specific terms, names, or abbreviations in your audio. Whisper uses this to improve spelling of uncommon words:
data={
    "language": "en",
    "prompt": "ACME Corp, Dr. Nguyen, KPIs, Q3 review, CRM",
}
Keep prompts concise — only the last ~200 tokens (~40-60 words) are used. Focus on terms most likely to be misspelled.
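One way to keep a prompt short is to build it from a prioritised term list and stop at a word budget. The sketch below is illustrative only. `build_prompt` and its `max_words` default are hypothetical, and it counts whitespace-separated words as a rough stand-in for tokens rather than using the model's tokenizer.

```python
from typing import Iterable


def build_prompt(terms: Iterable[str], max_words: int = 40) -> str:
    """Join unique terms with commas, stopping before the word budget is exceeded.

    Terms should be ordered with the most error-prone ones first.
    """
    seen = set()
    kept = []
    used = 0
    for term in terms:
        term = term.strip()
        key = term.lower()
        if not term or key in seen:
            continue  # skip blanks and case-insensitive duplicates
        words = len(term.split())
        if used + words > max_words:
            break  # budget reached; drop the remaining lower-priority terms
        seen.add(key)
        kept.append(term)
        used += words
    return ", ".join(kept)
```

The result can be passed as the prompt value in the request data, for example `data={"language": "en", "prompt": build_prompt(terms)}`.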
Checking your credit balance
The /api/v1/credit endpoint returns your current AUD balance:
resp = requests.get(f"{BASE_URL}/credit", headers=HEADERS)
print(resp.json())
Example response body:
{
  "balance_aud": "4.80",
  "price_per_minute_aud": "0.02",
  "estimated_remaining_minutes": 240,
  "total_topped_up_aud": "5.00",
  "total_used_aud": "0.20"
}
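The figures in the sample are consistent with each other: 4.80 / 0.02 = 240 minutes, and 5.00 - 0.20 = 4.80. A pre-flight check along the following lines can tell you whether a recording fits in the remaining balance. It is a sketch that assumes cost scales linearly with audio length at price_per_minute_aud. The function names are hypothetical, and actual billing rules such as rounding or minimum charges are not described here.

```python
from decimal import Decimal


def minutes_remaining(credit: dict) -> Decimal:
    """Minutes of audio the current balance would cover at the listed price."""
    balance = Decimal(credit["balance_aud"])
    price = Decimal(credit["price_per_minute_aud"])
    return balance / price


def can_afford(credit: dict, audio_minutes: float) -> bool:
    """True if a file of the given length costs no more than the balance."""
    cost = Decimal(str(audio_minutes)) * Decimal(credit["price_per_minute_aud"])
    return cost <= Decimal(credit["balance_aud"])
```

Decimal is used instead of float so that the string amounts returned by the API are compared exactly.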
Rate limits
Endpoints are rate-limited per API key. If you exceed a limit the API returns 429 Too Many Requests with a Retry-After header.
| Endpoint | Rate limit |
|---|---|
| POST /api/v1/transcribe | 10 requests per minute |
| All other endpoints | 60 requests per minute |
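A client can handle 429 responses by waiting for the interval given in Retry-After and trying again. The helper below is a minimal sketch. `send_with_retry` and its parameters are hypothetical, and it assumes Retry-After carries a number of seconds, falling back to a fixed wait when the header is missing or uses the HTTP-date form.

```python
import time


def send_with_retry(send, max_attempts=5, default_wait=5.0, sleep=time.sleep):
    """Call send() and retry while the response status is 429.

    send is a zero-argument callable returning an object with
    .status_code and .headers, such as a requests.Response.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        resp = send()
        if resp.status_code != 429 or attempt == max_attempts:
            return resp
        try:
            # Retry-After is assumed to be a number of seconds.
            wait = max(0.0, float(resp.headers.get("Retry-After", "")))
        except ValueError:
            # Missing header or non-numeric value: use the fixed fallback.
            wait = default_wait
        sleep(wait)
```

With the quickstart variables in scope, a call might look like `send_with_retry(lambda: requests.get(f"{BASE_URL}/credit", headers=HEADERS))`. For uploads, open the file inside the callable so each attempt reads it from the start.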
API endpoints at a glance
| Endpoint | Method | Description |
|---|---|---|
| /api/v1/transcribe | POST | Submit audio. Returns job_id. |
| /api/v1/jobs/{job_id} | GET | Poll job status. Returns transcription and diarization when complete. |
| /api/v1/jobs | GET | List all jobs for your account (paginated). |
| /api/v1/credit | GET | Get your AUD credit balance and usage summary. |
Full API documentation is available at /docs.
Get your free API key
90 minutes free. No credit card required. Australian data residency included.