Developer Guide

Getting Started with icana-whisper

Transcribe your first audio file in under 5 minutes. Python and Node.js quickstart with speaker diarization, vocabulary hints, and Australian data residency built in.

What is icana-whisper?

icana-whisper is the transcription API behind Australian Transcription. It's built on OpenAI's Whisper model, hosted entirely on AWS infrastructure in Sydney, and wrapped in a clean REST API designed to be straightforward to integrate.

Because all processing happens in Australia, your audio never leaves the country, so the cross-border disclosure obligations of Australian Privacy Principle 8 (APP 8) under the Privacy Act 1988 are not triggered by transcription. That matters for any Australian business handling recordings that contain personal information.

  • Speaker diarization — labels each speaker in the transcript automatically
  • Multi-format support — MP3, WAV, OGG, FLAC, M4A
  • Vocabulary hints — prompt the model with domain-specific terms, names, or abbreviations
  • 90 minutes free — no credit card required to get started

Prerequisites

  • Python 3.8+ or Node.js 18+
  • An API key — sign up free, no credit card required
  • An audio file (MP3, WAV, OGG, FLAC, or M4A)
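
Checking the file extension locally before uploading can save a wasted request. The following is a minimal sketch, assuming the five extensions listed above are the ones the API accepts; `is_supported_audio` is a hypothetical helper, not part of the API:

```python
from pathlib import Path

# Formats listed in this guide. This is a local convenience check only;
# the API remains the authority on what it accepts.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".ogg", ".flac", ".m4a"}


def is_supported_audio(path: str) -> bool:
    """Return True if the file extension matches a format listed in this guide."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    for name in ("interview.MP3", "notes.txt"):
        print(name, is_supported_audio(name))
```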

How the API works

The API is asynchronous. You submit a file to POST /api/v1/transcribe and get back a job_id. You then poll GET /api/v1/jobs/{job_id} until the status is completed.
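
The submit-then-poll flow can be wrapped in a small helper that gives up after a deadline instead of looping forever. This is a sketch under stated assumptions: `wait_for_job` and `fetch_job` are hypothetical names, the default interval and timeout are arbitrary, and the terminal statuses are taken to be `completed` and `failed`, as in the quickstart examples in this guide.

```python
import time
from typing import Callable, Dict


def wait_for_job(
    fetch_job: Callable[[], Dict],
    interval: float = 3.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Call fetch_job() until the job reaches a terminal status or time runs out.

    fetch_job is a hypothetical callable that returns the parsed JSON body of
    GET /api/v1/jobs/{job_id}.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job()
        # Assumed terminal statuses: "completed" and "failed".
        if job["status"] in ("completed", "failed"):
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {job['status']!r} after {timeout} seconds")
        sleep(interval)


if __name__ == "__main__":
    # Simulated responses stand in for real API calls.
    responses = iter([
        {"status": "pending"},
        {"status": "processing"},
        {"status": "completed", "transcription": "hello"},
    ])
    job = wait_for_job(lambda: next(responses), sleep=lambda seconds: None)
    print(job["status"])
```

Passing the fetch step in as a callable keeps the helper independent of any HTTP library. With requests, the callable could be a lambda that performs the GET and returns `.json()`.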

Python quickstart

Install the only dependency you need:

shell
pip install requests

Then submit a file and poll for the result:

transcribe.py
import time
import requests

API_KEY = "sk_your_api_key_here"
BASE_URL = "https://api.icana.ai/api/v1"
HEADERS = {"X-API-Key": API_KEY}

# 1. Submit the audio file
with open("audio.mp3", "rb") as f:
    resp = requests.post(
        f"{BASE_URL}/transcribe",
        headers=HEADERS,
        files={"file": f},
        data={"language": "en"},
    )

resp.raise_for_status()  # stop early on an HTTP error such as an invalid API key
data = resp.json()
job_id = data["job_id"]
print(f"Job submitted: {job_id}")
print(f"Status: {data['status']}")  # "processing" or "pending"

# 2. Poll until complete
while True:
    result = requests.get(f"{BASE_URL}/jobs/{job_id}", headers=HEADERS).json()
    status = result["status"]
    print(f"Status: {status}")

    if status == "completed":
        print("\nTranscription:")
        print(result["transcription"])
        print("\nSpeaker diarization:")
        print(result["diarization"])
        break
    elif status == "failed":
        print("Job failed.")
        break

    time.sleep(3)

Node.js quickstart

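Install the dependencies. The example loads modules with require(), so pin node-fetch to version 2; version 3 is published as an ES module only:

shell
npm install form-data node-fetch@2

Then submit a file and poll for the result:

transcribe.js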
const fs = require("fs");
const FormData = require("form-data");
const fetch = require("node-fetch");

const API_KEY = "sk_your_api_key_here";
const BASE_URL = "https://api.icana.ai/api/v1";

async function transcribe(filePath) {
  const form = new FormData();
  form.append("file", fs.createReadStream(filePath));
  form.append("language", "en");

  const res = await fetch(`${BASE_URL}/transcribe`, {
    method: "POST",
    headers: { "X-API-Key": API_KEY, ...form.getHeaders() },
    body: form,
  });

  const { job_id, status } = await res.json();
  console.log(`Job submitted: ${job_id} (${status})`);

  // Poll for result
  while (true) {
    await new Promise((r) => setTimeout(r, 3000));

    const result = await fetch(`${BASE_URL}/jobs/${job_id}`, {
      headers: { "X-API-Key": API_KEY },
    }).then((r) => r.json());

    if (result.status === "completed") {
      console.log("Transcription:", result.transcription);
      console.log("Diarization:", result.diarization);
      break;
    } else if (result.status === "failed") {
      console.log("Job failed.");
      break;
    }

    console.log("Status:", result.status);
  }
}

// Log network or parsing errors instead of leaving the rejection unhandled
transcribe("audio.mp3").catch(console.error);

Speaker diarization

The API labels each turn of the conversation automatically. By default it assumes 2 speakers. Pass num_speakers for better accuracy when you know the exact count (1-10):

python
# Open the file in a with block so the handle is closed after the upload
with open("meeting.mp3", "rb") as f:
    resp = requests.post(
        f"{BASE_URL}/transcribe",
        headers=HEADERS,
        files={"file": f},
        data={"language": "en", "num_speakers": 3},
    )

The diarization field in a completed job looks like:

output
[Speaker 1]: Hello, thanks for joining the call.
[Speaker 2]: Happy to be here.
[Speaker 1]: Let's get started.
[Speaker 3]: Quick question before we do...
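
For downstream processing it is often handier to have the turns as structured data. The sketch below assumes the diarization field is a single string with one `[Speaker N]: text` line per turn, exactly as in the sample output above; `parse_diarization` is a hypothetical helper, and the real field may be shaped differently.

```python
import re
from typing import List, Tuple

# Matches lines such as "[Speaker 1]: Hello"; assumes the format shown above.
TURN_PATTERN = re.compile(r"^\[Speaker (\d+)\]:\s*(.*)$")


def parse_diarization(text: str) -> List[Tuple[int, str]]:
    """Split diarization text into (speaker_number, utterance) pairs.

    Lines that do not match the assumed format are skipped.
    """
    turns = []
    for line in text.splitlines():
        match = TURN_PATTERN.match(line.strip())
        if match:
            turns.append((int(match.group(1)), match.group(2)))
    return turns


if __name__ == "__main__":
    sample = (
        "[Speaker 1]: Hello, thanks for joining the call.\n"
        "[Speaker 2]: Happy to be here.\n"
        "[Speaker 1]: Let's get started.\n"
    )
    for speaker, utterance in parse_diarization(sample):
        print(speaker, utterance)
```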

Improving accuracy with prompts

The prompt parameter lets you hint at domain-specific terms, names, or abbreviations in your audio. Whisper uses this to improve spelling of uncommon words:

python
data={
    "language": "en",
    "prompt": "ACME Corp, Dr. Nguyen, KPIs, Q3 review, CRM",
}

Keep prompts concise — only the last ~200 tokens (~40-60 words) are used. Focus on terms most likely to be misspelled.
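
One way to keep a prompt within that budget is to build it from a prioritised term list. This is a rough sketch: `build_prompt` is a hypothetical helper, and its word count is only a crude proxy for the token limit, since uncommon names can split into several tokens.

```python
from typing import Iterable


def build_prompt(terms: Iterable[str], max_words: int = 50) -> str:
    """Join unique terms into a comma-separated prompt within a rough word budget.

    Terms are taken in the order given, so list the most important ones first.
    max_words is a crude stand-in for the token limit, not an exact count.
    """
    seen = set()
    kept = []
    words_used = 0
    for term in terms:
        term = term.strip()
        key = term.lower()
        if not term or key in seen:
            continue  # skip blanks and case-insensitive duplicates
        word_count = len(term.split())
        if words_used + word_count > max_words:
            break  # budget reached; remaining terms are dropped
        seen.add(key)
        kept.append(term)
        words_used += word_count
    return ", ".join(kept)


if __name__ == "__main__":
    print(build_prompt(["ACME Corp", "Dr. Nguyen", "KPIs", "acme corp", "CRM"]))
```

The returned string can be passed as the prompt value in the request data.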

Checking your credit balance

The /api/v1/credit endpoint returns your current AUD balance:

python
resp = requests.get(f"{BASE_URL}/credit", headers=HEADERS)
print(resp.json())

response
{
  "balance_aud": "4.80",
  "price_per_minute_aud": "0.02",
  "estimated_remaining_minutes": 240,
  "total_topped_up_aud": "5.00",
  "total_used_aud": "0.20"
}
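
The monetary fields arrive as strings, so Decimal avoids floating-point rounding when working with them. The sketch below assumes billing is linear per minute with no rounding or minimum charge, which this guide does not confirm; `minutes_remaining` and `can_afford` are hypothetical helpers.

```python
from decimal import Decimal


def minutes_remaining(credit: dict) -> Decimal:
    """Derive remaining minutes from the balance and per-minute price."""
    balance = Decimal(credit["balance_aud"])
    price = Decimal(credit["price_per_minute_aud"])
    return balance / price


def can_afford(credit: dict, audio_minutes: float) -> bool:
    """Return True if the balance covers audio of the given length.

    Assumes cost = minutes * price_per_minute_aud with no rounding.
    """
    cost = Decimal(str(audio_minutes)) * Decimal(credit["price_per_minute_aud"])
    return cost <= Decimal(credit["balance_aud"])


if __name__ == "__main__":
    credit = {"balance_aud": "4.80", "price_per_minute_aud": "0.02"}
    print(minutes_remaining(credit))
    print(can_afford(credit, 90))
```

With the sample response above, the derived figure agrees with estimated_remaining_minutes (4.80 / 0.02 = 240).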

Rate limits

Endpoints are rate-limited per API key. If you exceed a limit the API returns 429 Too Many Requests with a Retry-After header.

Endpoint                   Rate limit
POST /api/v1/transcribe    10 requests per minute
All other endpoints        60 requests per minute
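
A client can respect these limits by waiting for the period given in Retry-After before trying again. The following is a sketch, assuming the header carries a number of seconds (HTTP also permits a date, which this code treats as unparseable and replaces with a default wait); `send_with_retry` and its parameters are hypothetical.

```python
import time
from types import SimpleNamespace
from typing import Callable


def send_with_retry(
    send: Callable[[], object],
    max_attempts: int = 5,
    default_wait: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
):
    """Call send() and retry while the response status is 429.

    send is a hypothetical zero-argument callable returning an object with
    .status_code and .headers, such as a requests.Response.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        response = send()
        if response.status_code != 429 or attempt == max_attempts:
            return response
        # Assumes Retry-After holds a number of seconds; fall back otherwise.
        try:
            wait = float(response.headers.get("Retry-After", default_wait))
        except ValueError:
            wait = default_wait
        sleep(max(wait, 0.0))


if __name__ == "__main__":
    # Fake responses stand in for real HTTP calls.
    fake = iter([
        SimpleNamespace(status_code=429, headers={"Retry-After": "2"}),
        SimpleNamespace(status_code=200, headers={}),
    ])
    waits = []
    result = send_with_retry(lambda: next(fake), sleep=waits.append)
    print(result.status_code, waits)
```

If the limit is still exceeded on the final attempt, the helper returns the 429 response so the caller can decide what to do.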

API endpoints at a glance

Endpoint                Method  Description
/api/v1/transcribe      POST    Submit audio. Returns job_id.
/api/v1/jobs/{job_id}   GET     Poll job status. Returns transcription and diarization when complete.
/api/v1/jobs            GET     List all jobs for your account (paginated).
/api/v1/credit          GET     Get your AUD credit balance and usage summary.

Full API documentation is available at /docs.
