# Audio Transcription
Transcribe audio files to text with optional diarization and known-speaker hints.
## Content
Transcribe audio using OpenAI, with optional speaker diarization when requested. Prefer the bundled CLI for deterministic, repeatable runs.
## Workflow
1. Collect inputs: audio file path(s), desired response format (`text`/`json`/`diarized_json`), optional language hint, and any known-speaker references.
2. Verify `OPENAI_API_KEY` is set. If it is missing, ask the user to set it locally (do not ask them to paste the key).
3. Run the bundled `transcribe_diarize.py` CLI with sensible defaults (fast text transcription).
4. Validate the output: transcription quality, speaker labels, and segment boundaries; if something is off, iterate with a single targeted change.
5. Save outputs under `output/transcribe/` when working in this repo.
## Decision rules

- Default to `gpt-4o-mini-transcribe` with `--response-format text` for fast transcription.
- If the user wants speaker labels or diarization, use `--model gpt-4o-transcribe-diarize --response-format diarized_json`.
- If audio is longer than ~30 seconds, keep `--chunking-strategy auto`.
- Prompting is not supported for `gpt-4o-transcribe-diarize`.
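The rules above can be sketched as a tiny helper that maps the requested output to CLI flags. This is illustrative only, not part of the bundled CLI:

```shell
# Illustrative sketch: map the decision rules to CLI flags.
pick_flags() {
  if [ "$1" = "diarize" ]; then
    # Speaker labels requested: diarization model + diarized JSON output.
    echo "--model gpt-4o-transcribe-diarize --response-format diarized_json"
  else
    # Default: fast text transcription.
    echo "--model gpt-4o-mini-transcribe --response-format text"
  fi
}
```

Call it as `pick_flags diarize` or `pick_flags text` and splice the result into the command line.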
## Output conventions

- Use `output/transcribe/<job-id>/` for evaluation runs.
- Use `--out-dir` for multiple files to avoid overwriting.
## Dependencies (install if missing)

Prefer `uv` for dependency management.
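With `uv`, a minimal sketch, assuming the `openai` package is the CLI's only third-party dependency:

```shell
uv venv                  # create a local virtual environment
uv pip install openai    # assumed sole third-party dependency
uv run python transcribe_diarize.py --help
```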
If `uv` is unavailable:
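A plain-`pip` fallback sketch, under the same assumption that `openai` is the only third-party dependency:

```shell
python3 -m venv .venv
. .venv/bin/activate
python -m pip install --upgrade openai   # assumed sole third-party dependency
```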
## Environment

- `OPENAI_API_KEY` must be set for live API calls.
- If the key is missing, instruct the user to create one in the OpenAI platform UI and export it in their shell.
- Never ask the user to paste the full key in chat.
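A small sketch of the key check: it confirms the variable is set without ever printing its value (the function name is ours, not part of the CLI):

```shell
# Fail fast when OPENAI_API_KEY is unset; never echo the key itself.
check_openai_key() {
  if [ -z "${OPENAI_API_KEY:-}" ]; then
    echo "OPENAI_API_KEY is not set; export it in your shell (do not paste it into chat)." >&2
    return 1
  fi
  echo "OPENAI_API_KEY is set (value not shown)."
}
```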
## Skill path (set once)

User-scoped skills install under `$CODEX_HOME/skills` (default: `~/.codex/skills`).
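The resolution rule above amounts to one parameter expansion: honor `$CODEX_HOME` when set, otherwise fall back to `~/.codex`:

```shell
# Resolve the user-scoped skills directory.
skills_dir="${CODEX_HOME:-$HOME/.codex}/skills"
echo "$skills_dir"
```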
## CLI quick start
Single file (fast text default):
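A sketch, assuming the script sits at the repo root and takes the audio path as a positional argument (an assumption; check `--help`). The defaults already select `gpt-4o-mini-transcribe` with text output:

```shell
python transcribe_diarize.py audio/meeting.wav
```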
Diarization with known speakers (up to 4):
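A sketch using the flags named in the decision rules above; the `--known-speaker NAME=FILE` spelling and the positional audio argument are hypothetical, so confirm them against the CLI's `--help`:

```shell
python transcribe_diarize.py audio/meeting.wav \
  --model gpt-4o-transcribe-diarize \
  --response-format diarized_json \
  --chunking-strategy auto \
  --known-speaker "alice=refs/alice.wav" \
  --known-speaker "bob=refs/bob.wav"
```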
Plain text output (explicit):
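A sketch spelling out the defaults explicitly, plus `--out-dir` from the output conventions above (the positional audio argument is an assumption):

```shell
python transcribe_diarize.py audio/meeting.wav \
  --model gpt-4o-mini-transcribe \
  --response-format text \
  --out-dir output/transcribe/demo
```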
## Reference map

- `references/api.md`: supported formats, limits, response formats, and known-speaker notes.
## FAQ