<aside> <img src="/icons/asterisk_yellow.svg" alt="/icons/asterisk_yellow.svg" width="40px" />
The fastest test of transcription capabilities, with 1 transcript to start, and endlessly scalable.
My typical test with this protocol:
1st round: 1 audio file → transcript
2nd round: 3 languages × 3 audio files each × 3 tools tested in one go ≈ 2 hours for the full process
</aside>
| Complete? | Phase | Max min | What you do | Output + documentation |
| --- | --- | --- | --- | --- |
| ✅ | A. Setup | 5 min | • Pick 1 interview recording (can be audio-only) • Create a new Test folder where you’ll document the test files and results | Folder created. Saved original .mp3/.mp4/.wav… |
| ✅ | B. Tool 1 - CONTROL - “Gold Standard” transcript [or simply choose Test Tool #1] | 2 min | • Run the audio file through a trusted tool, or use a transcript you transcribed manually for the exact interview recording • Save as “[File name details - Control]” in your Test folder | Transcript #1: one “gold standard” / control transcript saved in the Test folder |
| | C. Test Tool 2 | 2 min | • Drag the same audio file into Tool #2 • Wait for it to finish processing • Download or copy-paste the complete transcript into a document | Transcript #2 |
| | D. Test Tool 3 | 2 min | • Drag the same audio file into Tool #3 • Wait for it to finish processing • Download or copy-paste the complete transcript into a document | Transcript #3 |
| ✅ | E. Save all transcripts | 5 min | Save all 3 transcripts in the Test folder as “[File name details - Tool X name]” | All 3 transcripts saved in the Test folder |
| ✅ | F. Manual check | 15 min | Human spot-check: • Skim for typical issues: names, numbers, domain jargon, language problems… • What are you particularly interested in? Finding a tool that accurately captures brand names, handles languages/dialects in specific contexts, slang? | Note your thoughts in a document and save it in the Test folder |
| ✅ | G. WER eval in Claude | 15 min | Automated (fast): • Run a quick Word Error Rate (WER) script comparing the gold transcript against each tool’s transcript (see the sketch after this table) | WER error % and standout error examples. Notes about errors Claude caught, etc. Verdict: which tool performed best |
| ✅ | H. Final verdict | 10 min | • Compare your notes with the WER results • Identify the top errors and how much they matter to your work • Consider: how much time would it take to fix these errors in manual cleaning, and how big a deal is it if all your transcripts are like this? | One-pager noting issues and wins per tool, plus a final **go/no-go** verdict for each. Save it in the Test folder and anywhere else your team needs to see it |
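If you’d rather run the comparison locally instead of (or alongside) asking Claude, a minimal WER sketch in Python looks like the one below. It assumes plain-text transcript files with hypothetical names (`gold.txt`, `tool2.txt`, `tool3.txt`) sitting in your Test folder, and uses basic lowercasing plus punctuation stripping before a standard word-level edit-distance count; a real evaluation may want heavier normalization of numbers, fillers, and speaker labels.

```python
# Minimal Word Error Rate (WER) sketch: compares a "gold standard" transcript
# against each tool's transcript. File names are placeholders for whatever
# you saved in your Test folder.

import re


def normalize(text: str) -> list[str]:
    """Lowercase, strip punctuation, and split into words (simple normalization)."""
    text = text.lower()
    text = re.sub(r"[^\w\s']", " ", text)  # drop punctuation, keep apostrophes
    return text.split()


def wer(reference: list[str], hypothesis: list[str]) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed with a standard Levenshtein (edit-distance) table over words."""
    rows, cols = len(reference) + 1, len(hypothesis) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # deleting all reference words
    for j in range(cols):
        d[0][j] = j  # inserting all hypothesis words
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
    return d[-1][-1] / max(len(reference), 1)


if __name__ == "__main__":
    # Placeholder file names - point these at your own Test folder files.
    gold = normalize(open("gold.txt", encoding="utf-8").read())
    for name in ["tool2.txt", "tool3.txt"]:
        hyp = normalize(open(name, encoding="utf-8").read())
        print(f"{name}: WER = {wer(gold, hyp):.1%}")
```

Lower is better. Treat the percentage as a starting point, not a verdict: a single misheard brand name or number can matter far more to your work than a pile of small filler-word errors, which is exactly why the manual check in step F sits alongside the WER number.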