<aside>
LLM used: Claude
Transcripts from: MacWhisper Pro (T1), Grain (T2), NotebookLM (T3)
Prompt used:
“Act as a master of transcript accuracy and comparison.
Run Python: jiwer.compare to compare the words transcribed in the [#] transcripts uploaded.”
Note: In my own evaluation, I found a few more errors in NotebookLM’s transcription that Claude didn’t identify as errors because all three transcripts made mistakes in those places. Of the three, NotebookLM was the farthest from the actual verbatim statements.
Everything below is Claude’s output:
</aside>
I analyzed three different transcripts of the same interview recording using word-level accuracy metrics and content similarity analysis. All transcripts achieved high content similarity (85-90%), but each showed distinct characteristics in terms of accuracy, completeness, and formatting.
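As a rough illustration of what a word-level comparison like this involves, the sketch below computes a pairwise word error rate (word-level edit distance divided by the reference length) and a sequence-similarity ratio using only the Python standard library. It is not the script that produced the figures above. The function names, the tokenization rule, and the sample sentences are all hypothetical. Because none of the three transcripts is a verified ground truth, each score measures disagreement between two transcripts, so an error that all transcripts share would not show up.

```python
from difflib import SequenceMatcher
from itertools import combinations
import re


def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = tokenize(reference), tokenize(hypothesis)
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] is the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def similarity(a, b):
    """Ratio in [0, 1] of matching words between two transcripts."""
    return SequenceMatcher(None, tokenize(a), tokenize(b)).ratio()


# Hypothetical snippets standing in for the three transcripts (T1, T2, T3).
transcripts = {
    "T1": "We shipped the new onboarding flow in March.",
    "T2": "We shipped the new onboarding flow in March",
    "T3": "We shift the new on boarding flow in March.",
}

# Compare every pair, treating the first transcript of each pair as the
# reference. This is an arbitrary choice, since no ground truth exists.
for (name_a, text_a), (name_b, text_b) in combinations(transcripts.items(), 2):
    wer = word_error_rate(text_a, text_b)
    sim = similarity(text_a, text_b)
    print(f"{name_a} vs {name_b}: WER={wer:.3f}, similarity={sim:.3f}")
```

Note that word error rate is asymmetric: swapping the reference and the hypothesis changes the denominator, so a fair comparison would either fix one reference transcript or report both directions.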