<aside>

LLM used: Claude

Transcripts from: MacWhisper Pro (T1), Grain (T2), NotebookLM (T3)

Prompt used:

“Act as a master of transcript accuracy and comparison.

Run Python: jiwer.compare to compare the words transcribed in the [#] transcripts uploaded.”

Note: In my own evaluation, I found a few more errors in NotebookLM’s transcription that Claude didn’t flag, because all 3 transcripts made mistakes in those spots. Of the three, NotebookLM was farthest from the actual verbatim statements.

Everything below is Claude’s output:

</aside>

Transcript Accuracy Comparison Report

Executive Summary

I analyzed three different transcripts of the same interview recording using word-level accuracy metrics and content similarity analysis. All transcripts achieved high content similarity (85-90%), but each showed distinct characteristics in terms of accuracy, completeness, and formatting.

Transcript Sources

Quantitative Analysis
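
The report does not say which word-level metric was computed. Since the prompt asks for jiwer, a library that computes word error rate (WER), the sketch below assumes WER: the word-level edit distance between two transcripts divided by the number of words in the reference. It uses only the standard library rather than jiwer itself. The function names, the normalization rules, and the sample sentences are illustrative assumptions, not taken from the original analysis.

```python
import string


def normalize(text: str) -> list[str]:
    """Lowercase, strip ASCII punctuation, and split on whitespace.

    Assumed normalization; the original analysis may have used other rules.
    """
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table).split()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Made-up sentences, not drawn from the interview transcripts.
t1 = "The cat sat on the mat."
t2 = "The cat sat on a mat."
print(f"WER(T1 as reference, T2): {word_error_rate(t1, t2):.3f}")
```

When none of the transcripts is a verified ground truth, a score like this measures pairwise disagreement rather than accuracy: an error shared by every transcript contributes nothing to it.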

Word Count Statistics

Content Similarity (Jaccard Index)
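
As a rough illustration of the metric named in this heading, the Jaccard index of two transcripts is the size of the intersection of their word sets divided by the size of the union. The sketch below assumes lowercasing and punctuation stripping before comparison. The helper names and sample sentences are hypothetical, and the original analysis may have tokenized differently.

```python
import string


def word_set(text: str) -> set[str]:
    """Return the set of distinct lowercase words, ASCII punctuation removed."""
    table = str.maketrans("", "", string.punctuation)
    return set(text.lower().translate(table).split())


def jaccard_index(text_a: str, text_b: str) -> float:
    """|A ∩ B| / |A ∪ B| over the distinct words of the two texts."""
    set_a, set_b = word_set(text_a), word_set(text_b)
    union = set_a | set_b
    if not union:
        # Two empty texts are treated as identical by convention.
        return 1.0
    return len(set_a & set_b) / len(union)


# Made-up sentences, not drawn from the interview transcripts.
print(jaccard_index("The cat sat.", "The cat ran."))
```

Because the index works on sets, it ignores word order and repetition, so two transcripts can score high while still differing in many individual positions.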