
Best accuracy rate for English

[Chart: English accuracy compared with Assembly AI, OpenAI, Azure Batch, Deepgram and Google STT]
Benchmark methodology
We selected three public datasets that include original audio files along with validated ground-truth transcripts. The audio files were uploaded to Salad S4 Storage and processed through both Salad Transcription API and Salad Transcription Lite to obtain their transcriptions.
To keep the evaluation consistent, both the predicted transcripts and the ground truth were normalized with the open-source Whisper Normalizer (https://pypi.org/project/whisper-normalizer/). This step removes discrepancies caused by differences in punctuation, capitalization, and formatting.
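The Whisper Normalizer package provides ready-made normalizer classes (such as `EnglishTextNormalizer`), but the sketch below does not call it. It is a self-contained, simplified stand-in, loosely modeled on Whisper's basic normalizer, to show what normalization does: lowercase the text, drop bracketed or parenthesized annotations, and turn punctuation and symbols into spaces. The helper name `basic_normalize` is hypothetical, and the real English normalizer also standardizes numbers and spellings, so this sketch will not reproduce the benchmark's numbers exactly.

```python
import re
import unicodedata


def basic_normalize(text: str) -> str:
    """Hypothetical, simplified stand-in for a Whisper-style basic normalizer.

    Lowercases, removes [bracketed]/<angled>/(parenthesized) annotations,
    replaces punctuation and symbol characters with spaces, and collapses
    runs of whitespace into single spaces.
    """
    text = text.lower()
    # Drop annotations such as "[laughter]" or "<noise>".
    text = re.sub(r"[<\[][^>\]]*[>\]]", " ", text)
    # Drop parenthesized asides such as "(applause)".
    text = re.sub(r"\([^)]*\)", " ", text)
    # Replace every Unicode punctuation (P*) or symbol (S*) character with a space.
    text = "".join(
        " " if unicodedata.category(ch)[0] in "PS" else ch for ch in text
    )
    # Collapse whitespace and trim both ends.
    return " ".join(text.split())
```

For example, `basic_normalize("Hello, World! [laughter]")` returns `"hello world"`, so it compares equal to an unpunctuated, lowercase transcript of the same words.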
To assess accuracy, we calculated the Word Error Rate (WER) between each ground-truth transcript and the corresponding transcribed output using the JiWER library (https://pypi.org/project/jiwer/), then averaged the WER for each dataset. The entire benchmark process replicates Assembly AI's benchmark of various API providers, so the results are directly comparable.
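The benchmark computed WER with JiWER's `jiwer.wer(reference, hypothesis)` function. As a self-contained illustration, the sketch below reimplements the standard definition, WER = (S + D + I) / N, where S, D and I are word substitutions, deletions and insertions and N is the number of words in the reference, using a word-level edit distance. The helper names are hypothetical, the inputs are assumed to be normalized already, and averaging per-file WERs within a dataset is an assumption about how the per-dataset average was taken.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] = edit distance between the previous reference prefix and hyp[:j];
    # for the empty reference prefix that is simply j insertions.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)  # i deletions against an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref_word
                curr[j - 1] + 1,     # insertion of hyp_word
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            )
        prev = curr
    return prev[-1] / len(ref)


def dataset_average_wer(pairs: list[tuple[str, str]]) -> float:
    """Mean of per-file WERs over (reference, hypothesis) pairs (assumed scheme)."""
    if not pairs:
        raise ValueError("dataset must contain at least one file")
    return sum(word_error_rate(ref, hyp) for ref, hyp in pairs) / len(pairs)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sat on mat")` is 1/6: one deletion over six reference words. Averaging per-file scores weights every file equally regardless of length; pooling all errors and reference words across a dataset (corpus-level WER) is a common alternative and can give a different number.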

Accuracy benchmark of Salad Transcription API
A detailed blog post about the methodology, workflow, and results

3 public datasets across multiple speech types
The benchmark was performed on 3 public datasets, covering speech types from crowd-sourced read speech to TV segments and presentations.

Common Voice
A large-scale, open-source collection of multilingual speech data, created by Mozilla to improve speech recognition technologies.
Over 1M files & 1.5k+ audio hours.

TED-LIUM
A collection of TED Talk audio recordings, sampled at 16 kHz, with their transcriptions.
118 to 452 hours of speech data, depending on the release.

Meanwhile
A collection of audio clips from The Late Show with Stephen Colbert, published as part of the Whisper release by OpenAI.
64 audio segments.
Best across 8 languages
Salad Transcription API scored the highest accuracy rate on average across 8 major languages in the benchmark.
# | Language | Common Voice | TED-LIUM | Meanwhile |
---|---|---|---|---|
#1 | English EN | 95.1% | 95.8% | 95.7% |
#2 | Spanish ES | 96.8% | - | - |
#3 | Portuguese PT | 92.0% | - | - |
#4 | Italian IT | 93.3% | - | - |
#5 | French FR | 92.0% | - | - |
#6 | German DE | 96.3% | - | - |
#7 | Russian RU | 96.3% | - | - |
#8 | Hindi HI | 84.0% | - | - |
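The English accuracy figures line up with the English WER values reported for Salad Transcription API in the WER table (for example, Common Voice: 100 - 4.90 = 95.1), which suggests accuracy is reported as 100% minus WER. The sketch below assumes that relation; the page does not state it explicitly, and non-English WER values are not listed, so the other rows cannot be checked this way. The names `ENGLISH_WER` and `accuracy_from_wer` are hypothetical.

```python
# English WER (%) for Salad Transcription API, copied from the WER table.
ENGLISH_WER = {"Common Voice": 4.90, "TED-LIUM": 4.20, "Meanwhile": 4.30}


def accuracy_from_wer(wer_percent: float) -> float:
    """Accuracy (%) assumed to be 100 minus WER (%), rounded to one decimal.

    WER can exceed 100% when a transcript has many insertions, in which case
    this simple relation yields a negative value.
    """
    return round(100.0 - wer_percent, 1)


if __name__ == "__main__":
    for dataset, wer in ENGLISH_WER.items():
        print(f"{dataset}: {accuracy_from_wer(wer)}%")
    # Common Voice: 95.1%
    # TED-LIUM: 95.8%
    # Meanwhile: 95.7%
```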
Lowest WER for English
Salad Transcription API achieved the lowest Word Error Rate (WER) for English of all the APIs benchmarked, beating every other API on all three datasets at 40% lower cost.
Dataset | Salad Transcription API | Salad Transcription Lite | Assembly AI Universal | Amazon Transcribe | Google Latest Long | Azure Batch v3.1 | Deepgram Nova 2 | OpenAI Whisper |
---|---|---|---|---|---|---|---|---|
Common Voice | 4.90% | 18.70% | 6.67% | 8.98% | 17.59% | 7.81% | 12.43% | 8.83% |
Meanwhile | 4.30% | 16.70% | 4.77% | 7.27% | 11.67% | 6.73% | 5.56% | 9.75% |
TED-LIUM | 4.20% | 8.20% | 7.21% | 9.12% | 11.69% | 9.27% | 8.98% | 7.30% |
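To summarize the table with one number per provider, the sketch below takes the unweighted mean of each provider's three English WERs and also finds the lowest-WER provider on each dataset. The unweighted mean is an assumption made here for illustration (the datasets differ greatly in size, from 64 segments to over 1M files), not a figure the benchmark reports; the names `WER_TABLE`, `rank_by_mean_wer` and `best_per_dataset` are hypothetical.

```python
# English WER (%) by provider and dataset, copied from the table above.
WER_TABLE: dict[str, dict[str, float]] = {
    "Salad Transcription API": {"Common Voice": 4.90, "Meanwhile": 4.30, "TED-LIUM": 4.20},
    "Salad Transcription Lite": {"Common Voice": 18.70, "Meanwhile": 16.70, "TED-LIUM": 8.20},
    "Assembly AI Universal": {"Common Voice": 6.67, "Meanwhile": 4.77, "TED-LIUM": 7.21},
    "Amazon Transcribe": {"Common Voice": 8.98, "Meanwhile": 7.27, "TED-LIUM": 9.12},
    "Google Latest Long": {"Common Voice": 17.59, "Meanwhile": 11.67, "TED-LIUM": 11.69},
    "Azure Batch v3.1": {"Common Voice": 7.81, "Meanwhile": 6.73, "TED-LIUM": 9.27},
    "Deepgram Nova 2": {"Common Voice": 12.43, "Meanwhile": 5.56, "TED-LIUM": 8.98},
    "OpenAI Whisper": {"Common Voice": 8.83, "Meanwhile": 9.75, "TED-LIUM": 7.30},
}


def mean_wer(scores: dict[str, float]) -> float:
    """Unweighted mean of one provider's per-dataset WERs."""
    return sum(scores.values()) / len(scores)


def rank_by_mean_wer(table: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Providers sorted from lowest to highest mean WER."""
    return sorted(
        ((provider, mean_wer(scores)) for provider, scores in table.items()),
        key=lambda item: item[1],
    )


def best_per_dataset(table: dict[str, dict[str, float]]) -> dict[str, str]:
    """Provider with the lowest WER on each dataset."""
    datasets = next(iter(table.values())).keys()
    return {d: min(table, key=lambda p: table[p][d]) for d in datasets}


if __name__ == "__main__":
    for provider, wer in rank_by_mean_wer(WER_TABLE):
        print(f"{provider}: {wer:.2f}%")
    print(best_per_dataset(WER_TABLE))
```

Run as a script, this ranks Salad Transcription API first (mean 4.47%) and Assembly AI Universal second (6.22%), and reports Salad Transcription API as the lowest-WER provider on every dataset. A size-weighted average would need per-dataset file or word counts, which the table does not provide.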
Get in touch with Sales for discounted pricing
Save even more for high-volume transcription.