BENCHMARK

Transcription API Accuracy

See how Salad Transcription API scored the highest accuracy rate in this recreation of a competitor benchmark.


Best accuracy rate for English

Salad Transcription API: 95.1%
Assembly AI: 93.4%
OpenAI: 92.1%
Azure Batch: 91.2%
Deepgram: 91.0%
Google STT: 90.8%
Results from a benchmark on the Common Voice 5.1 dataset that replicates the Assembly AI benchmark.
PROCESS

Benchmark Methodology

We selected three public datasets that pair original audio files with validated ground-truth transcripts. The audio files were uploaded to Salad S4 Storage and processed through both Salad Transcription API and Salad Transcription Lite to obtain their transcriptions.

To ensure consistency in evaluation, both the predicted transcripts and the ground truth were normalized using the open-source Whisper Normalizer: https://pypi.org/project/whisper-normalizer/. This step helps remove discrepancies caused by variations in punctuation, capitalization, and formatting.
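To illustrate why normalization matters before scoring, here is a minimal sketch in Python. It is a simplified stand-in written with the standard library only, not the Whisper Normalizer itself, which applies many more rules (for example, handling numbers and contractions). The function name simple_normalize and the sample sentences are hypothetical.

```python
import re
import string


def simple_normalize(text: str) -> str:
    """Simplified stand-in for a transcript normalizer (NOT the Whisper Normalizer)."""
    text = text.lower()
    # Replace punctuation with spaces, keeping apostrophes so "it's" stays one word.
    punctuation = string.punctuation.replace("'", "")
    text = text.translate(str.maketrans(punctuation, " " * len(punctuation)))
    # Collapse runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", text).strip()


# Hypothetical reference and predicted transcripts that differ only in formatting.
reference = "Hello, World! It's a test."
hypothesis = "hello world it's a test"

# After normalization the two strings compare equal, so formatting
# differences alone would not be counted as word errors.
print(simple_normalize(reference) == simple_normalize(hypothesis))
```

Applying the same function to both the ground truth and the prediction is the important part: any rule applied to only one side would itself introduce mismatches.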

For accuracy assessment, we calculated Word Error Rate (WER) between the ground truth and the transcribed outputs. The WER was calculated using the JiWER library: https://pypi.org/project/jiwer/. The average WER was then determined for each dataset. The entire benchmark process was replicated from the Assembly AI benchmark of various API providers to ensure consistency in comparison.
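As a rough illustration of the metric, WER is the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the ground truth. The sketch below computes it from scratch in Python and then averages per-file scores. It is a simplified stand-in for JiWER, not the benchmark's code, and it assumes "average WER" means the mean of per-file WER values. The function names and sample sentences are hypothetical.

```python
from statistics import mean


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


def average_wer(pairs):
    """Mean of per-file WER over (reference, hypothesis) pairs."""
    return mean(word_error_rate(ref, hyp) for ref, hyp in pairs)


# Hypothetical, already-normalized transcript pairs.
samples = [
    ("the cat sat on the mat", "the cat sat on mat"),  # 1 deletion in 6 words
    ("hello world", "hello world"),                    # no errors
]
print(f"{average_wer(samples):.2%}")  # prints 8.33%
```

Averaging per-file scores weights every file equally regardless of length. Pooling all errors and all reference words across a dataset is a common alternative and can give a different figure.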

Blog

Accuracy benchmark of Salad Transcription API

A detailed blog post covering the methodology, workflow, and results

DATASETS

3 public datasets across multiple speech types

The benchmark was performed on 3 different public datasets, covering speech types from phone calls to presentations.

1. Common Voice

A large-scale, open-source collection of multilingual speech data, created by Mozilla to improve speech recognition technologies.

Over 1M files & 1.5k+ audio hours.

2. TED-LIUM

A collection of audio recordings from TED Talks, sampled at 16 kHz, with accompanying transcriptions.

118 to 452 hours of speech data, depending on the release.

3. Meanwhile

A collection of audio clips from The Late Show with Stephen Colbert, published as part of the Whisper release by OpenAI.  

64 audio segments.

ACCURACY RATE

Best across 8 languages

Salad Transcription API scored the highest accuracy rate on average across 8 major languages in the benchmark.

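| # | LANGUAGE | COMMON VOICE | TED-LIUM | MEANWHILE |
|---|---|---|---|---|
| #1 | English (EN) | 95.1% | 95.8% | 95.7% |
| #2 | Spanish (ES) | 96.8% | - | - |
| #3 | Portuguese (PT) | 92.0% | - | - |
| #4 | Italian (IT) | 93.3% | - | - |
| #5 | French (FR) | 92.0% | - | - |
| #6 | German (DE) | 96.3% | - | - |
| #7 | Russian (RU) | 96.3% | - | - |
| #8 | Hindi (HI) | 84.0% | - | - |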
WORD ERROR RATE (WER)

Lowest WER for English

Salad Transcription API recorded the lowest Word Error Rate (WER) for English of any API in this benchmark on all three datasets, at 40% lower cost.

| Dataset | Salad Transcription API | Salad Transcription Lite | Assembly AI Universal | Amazon Transcribe | Google Latest Long | Azure Batch v3.1 | Deepgram Nova 2 | OpenAI Whisper |
|---|---|---|---|---|---|---|---|---|
| Common Voice | 4.90% | 18.70% | 6.67% | 8.98% | 17.59% | 7.81% | 12.43% | 8.83% |
| Meanwhile | 4.30% | 16.70% | 4.77% | 7.27% | 11.67% | 6.73% | 5.56% | 9.75% |
| TED-LIUM | 4.20% | 8.20% | 7.21% | 9.12% | 11.69% | 9.27% | 8.98% | 7.30% |
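The English accuracy rates reported on this page line up with 100% minus the WER (for example, 4.90% WER and 95.1% accuracy on Common Voice). The short Python sketch below assumes that relationship and uses the WER figures from the table above to find the lowest WER per dataset. The helper name accuracy_from_wer is hypothetical, and the conversion is an inference from the published numbers rather than a documented formula.

```python
# English WER figures (percent), copied from the table above.
wer_percent = {
    "Common Voice": {
        "Salad Transcription API": 4.90, "Salad Transcription Lite": 18.70,
        "Assembly AI Universal": 6.67, "Amazon Transcribe": 8.98,
        "Google Latest Long": 17.59, "Azure Batch v3.1": 7.81,
        "Deepgram Nova 2": 12.43, "OpenAI Whisper": 8.83,
    },
    "Meanwhile": {
        "Salad Transcription API": 4.30, "Salad Transcription Lite": 16.70,
        "Assembly AI Universal": 4.77, "Amazon Transcribe": 7.27,
        "Google Latest Long": 11.67, "Azure Batch v3.1": 6.73,
        "Deepgram Nova 2": 5.56, "OpenAI Whisper": 9.75,
    },
    "TED-LIUM": {
        "Salad Transcription API": 4.20, "Salad Transcription Lite": 8.20,
        "Assembly AI Universal": 7.21, "Amazon Transcribe": 9.12,
        "Google Latest Long": 11.69, "Azure Batch v3.1": 9.27,
        "Deepgram Nova 2": 8.98, "OpenAI Whisper": 7.30,
    },
}


def accuracy_from_wer(wer: float) -> float:
    """Assumed conversion: accuracy (%) = 100 - WER (%)."""
    return round(100.0 - wer, 2)


for dataset, scores in wer_percent.items():
    best = min(scores, key=scores.get)  # provider with the lowest WER
    print(f"{dataset}: {best}, WER {scores[best]:.2f}%, "
          f"accuracy {accuracy_from_wer(scores[best]):.1f}%")
```

Because WER can exceed 100% when a transcript contains many inserted words, this conversion is only meaningful for low error rates like the ones listed here.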
Cut transcription costs by up to 90%

Get in touch with Sales for discounted pricing

Save even more for high-volume transcription.

Lowest market price
Best accuracy
Ease of use