BENCHMARK

Transcription API Accuracy

See how Salad Transcription API scored the highest accuracy rate in this recreation of a competitor benchmark.

Try the API for free

Talk to Sales

Best accuracy rate for English

Read the benchmark blog

Salad Transcription API

95.1%

Assembly AI

93.4%

OpenAI

92.1%

Azure Batch

91.2%

Deepgram

91.0%

Google STT

90.8%

Results from a benchmark on the Common Voice 5.1 dataset, replicating the Assembly AI benchmark. Read the benchmark results here.

process

Benchmark

Methodology

We selected several datasets that contain original audio files along with validated ground truth transcripts. The audio files were uploaded to Salad S4 Storage and processed through both Salad Transcription API and Transcription Lite to obtain their transcriptions.

To ensure consistency in evaluation, both the predicted transcripts and the ground truth were normalized using the open-source Whisper Normalizer: https://pypi.org/project/whisper-normalizer/. This step helps remove discrepancies caused by variations in punctuation, capitalization, and formatting.
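As a rough illustration of what normalization does before scoring, here is a minimal sketch. It is a simplified stand-in, not the Whisper Normalizer itself (which also handles numbers, contractions, and spelling variants); the function name `simple_normalize` is hypothetical.

```python
import re
import unicodedata


def simple_normalize(text: str) -> str:
    """Simplified stand-in for a transcript normalizer (hypothetical helper)."""
    # Unify Unicode forms and ignore capitalization.
    text = unicodedata.normalize("NFKC", text).lower()
    # Replace everything except word characters, whitespace, and apostrophes.
    text = re.sub(r"[^\w\s']", " ", text)
    # Collapse runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", text).strip()


reference = simple_normalize("Hello, World! It's a test.")
prediction = simple_normalize("hello world it's a test")
print(reference == prediction)
```

Applying the same function to both the ground truth and the predicted transcript means that differences in punctuation or casing alone do not count as errors.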
For accuracy assessment, we calculated Word Error Rate (WER) between the ground truth and the transcribed outputs. The WER was calculated using the JiWER library: https://pypi.org/project/jiwer/. The average WER was then determined for each dataset. The entire benchmark process was replicated from the Assembly AI benchmark of various API providers to ensure consistency in comparison.
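The WER step can be sketched in plain Python. This is an illustrative stand-in for what a library such as JiWER computes, assuming whitespace tokenization and already-normalized text; the helper names `word_error_rate` and `average_wer` are hypothetical, and the per-dataset average is assumed here to be the mean of per-file WER values.

```python
from statistics import mean


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def average_wer(pairs):
    """Mean per-file WER over (reference, hypothesis) pairs (assumed averaging)."""
    return mean(word_error_rate(ref, hyp) for ref, hyp in pairs)


# Made-up example transcripts, for illustration only.
sample = [("the cat sat on the mat", "the cat sat on mat"),
          ("hello world", "hello world")]
print(f"{average_wer(sample):.4f}")
```

Note that averaging per-file WER weights short and long files equally; pooling all errors over all reference words is a different aggregate and can give a different number.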

Blog

Accuracy benchmark of Salad Transcription API

A detailed blog post about the methodology, workflow, and results

Read the blog

DATASETS

3 public datasets across multiple speech types

The benchmark was performed on 3 different public datasets, covering speech types from phone calls to presentations.

Common Voice

Large-scale, open-source collection of multilingual speech data, created by Mozilla to improve speech recognition technologies.

Over 1M files & 1.5k+ audio hours.

View dataset

TED-LIUM

A collection of audio recordings from TED Talks, sampled at 16 kHz, with accompanying transcriptions.

118 to 452 hours of speech data, depending on the release.

View dataset

Meanwhile

A collection of audio clips from The Late Show with Stephen Colbert, published as part of the Whisper release by OpenAI.

64 audio segments.

View dataset

ACCURACY RATE

Best across 8 languages

Salad Transcription API scored the highest accuracy rate on average across 8 major languages in the benchmark.

#	LANGUAGE	COMMON VOICE	TED-LIUM	MEANWHILE
#1	English EN	95.1%	95.8%	95.7%
#2	Spanish ES	96.8%	-	-
#3	Portuguese PT	92.0%	-	-
#4	Italian IT	93.3%	-	-
#5	French FR	92.0%	-	-
#6	German DE	96.3%	-	-
#7	Russian RU	96.3%	-	-
#8	Hindi HI	84.0%	-	-

WORD ERROR RATE (WER)

Lowest WER for English

Salad Transcription API scored the lowest Word Error Rate (WER) for English among the APIs benchmarked on all three datasets, at 40% lower cost.

Dataset	Salad Transcription API	Salad Transcription Lite	Assembly AI Universal	Amazon Transcribe	Google Latest Long	Azure Batch v3.1	Deepgram Nova 2	OpenAI Whisper
Common Voice	4.90%	18.70%	6.67%	8.98%	17.59%	7.81%	12.43%	8.83%
Meanwhile	4.30%	16.70%	4.77%	7.27%	11.67%	6.73%	5.56%	9.75%
TED-LIUM	4.20%	8.20%	7.21%	9.12%	11.69%	9.27%	8.98%	7.30%
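The English accuracy rates quoted on this page appear to be 100% minus the WER in the Salad Transcription API column of the table above. The short sketch below shows that arithmetic; the relationship is inferred from the published figures rather than stated explicitly, and `accuracy_from_wer` is a hypothetical helper.

```python
# English WER (%) for Salad Transcription API, copied from the table above.
wer_percent = {"Common Voice": 4.90, "Meanwhile": 4.30, "TED-LIUM": 4.20}


def accuracy_from_wer(wer_pct: float) -> float:
    """Accuracy (%) taken as 100 minus WER (%), rounded to one decimal."""
    return round(100.0 - wer_pct, 1)


accuracy = {name: accuracy_from_wer(w) for name, w in wer_percent.items()}
for name, acc in accuracy.items():
    print(f"{name}: {acc}%")
```

Because WER counts insertions as errors, it can exceed 100% on badly mismatched transcripts, so this conversion is only meaningful for low error rates like the ones reported here.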

Cut transcription costs by up to 90%

Get in touch with Sales for discounted pricing

Save even more for high-volume transcription.

Talk to Sales

Lowest market price

Best accuracy

Ease of use
