Tech · Hugging Face
Adding Benchmaxxer Repellant to the Open ASR Leaderboard
Compiled by KHAO Editorial — aggregated from 1 outlet. See llms.txt for citation guidance.
TLDR: Appen Inc. and DataoceanAI have provided high-quality English ASR datasets covering scripted and conversational speech across multiple accents.
Key facts
- Since its launch in September 2023, the Open ASR Leaderboard has been visited over 710K times
- The leaderboard maintainers have worked with Appen Inc. and DataoceanAI to curate high-quality datasets for ASR benchmarking
- The Average WER is not being updated for now: by default, the leaderboard's Average WER remains computed on public datasets only
Summary
By default, the leaderboard's Average WER remains computed on public datasets only. Since its launch in September 2023, the Open ASR Leaderboard has been visited over 710K times. Two words sum up both the objectives and the challenges of maintaining a benchmark like the Open ASR Leaderboard: standardization and openness. Both are essential for meaningful benchmarking, but they also make benchmarks more susceptible to benchmark-specific optimization ("benchmaxxing"), where models improve leaderboard performance without corresponding gains in real-world robustness. As discussed in their report, there is no single "catch-all" ASR model: some perform better on American English, others on diverse accents and multilingual settings, while others are optimized for speed or conversational audio.
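The leaderboard's headline metric is Word Error Rate (WER). As a rough illustration only (not the leaderboard's actual scoring code, which normalizes transcripts before comparison), WER can be computed as the word-level edit distance between a reference transcript and a model's hypothesis, divided by the number of reference words:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level Levenshtein distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # One-row dynamic-programming edit distance over words.
    d = list(range(len(hyp) + 1))  # distance from empty reference to hyp[:j]
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            cur = d[j]
            d[j] = min(d[j] + 1,          # deletion of a reference word
                       d[j - 1] + 1,      # insertion of a hypothesis word
                       prev + (r != h))   # substitution (free if words match)
            prev = cur
    return d[-1] / len(ref)
```

For example, `wer("the cat sat", "the bat sat")` gives 1/3 (one substitution over three reference words). Because the score counts insertions as well, a model that emits extra words is penalized even when every reference word appears in its output.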