dx.doi.org/10.1109/ICASSP49660.2025.10888138

Preview meta tags from the dx.doi.org website.

Linked Hostnames

2

Search Engine Appearance

Google

https://dx.doi.org/10.1109/ICASSP49660.2025.10888138

Speech Data Selection for Efficient ASR Fine-Tuning using Domain Classifier and Pseudo-Label Filtering

In real-world speech data processing, the scarcity of annotated data and the abundance of unlabelled speech data present a significant challenge. To address this, we propose an efficient data selection pipeline for fine-tuning ASR models that generates pseudo-labels with the WhisperX pipeline and selects an efficient subset of them for fine-tuning. We first build a domain classifier from computationally inexpensive TF-IDF features and a classical machine learning algorithm. We then filter the classifier output using a novel metric that assesses word ratio and perplexity distribution. The filtered pseudo-labels are used to fine-tune standard encoder-decoder Whisper models and Zipformer. The proposed pipeline reduces the dataset size to approximately 1/100th of the original while maintaining performance comparable to the full dataset, outperforming random domain-independent selection strategies.
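The domain-classifier step mentioned in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not say which classical algorithm, tokenizer, or domain labels the authors use, so this sketch swaps in a nearest-centroid rule over TF-IDF vectors, whitespace tokenization, and two made-up domains (`medical`, `sports`) with toy transcripts. It uses only the Python standard library.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: lowercase whitespace tokenization is enough for a toy demo.
    return text.lower().split()

def fit_idf(docs):
    """Smoothed inverse document frequency for every term seen in docs."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf(text, idf):
    """L2-normalised sparse TF-IDF vector; terms unseen in training are dropped."""
    tf = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}

def fit_centroids(docs, labels, idf):
    """Average the TF-IDF vectors of each (hypothetical) domain label."""
    sums, counts = {}, Counter(labels)
    for doc, label in zip(docs, labels):
        acc = sums.setdefault(label, Counter())
        for t, v in tfidf(doc, idf).items():
            acc[t] += v
    return {lab: {t: v / counts[lab] for t, v in acc.items()}
            for lab, acc in sums.items()}

def predict(text, idf, centroids):
    """Return the label whose centroid has the largest dot product with text."""
    vec = tfidf(text, idf)
    def score(centroid):
        return sum(v * centroid.get(t, 0.0) for t, v in vec.items())
    return max(centroids, key=lambda lab: score(centroids[lab]))

# Toy pseudo-label transcripts with made-up domain labels (not from the paper).
train_docs = [
    "the patient was given a dose of medicine",
    "the doctor examined the patient",
    "the team scored a late goal",
    "the striker missed the penalty",
]
train_labels = ["medical", "medical", "sports", "sports"]

idf = fit_idf(train_docs)
centroids = fit_centroids(train_docs, train_labels, idf)
print(predict("the doctor prescribed medicine", idf, centroids))
print(predict("the goalkeeper saved a penalty", idf, centroids))
```

In a pipeline like the one described, transcripts assigned to the target domain would then pass to the word-ratio and perplexity filter; that metric is not defined in the abstract, so it is not sketched here.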




  • General Meta Tags

    12
    • title
      Speech Data Selection for Efficient ASR Fine-Tuning using Domain Classifier and Pseudo-Label Filtering | IEEE Conference Publication | IEEE Xplore
    • google-site-verification
      qibYCgIKpiVF_VVjPYutgStwKn-0-KBB6Gw4Fc57FZg
    • Description
      In real-world speech data processing, the scarcity of annotated data and the abundance of unlabelled speech data present a significant challenge. To address thi
    • Content-Type
      text/html; charset=utf-8
    • viewport
      width=device-width, initial-scale=1.0
  • Open Graph Meta Tags

    3
    • og:image
      https://ieeexplore.ieee.org/assets/img/ieee_logo_smedia_200X200.png
    • og:title
      Speech Data Selection for Efficient ASR Fine-Tuning using Domain Classifier and Pseudo-Label Filtering
    • og:description
      In real-world speech data processing, the scarcity of annotated data and the abundance of unlabelled speech data present a significant challenge. To address this, we propose an efficient data selection pipeline for fine-tuning ASR models by generating pseudo-labels using WhisperX pipeline and selecting efficient labels for fine-tuning. In our work, we propose a domain classifier system developed with a computationally inexpensive TFIDF and classical machine learning algorithm. Later, we filter data from the classifier output using a novel metric that assesses word ratio and perplexity distribution. The filtered pseudo labels are then used for fine-tuning standard encoder-decoder Whisper models and Zipformer. Our proposed data selection pipeline reduces the dataset size by approximately 1/100th while maintaining performance comparable to the full dataset, outperforming random domain-independent selection strategies.
  • Twitter Meta Tags

    1
    • twitter:card
      summary
  • Link Tags

    9
    • canonical
      https://ieeexplore.ieee.org/document/10888138/
    • icon
      /assets/img/favicon.ico
    • stylesheet
      https://ieeexplore.ieee.org/assets/css/osano-cookie-consent-xplore.css
    • stylesheet
      /assets/css/simplePassMeter.min.css?cv=20250308_00000
    • stylesheet
      /assets/dist/ng-new/styles.css?cv=20250308_00000

Links

17