EVALUATING SYNTHESIZED SPEECH QUALITY
DOI: https://doi.org/10.31673/2786-8362.2024.022953
Abstract
The article explores methods for objectively evaluating the quality of synthesized speech, which can be
used to assess the accuracy and performance of speech generation systems. This approach aims to provide
an unbiased measure of improvements in synthesized speech quality, independent of listeners' subjective
opinions, and thus to facilitate the design and improvement of speech synthesis systems. It also offers a
framework for comparing speech synthesis systems to determine which performs better. Considerable
attention is paid to the use of neural networks as a tool for evaluating the output of other neural
networks, emphasizing the potential of self-evaluation in artificial intelligence systems. The
effectiveness of existing quality control systems is examined, and their strengths and weaknesses are
identified. In addition, the paper highlights the key advantages of each evaluation system considered,
contributing information useful for the continued improvement of speech synthesis technologies.
Keywords: neural network, synthesized speech, evaluation metrics.