Abstract
This paper presents a descriptive qualitative and quantitative study of how four well-known online text-to-speech (TTS) programs (Murf, Lovo, Play.ht and Replica Studios) handle the deaccenting of given information, also known as the anaphora rule. We used 10 lines as input, each containing elements of given information, to test the software. For each program we selected two voices: one male voice with a British English accent and one female voice with an American English accent. Each line was synthesised with both voices in each program, downloaded as audio and analysed with the speech analysis software Praat. This allowed us to measure and evaluate the pitch contour of each utterance and to check whether each TTS program applies the anaphora rule. Overall, almost 70% of the lines fail to realise the anaphora rule. This means that this characteristic prosodic feature of English stress, together with the substantial pragmatic load it carries, is lost most of the time. The results indicate that, although synthetic voices may be successful at the segmental level in terms of catenation and voice quality, machines have not yet mastered the suprasegmental and prosodic elements of human speech.

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

