Known pronunciation errors in the TTS
- vakt is pronounced wrong when said separately. En vakt sounds right, but if you just listen to the word vakt itself, it's said wrong. https://www.duolingo.com/comment/6873143 (vakt should always be pronounced with a short a)
- vita is said more like 'veta' in Gardinerna är vita och svarta https://www.duolingo.com/comment/5769886
- svart is wrong at least in Filmen skulle precis börja när allt blev svart https://www.duolingo.com/comment/6688426 (svart should always be pronounced with a short a)
- kvart is wrong in e.g. De kommer om en kvart https://www.duolingo.com/comment/5729675 (kvart should always be pronounced with a short a) (slow version gets it right)
- svans in Lejonet sitter på sin svans is incorrectly pronounced (long a instead of short) (this word pronounced that way means swan's).
- sporten is wrong in https://www.duolingo.com/comment/7408189 (sporten should always be pronounced with a short o).
- sport is also wrong: https://www.duolingo.com/comment/8053309 (sport is not pronounced like in English, it should have a short o).
- kör is wrong in kör långsamt https://www.duolingo.com/comment/6883827 (this word pronounced that way means choir)
- väl is pronounced wrong in Vi blev väl behandlade https://www.duolingo.com/comment/8259639, see explanation in the sentence forum there.
- el is pronounced as eller liknande ('or similar': the TTS thinks it is an abbreviation) https://www.duolingo.com/comment/6587130
- tunnelbanan is pronounced as if it were a compound word made up of tunnel (which is true) and banan, 'banana' (which is not true) here: https://www.duolingo.com/comment/7307632
- planet sounds wrong in Forskarna upptäckte en ny planet. (planet pronounced that way means the plane)
- gamla is pronounced slightly wrong in e.g. Hennes böcker är älskade av både unga och gamla.
- senapen is pronounced wrong in https://www.duolingo.com/comment/6188903
- lade is pronounced wrong in https://www.duolingo.com/comment/7074698 (this word should be said as if it were spelled la)
- the stress is wrong in bry sig om: https://www.duolingo.com/comment/7221576
- det is erroneously said as dom in Det var roligt. https://www.duolingo.com/comment/7228391
- och is said wrong in Hon och han https://www.duolingo.com/comment/5993718 – but generally the TTS gets this word right
- jag is said wrong in Jag hoppas att han ger mig ett svar https://www.duolingo.com/comment/7537146 (the slow version is correct and the TTS generally gets this word right)
- poolen is pronounced like Polen in Poolen är inte djup. https://www.duolingo.com/comment/7246443 (Polen = Poland, but poolen = the pool)
- krabba sounds like 'grabba' in En krabba: https://www.duolingo.com/comment/6497061
- kon sounds like gon in Kon dog https://www.duolingo.com/comment/7310195 (but the slow version is correct)
- duken is said with a weird pause in Lägg duken på bordet! https://www.duolingo.com/comment/7731611
- smörgåsar also has a weird pause in Flickan har smörgåsar https://www.duolingo.com/comment/5660793
Thanks for compiling this list, Arnauti! Some are a mystery to me (det=dom in this one case), and some are understandable (kör and kör are two different words, and we haven't had any luck in finding a TTS that can say "read" and "read" correctly either). It is hard to find a perfect voice, but this one makes fewer mistakes than the previous one, huh? =]
It's much, much better than the previous one. eller liknande for el isn't good, but the other one said - out loud as bindestreck ('hyphen'), so… The new one only has rare errors on less common words, whereas the old one made mistakes on some very common ones. I think especially de and bakom were unforgivable errors of the old voice.
And thank you vivisaurus for helping us get this new voice in place!
PS I didn't compile the list all on my own, we've been doing it together in the internal wiki.
I like the idea. But even if that is possible, I doubt its feasibility as a long-term solution. TTS systems tend to generate the speech on the fly (with caching to prevent unnecessary calculations), meaning that virtually any word can change when an update is applied to the system. And it may well be that the word is pronounced correctly when used in conjunction with other words (not including the el error here), since the correct pronunciation by necessity depends on syntactic analysis as well as prosodic information, both of which have to be derived contextually.
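To make the caching point concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: `CachedTTS`, `fake_synthesize` and the version strings are hypothetical names, and nothing here reflects how the actual voice is implemented. It only shows why audio cached per sentence and per engine version has to be regenerated after an update.

```python
import hashlib


class CachedTTS:
    """Hypothetical wrapper that caches audio per (engine version, text)."""

    def __init__(self, synthesize, engine_version):
        self._synthesize = synthesize          # assumed callable: text -> bytes
        self._engine_version = engine_version  # assumed version label
        self._cache = {}
        self.synth_calls = 0                   # counts real synthesis runs

    def _key(self, text):
        # The key covers the whole input text, not single words, because
        # pronunciation depends on the surrounding context.
        raw = f"{self._engine_version}\x00{text}".encode("utf-8")
        return hashlib.sha256(raw).hexdigest()

    def speak(self, text):
        key = self._key(text)
        if key not in self._cache:
            self.synth_calls += 1
            self._cache[key] = self._synthesize(text)
        return self._cache[key]

    def upgrade(self, new_version):
        # After an update the old entries no longer match any key,
        # so every sentence is synthesized again and may sound different.
        self._engine_version = new_version


def fake_synthesize(text):
    """Stand-in for a real engine; returns bytes instead of audio."""
    return text.encode("utf-8")
```

With this sketch, asking for "en vakt" twice runs the synthesizer once, and asking again after `upgrade("v2")` runs it a second time, which is why a manual fix for one word would not necessarily survive an update.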
Thanks for commenting. Is the text input reparsed, or double parsed in any way? The det -> dom case would make sense if det were somehow converted into de, either through text shortening or because det is pronounced de, and de is in turn pronounced dom.
Neither reason sounds particularly plausible, and the Occam explanation would be that it's simply an odd mistake. Just throwing it out there. All in all, a major improvement if you ask me. :)
That's interesting! It could be a problem with words like "kör", "banan" and "planet" though, since the faulty pronunciations actually exist but mean something else.
The old voice had the right pronunciation for choir (kör) but not for drive (kör) and for the new one it's the other way around. I guess it's complicated to make the computer voice able to choose the right one for both cases.
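A toy sketch in Python of why choosing between the two readings is hard. The lexicon, the pronunciation notes and the determiner rule are all assumptions made up for illustration; a real TTS would need proper syntactic analysis, and this heuristic is easy to fool.

```python
# Hypothetical homograph lexicon:
# spelling -> {part of speech: (English gloss, rough pronunciation note)}
HOMOGRAPHS = {
    "kör": {
        "noun": ("choir", "hard k"),
        "verb": ("drive", "soft k, like the tj-sound"),
    },
}

# Small, deliberately incomplete set of words that often precede a noun.
DETERMINERS = {"en", "min", "din", "vår"}


def guess_pos(tokens, i):
    """Toy heuristic: a word right after a determiner is treated as a noun."""
    if i > 0 and tokens[i - 1].lower() in DETERMINERS:
        return "noun"
    return "verb"


def pick_reading(sentence, word):
    """Return the (gloss, pronunciation note) guessed for `word` in `sentence`."""
    tokens = sentence.rstrip(".!?").split()
    lowered = [t.lower() for t in tokens]
    i = lowered.index(word)
    return HOMOGRAPHS[word][guess_pos(tokens, i)]
```

Here `pick_reading("Kör långsamt!", "kör")` picks the 'drive' reading and `pick_reading("Vi sjunger i en kör.", "kör")` picks 'choir', but only because these two sentences happen to fit the rule. Anything less tidy needs real context analysis, which is presumably where both voices slip up.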