

How much would it cost to replace all the robot voices with human recordings?

A couple of languages have human recordings (including, I believe, Klingon eventually), and the rest are stuck with the robots, with their incorrect intonation, mispronunciations, and unpleasantly fake voices. If Klingon is going to get recorded voices, why are the real languages (the ones this website was made for) not allowed decent audio? Hearing how the things users are learning are pronounced by real people is 10000000000000x more important than anything being trialled in labs, the new web design, the app's bots, or anything else Duolingo is doing at the moment. Should it not be a priority?

For the record, I am looking forward to trying out that Klingon course; I'm not saying it doesn't deserve a place on Duolingo.

September 28, 2017



Duolingo claim that the results are better with TTS. I can't remember whether it is user engagement, learning outcomes, or both. Anyway, they do not see it as the second-rate option.

It is also worth noting that TTS is more flexible. Define a new sentence and it automatically gets audio. There is no need to arrange additional recording sessions, manage the audio files, etc.


I've seen that claim, too. I think I can believe the result for the bigger language courses. For instance, I would say the Portuguese TTS is really quite good. But for some of the smaller languages (Greek and Turkish come to mind), it's much more grating. Much of the time, the slow option is only really needed because the TTS is so bad. I remember hardly a single case in the Guaraní tree where I felt a desire to slow anything down. Once in a while an individual word by itself, yes, but if they wanted to, they could just have the people doing the recordings record the words individually as well.


I hate that their studies don't consider minor issues with TTS, like being able to read homographs properly:

  • live (verb), live (adjective)

  • read (present tense), read (past tense)

  • present (verb), present (noun)

  • record (verb), record (noun)

I've played through some reverse courses and they play the wrong pronunciation in English.

TTS sometimes can't handle homographs when the correct reading depends on context and grammar!!


Sure. I do wonder if the reason their results are good for TTS is that it allows for the turtle button (the speak-slowly option).


That's the problem with studies. They only verify that things are true under certain conditions. That's why in science new hypotheses are set up to close any open holes in what we already know. But this can often lead to finding out that there are more holes in knowledge to plug.

In this case, if you know ONE homograph and the TTS says that ONE homograph the SAME way ALL the time, of course you'll know how to write it just by listening to it. Their studies just proved that people can write down what the TTS reads out. They don't prove that the TTS knows how to read, or that it reads the text properly.


You probably give Duolingo too much credit by assuming they were testing learning mechanics on the same plane as their more heavily weighted utilization outcomes. I don't know that their learning-outcome measures attain the granularity you imply here, particularly not for any given test.


TTS voices offer a slow speed and individual words, which are certainly very useful to learners. In an ideal world, perhaps we could have both real recordings for every sentence and TTS for words and slow sentences.
However, it is worth bearing in mind that TTS technology is rapidly improving. It is eminently possible that within five years or so there will exist artificial voices that are indistinguishable from a real person by a native speaker. Duolingo does update and improve its electronic voices as time goes by, and I'm sure it will continue to do so.


I don't think Duolingo generally uses the best ones available, though. I'm pretty good at French; I've heard it spoken plenty. I did the French course on Lingvist for a bit of additional vocab learning, and I didn't figure out it was TTS until I read it in a post on their website. I don't think the French TTS here is bad, but it's not that good.


The newest technology always costs a premium, but it tends to trickle down pretty quickly, within two or three years. DL has limited resources and a lot to spend them on, but it does improve. Since you say above that the PT TTS is quite good (I get both male and female voices on the app, but not on the web), I dare say you also remember the Dalek that preceded it.
I'm willing to have a little patience for a free service; that two or three years would have represented a century of progress not all that long ago.


I up-voted! :-)

Mondly https://www.mondlylanguages.com/ has decent Brazilian Portuguese audio (it does not sound like TTS, so it is probably recorded by native speakers).
I cannot speak for any other language, as I have not tested the other courses.

The Memrise DuoLingo PT BR course switched to forvo.com audio at higher levels,
and there are often 3-4 audio recordings.


The English bot is quite bad. For example, it's unable to differentiate between different pronunciations of "live", "present", "read", et cetera. It's leading those trying to learn English astray.


Yes, it's unable to distinguish homographs. I hate it.


In addition to homographs, it is also unable to distinguish homonyms.


Hungarian and Esperanto have real speakers, and although they are sometimes difficult to understand (because there is no slow speed), their voices are far more pleasant than the TTS in Spanish.


I definitely credit the real speakers in Hungarian and Guaraní with at least part of why I was so drawn into those courses.


Yes, it'll show when we go to Turkey and start to speak the Turkish we learned on here: the locals will easily notice that we all have a robotic accent.


It's not really a matter of cost, as everything is done by volunteers, but as a voice recorder for the Klingon course I can tell you it's not necessarily better. Klingon has at least one reasonable TTS engine, but it's not being used for the course. Meanwhile the humans recording sometimes mumble, get microphone crackle, cough, fart, have background noises, and have to record every sentence and then every word in every sentence separately. It's going to take a year to get audio into one tree, even before we go back and determine which "audio sounds incorrect" complaints reflect genuine problems, and which are just people who can't believe the sounds in tlhuHQo' constitute a word.


It's not really a matter of cost, as everything is done by volunteers

The facility for volunteers to add audio is a very recent innovation that postdates this post. Most of the real, human audio in DL courses was commissioned from professional voice actors, so cost was undoubtedly a factor.
I don't know whether paying for audio has now been abandoned entirely in favour of volunteer recordings, or whether this is just a stop-gap for languages (like Klingon) for which there are few or no 'professional speakers'.
