

Why not use a human recording?

I like the Russian course so far, but one thing that disappoints me is their choice to use a TTS instead of a human voice.

The first problem with the TTS is that it doesn't always produce the correct sound. It pronounces things too literally or stresses the wrong syllable. This hurts because for many learners the TTS is the first thing they hear, so it teaches them a wrong accent that later needs to be unlearned. It hurts for Russian in particular, since the language is already hard to pronounce for people whose native language is a Western European one. It even leads to confusion about the real pronunciation, as can be observed in this discussion.

I don't think the cost of recording a human voice is much of a problem. The Ukrainian course managed to incorporate a very nice human voice. Also, many of the Russian words have already been recorded by course moderator Shady_arc on forvo.com and could have been incorporated into the course directly. It might even have saved some time, because it apparently took about a month just to buy this TTS, which is enough time to do all the necessary voice recordings.

The thing that bothers me the most is that a TTS seems to be the default. Apparently, the only reason Esperanto, Irish and Ukrainian are using a human voice is that they couldn't find a half-decent TTS. In my opinion this should really be the other way around: courses should have a human voice by default and only use a TTS as a fallback option.

November 6, 2015



That would have delayed the course by another month, and I would have died of anxiety. I have waited for this course for about two years now.


Wouldn't it be better to wait and learn it properly than to rush and learn it incorrectly? Better late than never.


As a Russian I can say the TTS is fine, pronunciation is clear and accents are correct.


True, but the voice is pretty good in my opinion.


Duolingo should not strive for "pretty good". Especially not if a human voice would be perfect.


You hold fairly high standards for something that is free.


Duolingo prefers not to use human voices wherever possible. Text-to-speech solutions are more scalable and far cheaper. By the way, the Russian course has more than 60–70 hours of audio, including words and turtle-speed pronunciation. That would certainly take quite some time to record. This is the reason the voiceover is quite limited in the courses that use real speakers.


Thank you for your response. The tree that you and your team created is indeed quite huge.

As for the costs, I think storing 70 hours of audio is doable within 1 GB, and bandwidth costs would probably be about the same as with a TTS, so I imagine most of the cost would be in the actual recording of the audio.
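As a rough check on the 1 GB figure, here is a back-of-the-envelope sketch. The 70 hours comes from the post above; the bitrates are assumptions chosen for illustration (nothing in this thread says which codec or bitrate Duolingo actually uses), and the helper name `audio_size_mb` is made up for this example.

```python
def audio_size_mb(hours: float, bitrate_kbps: float) -> float:
    """Size in megabytes (10**6 bytes) of constant-bitrate audio."""
    seconds = hours * 3600
    total_bits = seconds * bitrate_kbps * 1000
    return total_bits / 8 / 1_000_000


# Assumed bitrates: low values are typical for speech-oriented codecs,
# 64 kbps is closer to a modest music-quality setting.
for kbps in (16, 24, 32, 64):
    print(f"{kbps} kbps: {audio_size_mb(70, kbps):.0f} MB")
# 16 kbps: 504 MB
# 24 kbps: 756 MB
# 32 kbps: 1008 MB
# 64 kbps: 2016 MB
```

Under these assumptions the estimate holds as long as the audio is encoded at just under 32 kbps or lower, and the size roughly doubles at 64 kbps.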

I can imagine that it's painful to change sentences after the recordings have taken place. It would probably require someone to record the new and changed sentences about once a week, and those sentences would have to be disabled in the meantime. I don't know how this is handled by other courses.

I think turtle speed can still be done with a human voice automatically by just playing it slower and normalising the pitch. Personally, I try to get rid of the turtle-speed pronunciation as soon as possible anyway, so even if done by hand it might only be needed for the first couple of lessons.
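"Playing it slower and normalising the pitch" is usually called time-stretching. The following is only a minimal sketch of the simplest variant, plain overlap-add (OLA), written in pure Python; it is not how Duolingo or any particular tool does it, and the function names and the frame size are assumptions for illustration. Plain OLA keeps the pitch because each short frame is copied at its original rate, but it can sound rough; real tools tend to use refinements such as WSOLA or a phase vocoder.

```python
import math


def hann(n):
    """Periodic Hann window; at 50% overlap neighbouring windows sum to 1."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def stretch_ola(samples, factor, frame=1024):
    """Slow `samples` down by `factor` (2.0 = about twice as long).

    Frames are read from the input at a small hop and written to the
    output at a larger hop, so duration grows while each frame keeps
    its original pitch.
    """
    if len(samples) < frame:
        raise ValueError("input is shorter than one frame")
    hop_out = frame // 2        # output hop: 50% overlap
    hop_in = hop_out / factor   # input hop: advance more slowly
    window = hann(frame)
    n_frames = int((len(samples) - frame) / hop_in) + 1
    out = [0.0] * (hop_out * (n_frames - 1) + frame)
    for k in range(n_frames):
        start_in = int(k * hop_in)
        start_out = k * hop_out
        for i in range(frame):
            out[start_out + i] += samples[start_in + i] * window[i]
    return out


# Usage: one second of a 440 Hz test tone at an assumed 8000 Hz sample rate.
tone = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(8000)]
slow = stretch_ola(tone, 2.0)
print(len(tone), len(slow))
```

The output is slightly shorter than exactly twice the input because the first and last frames are not fully overlapped.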

Thanks again for creating this course, by the way.


How about you learn how to say "я разочаровывающий" to work this frustration out of your system? By the time you learn how to say it, you won't be worrying so much about the TTS. The Russian sound system is actually pretty simple. It's not difficult because of the individual sounds, but because of the way they're put together into these absurdly long words. I swear, the average length of a Russian word is 5 syllables.


That's a bit what I'm worried about. How will I figure out which syllables are stressed in long words like these? Anyway, I'll stick with simpler sentences for now.

I have to admit that, compared to the Google Translate voice, this voice sounds reasonably human.


You can always use a dictionary.


Well, if they were to use a human voice, it might be best either to have someone whose voice is fairly neutral between masculine- and feminine-sounding, if possible, or otherwise to alternate between male and female speakers. I say this because it's not always easy to figure out how to pronounce something if your own voice is quite different from the speaker's: you can't be sure whether you are doing the accent or doing an impression of that person. I am not sure if it's my imagination, but Russian males and females sound pretty different to me in how they talk. Is it just me?


Yes, that's another thing. In my opinion separate masculine and feminine voices would be preferable, but also more expensive. The French course has separate masculine and feminine voices which, although still computer-generated, are a nice addition to have.


A real human-recorded voice would be awesome, but they would have to record pretty much every sentence, not just every word. I am sure the TTS uses a database of multiple recordings for each word and picks the most appropriate one according to the neighbouring words, in order to make the sentence sound natural. Either that, or each word is recorded in a neutral way, without any expression. It would also take a long time, and these guys have already been working on the course for years, so I'm not sure they would want to add a human-recorded voice. But I do agree that it would be better for the learner.


Exactly. My understanding was that Duolingo sentences are randomly generated per user, which is why we see the "Weirdest sentence you've had?" threads.

Am I right about that?


No. Sentences are weird because the structure is more important than the meaning, and strange/funny meanings are more likely to stick the structure in your brain while also allowing a greater density of vocabulary.

Automatic sentence generation is actually quite difficult and is not in any way appropriate for teaching a new language, since it will often produce incoherent results. Google "subreddit simulator" for an example.


Subreddit Simulator doesn't attempt generating sentences, it's just a dumb Markov chain.

Sentence generation is possible, but it's hard to do it right and the payoff is minuscule.
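For readers unfamiliar with the term, here is a minimal sketch of a word-level Markov chain text generator. It is illustrative only: the tiny corpus and the function names are made up for this example, and it is not the code behind Subreddit Simulator or anything Duolingo uses. Each next word is chosen at random from the words that followed the current word in the corpus, with no notion of grammar or meaning, which is why the output so easily drifts into nonsense.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words that followed it in `text`."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain


def generate(chain, start, max_words=12, seed=None):
    """Walk the chain from `start`, picking each next word at random."""
    rng = random.Random(seed)
    word = start
    output = [word]
    # Stop at the length limit or when the current word has no successors.
    while len(output) < max_words and chain.get(word):
        word = rng.choice(chain[word])
        output.append(word)
    return " ".join(output)


# Hypothetical toy corpus, chosen only to show the mechanism.
corpus = "the cat eats the bread and the dog drinks the milk"
print(generate(build_chain(corpus), "the", max_words=8))
```

Every pair of adjacent words in the output occurred somewhere in the corpus, but the sentence as a whole does not have to make sense.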


Well, sure, if you want to try REALLY hard. But it's not worth it.


Shady_arc provides audio for my Memrise course here: http://www.memrise.com/course/378212/duolingo-russian-full-audio/ Of course, it's not possible to include every sentence in the course, but it does help you learn how to stress the words.


The audio on that course is brilliant, but on here I think the benefits of the TTS being available for every word outweigh that. I know I have found the courses without it far harder and find myself guessing at pronunciation even after finishing, whereas with TTS I feel much more confident about it.

But the Memrise course is a brilliant accompaniment to the Duolingo course. Well done and thanks to both cherub and Shady_arc for all their work on it.


Please, please keep the Memrise course going. I reviewed it and loved it! If there were audio for everything, it would be the single best Russian course on Memrise.

A trillion thanks to Shady_arc for providing the audio, it's clear and easy to hear. Hope she keeps it up!


The Irish course uses a real human voice. While it sounds good, there is a problem with it: not all of the sentences and words are available to listen to.

The voice is okay. There are a few words where I've thought "that doesn't sound the way the Russians I've listened to say it", but for the most part it's fine. I've said the same thing about some of the German words too, though.


I agree. The Ukrainian voice is 2000000000% better.


But the Ukrainian voice is 200% more quiet :-(
