This was posted on the Irish course page: http://incubator.duolingo.com/courses/ga/en/status by Lancet yesterday:
"Haigh, a lucht leanúna na Gaeilge!
We know that loads of you are champing at the bit to get started on our course. Rest assured that we're doing everything we can to bring Irish to beta as soon as possible.
The main issue delaying us at the moment is the implementation of the audio exercises. All other Duolingo courses up until now have used text-to-speech (TTS) software engines to automatically generate the required audio. Given Irish's status as a minority language, however, there just aren't many such engines available to choose from. The Duolingo tech team are looking at a lot of options for this and, in particular, they are in the middle of trialling one option which we in Team Irish hope to be able to hear samples of in the coming week.
In the meantime, we're using the Incubator's internal workshop to make systematic checks and fixes to the course, and to check that all possible translations are accepted for certain words or phrases. This will make the course much less annoying for you, but we will still need the course end-users (that is, you!) to report when you come across valid answers that the system doesn't accept.
Release date: when it's done ;)
Le grá agus meas,
I'm really looking forward to this course!
Hi Jack, and everyone else :) We have some further updates :) Last night we were able to hear some of these samples, and wow, we were impressed! We are waiting for everyone on the team to chime in before Duolingo moves on to the next step. In the words of our mentor: "I can see this course going into beta soon". I think it's time to get excited again :P In the meantime, we are adding many, many alternatives to each sentence to cut down on our reports when we are in beta. If you guys have any more questions, feel free to post on my stream. (Please don't ask for a release date, because we don't have one.)
Note: We will not be sharing the samples mentioned above, but you will see them....or....well....hear them, very soon :)
Wow, I did not expect such positive news about the TTS so soon. Getting excited again. Keep up the good job, I can't waitttt! :D
Great news. This does make me wonder. Will the other incubating languages run into this problem as well? Some of them are not exactly major languages (Hungarian, Romanian, Turkish). But this is great news about Irish. Like everyone else, I can't wait. I remember hearing in university how endangered the language was, and how my generation would see it go the way of Latin. If they only knew the multitude of people eager to learn this language!
I wouldn't put Turkish into quite the same category; while not on par with languages like Spanish or English, it's a major and important language in the region, as well as among the large diaspora in other European countries. But yeah, there could definitely be problems for smaller languages.
I have to say I'm pretty disappointed with how Duolingo (not Team Irish) handled this situation - it's not like they only just discovered that TTS would be needed. Why wait for the whole course to be constructed before attempting to find an appropriate TTS? Wouldn't it be logical to consider this important function right from the start? :/
Anyway, I hope this time "soon" will really be soon!! :D
@TanagerMoonmist I understand that it can be frustrating, but I don't think Duolingo expected the course to be completed so quickly. I think this is the same issue that Dutch had, because they released without the TTS they planned on using.
I think we should all take some time to think about all the great things Duolingo has done (e.g. letting our course be incubated, bringing language education to the world for free) and then we'll all see how the good outweighs the bad :)
Updates are coming to Team Irish daily, direct from our Incubator mentor, who is part of Duolingo staff, and judging by these updates, soon means soon.
How was that a confirmation of the meaning of soon? He literally said "soon means soon." Three or four weeks ago they were super close to launch, then two weeks after that incredibly close. I hate to be a downer, and I know they've put a ton of work into it, but so far any kind of time estimate has been hilariously inaccurate.
I am sure Hungarian, Romanian, and Turkish have some TTS options available, even if they aren't great. Some users complain that German and Italian's TTS on Duo isn't fabulous, but even if these languages don't have the wealth of options that English and Spanish have, they are many people's sole language. In the case of endangered and extinct languages, like Cherokee and Gothic, however, or constructed languages like Dothraki and Klingon, Duo may have to come up with more creative options.
One example would be to use another language's TTS and feed it the pronunciation independently of the spelling, which would only work if that language's TTS contains all of the sounds of the target language. If Spanish, for example, had all the sounds of Dothraki (I have no idea if it does), you might be able to "cheat" the TTS by entering Dothraki words spelled the way they would be if they were Spanish words with that pronunciation, then connecting it on the back end, independent of the actual Dothraki spelling. The user would not see this and would go on learning the correct Dothraki spelling (if there is one!).
There are obvious issues with this. First, we would hear Dothraki with a Spanish accent, which is a bigger deal when dealing with actual endangered languages. Second, it would take time to "cheat" a pronunciation for every given word, which is harder in the case of polysynthetic languages, whose words change depending on the sentence and can have hundreds or even thousands of forms. Third, an existing TTS for an existing language would have to contain all of the sounds of the target language to be "cheated" into being usable for this purpose.
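To make the "cheat" a bit more concrete, here is a minimal sketch of the back-end lookup. Everything in it is an assumption for illustration: the sound-to-spelling table, the made-up words (they are not real Dothraki) and the function names are hypothetical, and no real TTS engine is called. It only shows the respelling step and the check that the host language has every sound it needs.

```python
# Hypothetical sketch of the respelling "cheat"; not Duolingo's actual system.

# Assumed table: each sound the host TTS can produce, and how the host
# language would spell it. Illustrative only, not a real phoneme inventory.
HOST_SPELLING = {"a": "a", "o": "o", "k": "c", "l": "l", "s": "s", "t": "t"}

# Assumed back-end lexicon: the spelling the learner sees, mapped to the
# sounds it should be read with. The words are made up for this example.
LEXICON = {
    "kalo": ["k", "a", "l", "o"],
    "zhota": ["zh", "o", "t", "a"],  # "zh" is missing from the host table
}


def respell(sounds):
    """Write a list of sounds using the host language's spelling."""
    missing = [s for s in sounds if s not in HOST_SPELLING]
    if missing:
        # The trick fails for any word containing a sound the host lacks.
        raise ValueError(f"host TTS has no sound for: {missing}")
    return "".join(HOST_SPELLING[s] for s in sounds)


def tts_input(displayed_word):
    """Return the text that would be sent to the host TTS for this word."""
    return respell(LEXICON[displayed_word])
```

With these assumed tables, `tts_input("kalo")` gives the host-style spelling, while `tts_input("zhota")` raises an error because the host language has no "zh" sound, which is exactly the third issue above.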
Other potential solutions would be to record every word in every possible form separately, which would be time-consuming and challenging for polysynthetic languages as well, or to create a custom TTS system for the Duo course, or to partner with another enterprise that can. Not sure how feasible that would be, but that could be an option as well.
A good article on TTS conversion that discusses converting Dutch TTS to the related minority language Frisian can be found here:
Food for thought!!
I wonder if it wouldn't be a better solution to just record every sentence in the course. There must be on the order of 10,000 sentences total in a single Duo course. A few fluent people with adequate equipment could probably do that in a fairly short time.
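As a rough back-of-the-envelope check on "a fairly short time", the estimate can be sketched as below. The 10,000 figure is the guess from the post above; the 30 seconds per sentence (including retakes) and the four speakers are purely assumed numbers, so treat the result as an illustration rather than a real estimate.

```python
# Hypothetical estimate; the per-sentence time and speaker count are assumptions.

def recording_hours(sentences, seconds_per_sentence, speakers):
    """Hours each speaker would spend if the work is split evenly."""
    total_seconds = sentences * seconds_per_sentence
    return total_seconds / 3600 / speakers


# 10,000 sentences, assumed 30 s each, assumed 4 volunteers.
hours_each = recording_hours(10_000, 30, 4)
print(f"{hours_each:.1f} hours per speaker")
```

Under those assumptions it comes to a little under 21 hours per speaker, so the idea does not look unreasonable on time alone.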
Luis von Ahn said on Reddit that the ability for volunteers to contribute audio is on the to-do list.
This would make language education on Duolingo much more versatile. Not only would it become easier to create courses for less well documented languages, it would also be possible to showcase dialectal differences within a language.
It also brings many potential problems with it.
Speaking about my own country's accents, there are many that it would be dishonest to teach to people learning English. A certain accent from the south of the country comes to mind that people in Dublin, the capital city, cannot understand.
Moreover, there are two very different accents in Dublin, and a few years ago schools in one region had their licence to host summer English-language schools revoked, because the Spanish, French and Italian kids that were coming were paying a lot of money only to end up learning a terrible version of English.
Finally, last year there was a case in the news for a week in which the whole country seemed to get offended because a girl who was working as an English-language au pair in Russia lost her job since her English accent wasn't good enough. To be honest, if the parents are paying a lot of money for the service, it is only fair that they get someone who speaks with Received Pronunciation.
Likewise, I would like to be able to trust that the accent I learn in other languages is the equivalent of RP for whatever language I'm studying. Overall, I agree that it's a good and useful idea, but I think that you need to be very strict about how it is used.
This could very well work for several languages, especially members of the Indo-European language family. However, it could be problematic for Cherokee, Inuktitut or other polysynthetic languages because of the sheer number of possible forms that occur in these languages. They depend on affixes (suffixes and prefixes), which put a single verb's possible forms in the thousands for some languages. Recording each form may be next to impossible, implying the need for a synthesised voice if Duo plans to expand into languages that follow a more synthetic grammar... I may be wrong, but I understood that this was part of the issue Hungarian was encountering?
You are certainly right that polysynthetic languages pose many challenges for the Duolingo model, but I don't think this is one of them.
The only audio in a Duo course is the reading of the target language sentences in the lessons. These sentences are fixed and finite in number. You don't need the audio for every sentence that could ever be spoken -- just for those actually used in the course.
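A small sketch of why the recording job is bounded: the list of clips needed is just the set of distinct sentences that appear in the lessons, however many word forms the language could produce in theory. The lesson layout and the function name here are hypothetical, not how the Incubator actually stores a course.

```python
# Hypothetical sketch; the course structure below is an assumption.

def recording_list(lessons):
    """Return each distinct course sentence once, in first-seen order."""
    seen = set()
    ordered = []
    for sentences in lessons.values():
        for sentence in sentences:
            if sentence not in seen:
                seen.add(sentence)
                ordered.append(sentence)
    return ordered


# Assumed toy course: lesson name -> target-language sentences.
course = {
    "Basics 1": ["Dia duit", "Slán"],
    "Basics 2": ["Go raibh maith agat", "Dia duit"],
}

to_record = recording_list(course)
```

Here `to_record` holds three sentences, because the repeated one only needs to be recorded once.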
Good point Jeff, the sentences are finite, so though it would be time consuming to record, it is certainly possible to do so. That had not occurred to me. :)
As a contributor for the Hungarian course, I know there are TTS programs out there for Hungarian, but I also know that NONE of them is good enough. This is not because I know them all, but rather because I know how my language works and how computer voices work. In Hungarian, emphasis is a key concept, which is also highlighted in our speech. You can stress words in a sentence and then they have a different meaning. How is a computer voice going to do that? You guessed it, nohow. I have tried out several online Hungarian TTS programs and they were all horrible... So what we suggested a long time ago is that we should be able to record the sentences. The Incubator should have a record button or something...
So, out of curiosity, are you guys working on this issue already? (Like active back and forth with the DL team) Or is this going to be something that you'll get to 100% and then we'll have it there for 3 or 4 months before the TTS issues get solved?
There is no news about this yet. Unfortunately, I'm afraid we're not going to have a definite answer for weeks at least. But I won't let our course roll out with a TTS, that's for sure. UNLESS that TTS is perfect.