Let's talk about the audio
Hey y'all. As I'm sure some of you have seen, there's recently been an update from Team Irish. They talk about the number of reports, and how they are going down. Congratulations to them for getting the kinks worked out. However, they still haven't addressed the main issue. As you can see (here, here, here, here as well as in other threads asking how to correctly pronounce broad/slender distinctions) the audio is very inaccurate. They have assured us that steps are being taken, but they have refused to firmly admit that the audio is incorrect, often attributing it to "dialect", or to state explicitly what is being done, and often attention is directed towards other parts of the course.
Normally, I wouldn't fuss so much about the audio, but, unlike other courses, the Irish course uses a real speaker. This means people are going to be a lot more inclined to trust the audio than they would a TTS system. Yet, the audio is horribly wrong, and is clearly not from a native speaker, or someone with even a competent level of Irish.
I am by no means devaluing the course, or the hard work the contributors, especially AlexinIreland, who contributed over half the course, have put in. In fact, I already recommend it, though I suggest doing it without audio. I also wish to applaud the moderators and contributors on their hard work on the course, and understand that they didn't have anything to do with the audio (apart from confirming it), but, seeing as they're in closer contact with the Duolingo team than any of us are, I would like a concrete answer about what is being done. As has been repeatedly stated, we will take it upon ourselves to find people who are equipped to give proper pronunciations, and will either find volunteers or crowdfund so as not to cost Duolingo any more money, but I feel something needs to be done, as we're short-changing both learners and the language by leaving it like it is.
Definitely agree with the point about TTS and trust, that's the worst part of it. When I hear the robot Italian lady's voice, I take it as a rough guideline, not as the real, 100% correct pronunciation - like, "Yeah, thanks for the suggestion, Francesca... -cough-WRONG-cough-"
But when you know a real, live person was probably paid to record the Irish voice... then who am I to argue with it? Surely a real human knows what they're doing.
But even I can hear that it's strange and not the way the people on the radio speak. :/
Totally agreed. In fact, if you look at the Irish forums, you'll see that the recorded voice is a major selling point of this course. People are super excited about how "clear and accurate" it is. They put a greater amount of trust in it, because it is a real recording of a person. And clear it is, but if it isn't accurate, then it's misleading people. Not because the Irish course is bad, quite the opposite: because everything else about it is so good.
We did not select the voice artist ourselves, but we have been assured that she is a native speaker. Duolingo have been collating error reports for the audio and a batch of re-recorded sentences have gone live in the past few days. If you come across sentences which sound off, keep hitting the "report" button to flag the issue.
This whole area of recorded audio is new for Duolingo, and the Incubator interface wasn't directly built to accommodate it. Audio is instead directly handled by Duolingo tech staff, as all that has usually been involved was implementing text-to-speech software, with course contributors only rarely getting involved to flag individual sentences for removal where the automated speech synthesis happened to be confusing. The process will necessarily be more complex for individually-recorded audio strings.
Whether she calls herself a native speaker or not isn't really relevant (we've discussed heritage and continuity speakers in the original thread https://www.duolingo.com/comment/4305589). The point is that she's approximating Irish sounds with English ones and it's unacceptable for learners. Having her re-record sentences isn't addressing the problem at all because the problem isn't her mispronouncing something here and there, but rather a consistent lack of understanding of natural Irish phonology. For those interested see here for more discussion -> http://bit.ly/1rCPKOI
I appreciate the distinction between slender/broad consonants; my comment was more intended to update on the point of what changes have taken place in the audio since public release of the course, and to explain why this disconnect exists between course contributors and the audio in the first place.
I also wanted to thank you for responding. It's nice to see that we are now at least communicating. I understand that you don't/didn't have the opportunity to pick which speaker you wanted. I also, however, must agree with Pillowpillar that the current one is most likely a heritage speaker, or else hasn't used the language in a while, which has caused a lot of English to slip in. I'm not asking you to do anything, except perhaps talk to Duolingo staff, since it seems they're very bad at responding on the forums at all.
It's also nice to know you appreciate the distinctions. Hopefully something can be worked out. I actually know of a native speaker, with audio recording experience, who was denied the chance to work on the course. Perhaps we can speak with them, or perhaps any other native speakers we know (my former teacher also has recording experience; she had plenty of things recorded for our class) and see if we can't get them to help. I do think the audio needs to be replaced, but I believe there are other options than paying someone else to do it. If you'd pass these on to the Duolingo staff, we would appreciate it.
Thanks for responding. I'm glad you updated us, but I don't think we'll really feel happy until measures are taken to completely replace the audio. I know this is somewhat out of your hands, but obviously you are one step closer to the Duolingo staff than we are, so is there anything you can say to them about this?
I just spent last weekend at a three-day Irish immersion course at the United Irish Cultural Center in San Francisco. There were four native speakers there, representing the different dialects. Not one person pronounces words as does the woman's voice in Duolingo's audio. It came as quite a shock to me, and I'm signing up for their weekend course just to hear how it's really pronounced. One of the more accomplished students, who's been studying the language for more than ten years, including in Ireland, said the Duolingo audio was laughable. But then another long-timer said it was good enough. And that's fine with me, as I'm quite fond of this program! I'll hunt down the forum discussion about where she's from. It'd be helpful, I think, if this were made clear from the get-go somehow, tho'.
Here's a quick sketch of an interim solution. (warning: stream of thought software architecting follows.)
Using Greasemonkey or equivalent, it should be possible to create a script which a Duolingo user could install in their browser. The script would activate during lessons. Now, each sentence/exercise has a unique identifier (hash). The script could take this ID and send it to a server which matches the ID to a sentence. The server then replies with one or more URLs pointing to recordings of the sentence (using an API like the one for SoundCloud, or possibly direct links to audio files). The script would then present the audio clips as an alternative to the official one already present in the exercise.
Of course, this only works if there are actual audio clips available for our use. Hence, the script also needs to have functionality for native speakers to contribute audio. The script would add a button to each exercise, saying "contribute audio". Clicking on the button pops up a form dialog where you can input the URL pointing to your audio recording. Submitting the form would add the mapping from the current sentence ID to URL to the database, possibly flagging it for review. Only reviewed audio would be heard by the users.
This still leaves the question of quality control open. Who should review the audio? Also, would there really be any contributors? Would the recordings be any good?
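As a rough illustration of the server side of the idea above, here is a minimal in-memory sketch of the ID-to-recordings mapping with a review flag. Everything in it is an assumption made for illustration: the names `Recording`, `AudioRegistry`, `contribute`, `approve` and `lookup`, the sentence ID `"abc123"` and the example.org URLs are all hypothetical, and nothing here reflects how Duolingo actually identifies or serves exercises.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Recording:
    url: str
    reviewed: bool = False  # new submissions start out flagged for review


@dataclass
class AudioRegistry:
    # Maps a sentence ID (a hypothetical hash string) to submitted recordings.
    recordings: dict[str, list[Recording]] = field(default_factory=dict)

    def contribute(self, sentence_id: str, url: str) -> Recording:
        """Store a contributed URL for a sentence; it stays unreviewed."""
        rec = Recording(url)
        self.recordings.setdefault(sentence_id, []).append(rec)
        return rec

    def approve(self, sentence_id: str, url: str) -> None:
        """Mark a previously contributed URL as reviewed."""
        for rec in self.recordings.get(sentence_id, []):
            if rec.url == url:
                rec.reviewed = True

    def lookup(self, sentence_id: str) -> list[str]:
        """Return only reviewed URLs, which is what the script would play."""
        return [r.url for r in self.recordings.get(sentence_id, []) if r.reviewed]


registry = AudioRegistry()
registry.contribute("abc123", "https://example.org/clip1.mp3")
registry.contribute("abc123", "https://example.org/clip2.mp3")
registry.approve("abc123", "https://example.org/clip1.mp3")
served = registry.lookup("abc123")
```

In this sketch the browser script would only ever see what `lookup` returns, so an unreviewed clip is stored but never played. Who gets to call `approve` is exactly the open quality-control question.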
Yes, the biggest problem is that no matter how you look at it, even the contributors themselves are biased towards their L1 accent. The only reasonable way is to have a large number of L1 speakers evaluate the sound quality and the alternative recordings.
This would really require the actual support of a large community of qualified L1 speakers who could register as evaluators once they've been vetted through some quality review process. An alternative is simply making it available to some users based on IP address, hoping that the majority will determine the highest quality, and only enabling the recordings when there is at least 80% acceptance and a minimum number of votes (e.g. 300).
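The voting rule suggested here could be sketched roughly as follows. This is only one reading of it: the function name `recording_enabled` and its parameters are hypothetical, and treating 300 as a minimum total vote count (rather than, say, a minimum number of accepting votes) is an assumption.

```python
def recording_enabled(
    accept_votes: int,
    reject_votes: int,
    min_votes: int = 300,         # assumed: minimum total votes before enabling
    min_acceptance: float = 0.8,  # the 80% acceptance figure from the comment
) -> bool:
    """Return True once a recording has enough votes and enough approval."""
    total = accept_votes + reject_votes
    # Too few votes (or none at all): keep the recording disabled.
    if total == 0 or total < min_votes:
        return False
    return accept_votes / total >= min_acceptance


popular = recording_enabled(250, 50)     # 300 votes, about 83% acceptance
contested = recording_enabled(200, 100)  # 300 votes, about 67% acceptance
too_early = recording_enabled(100, 0)    # unanimous, but only 100 votes
```

A rule like this does not answer who the voters are. It only stops a handful of early votes from switching a recording on.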
Another issue is determining the legal implications of recording copyrighted material.
First of all you're asking Duolingo staff to believe the judgement of self-proclaimed true Irish anonymous speakers. Scientifically speaking that makes no sense.
If the Irish community is so concerned about improving the audio, then the first step is to find accomplished scholars, institutions, or experts to conduct a study evaluating Duolingo's human audio. Regular people are generally biased (Caban, 2003) anyway. So making improvements based solely on the assessment of a few individuals isn't a scientifically valid method of improving anything.
Several of those anonymous speakers are native speakers and they agree it is incorrect. If you're not going to trust native speakers, who are you going to trust? You should know it's going to be impossible to get scientists to do this; they have better things to spend their research funds on. Also, there's quite a few glaring mistakes, like lack of a flipped <r>, and inconsistent use of /x/ and /ɣ/. You know it's sad when even people, as above, who have no experience can tell the difference.
Also, using that logic, shouldn't Duolingo have consulted scientists before choosing their audio? It seems to me they just took a person at their word that they were "native", when they clearly aren't, or, if they are, would be considered a heritage speaker or haven't used Irish in a long time (long enough that English has influenced their Irish).
Well, you mentioned raising funds to hire voice artists. In the same vein you could raise funds to hire scientists to conduct such studies. Users can't even quantify how many of the recordings they've heard contain mistakes: is it 100%? 24%?
In any event, some third parties have already conducted studies using Duolingo. So it isn't in any way a "waste" of time.
> Also, using that logic, shouldn't Duolingo have consulted scientists before choosing their audio?
Maybe they did it with a limited sample. Assumptions are not useful at this point.
The fact is, even if strangers agree something is wrong, it is not necessarily wrong, whether or not those strangers claim to be experts. Being an L1 speaker doesn't make you an expert in the language. It just makes it likely that you're proficient in it.
There are many L1 speakers who speak their language incorrectly, pronounce things wrong, and are illiterate. Does that make them good candidates to assess, build and/or improve a language course?
That's the perspective Duolingo has to take when evaluating statements made by an anonymous community, otherwise there wouldn't need to be any application to build a language course here.
>There are many L1 speakers who speak their language incorrectly, pronounce things wrong, and are illiterate.
Y'know that's against one of the most basic principles of linguistics, right? If a native speaker says something consistently, and there isn't something preventing proper language use, it's correct for their dialect. Sure, it might not be standard, but nobody speaks the standard as their native dialect. As for the literacy issue, you must realize that writing is not language; it's a way to represent it, and it should not be confused with actual language acquisition, since it must be taught.
As for money, I doubt we would be able to raise enough to pay a scientist to take out time from other, funded work that needs to be completed. To the scientists who already work on Duolingo, that's what they got their funding for.
"If a native speaker says something consistently, and there isn't something preventing proper language use, it's correct for their dialect. Sure, it might not be standard, but nobody speaks the standard as their native dialect."
Indeed, and "standard" can be a dodgy term too, since there can be competing standards.
Irish is a bit different in this regard from a language such as English or French, of course, because you have heritage speakers and continuity speakers who aren't speaking traditional Irish, but rather a version of Irish that has been heavily influenced by English phonology and so this can be considered as separate from natural traditional Irish.
Addressing your replies below since I can't reply directly.
See here - http://bit.ly/1rCPKOI for a clearer understanding of what she's doing wrong.
I see, you could start by getting a recognised language expert in the community to comment on it. I may even agree with you, but unless an expert says something it remains a theory by random internet users.
Well maybe if we had the list of sentences like we asked.
Constructive criticism is good and all, but as ShadyArc explained in a different thread, the contributors have a right to use all the words/sentences they create. They could give them to you if you asked politely rather than focusing on the negative.
More importantly, as I said before in a different thread, you seem more concerned about getting Duolingo to do things your way than about creating the audio for the benefit of all. Despite its numbers, Duolingo is just a small community; there are more than 1 billion language learners in the world.
The audio you want to create should not in my opinion even be hosted in Duolingo, it should be a free CC-BY content that any institution (including Duolingo), university or language learner can use without begging anyone.
P.S. It is not that hard to copy/paste and record all sentences for basics 1 and 2 (or a single skill) for the purposes of showing them what is wrong anyway.
Well then. The person(s) who recorded these audio tracks for Irish claimed they are Irish speakers, and consistently make(s) mistakes (according to you and a few others). So going by that logic, that is perfectly correct for that person's dialect, and nothing is wrong with keeping it in Duolingo. Continuing with that theory, all they need to do is rename the course to include the tag of that dialect, e.g. Irish (Br).
I never mentioned that writing is a language, I mentioned more than one aspect of communicating. Those are skills that one needs to master to be considered relatively proficient in a language.
As for the funding issue, well you never know till you try. There are many bored scientists out there, who may even consider doing it for free.
Irish is just one of many languages Duolingo supports (with few users), making the resources they can dedicate to it limited.
Response to Dessamator: "Well, that is your assessment of the person's proficiency. Maybe that person would also consider you and pillowpillar and whoever it is you hire as a heritage speaker too." I don't think you understand what the problem is, to be honest. I have a few books mapping the dialects of Irish phonologically and none of them have anything like the way the woman pronounces words because she uses English approximations of Irish sounds. This isn't just a "Yeah, well, you know, that's just, like, your opinion, man" situation. See here - http://bit.ly/1rCPKOI for a clearer understanding of what she's doing wrong.
"Anyway, I think you should attempt to communicate through proper channels. Either organize a number of users and ask them all to email support. Or contact the language experts and communicate the problems. " Well the problem is we're hearing nothing from the Duo staff at all and little from the team.
"Perhaps if you record the sentences using a number of proficient speakers and send them to support they are more likely to listen, and believe you're truly serious." Well maybe if we had the list of sentences like we asked. We just want the go ahead to do this. Also, what's the point in recording anything if afterwards they tell us they won't use it? We need some guarantees.
Except, as PillowPillar pointed out, it's likely she's not a native speaker, but a heritage one. That's an issue in Ireland; plenty of "native" speakers aren't - they're heritage speakers.
> Irish is just one of many languages Duolingo supports (with few users), making the resources they can dedicate to it limited.
And we're not asking them to dedicate any more resources. We're literally offering to do this ourselves, and find people to record the audio. We'll do nothing but send them the completed audio, and let them implement it. Or, tell the course contributors how, and let them do it. We're not asking for Duolingo to do anything, except answer us.
Well, that is your assessment of the person's proficiency. Maybe that person would also consider you and pillowpillar and whoever it is you hire as a heritage speaker too. Therein lies the problem, there needs to be a reliable way of validating that oral proficiency.
Anyway, I think you should attempt to communicate through proper channels. Either organize a number of users and ask them all to email support. Or contact the language experts and communicate the problems.
Perhaps if you record the sentences using a number of proficient speakers and send them to support they are more likely to listen, and believe you're truly serious.