
https://www.duolingo.com/profile/fuonk

Why aren't we translating Wikipedia?

I came to DuoLingo and chose to participate in the study of its effectiveness in large part because of Luis von Ahn's claim in his TED talk that the main motivation for DuoLingo was to find an affordable way to translate Wikipedia, while giving people who participated in the translation effort something back for their time.

I have only seen two or three Wikipedia articles among the translations, and those were about sports figures-- not a topic I am particularly interested in. Most of the articles to be translated seem to be from periodicals and blogs, and to have little lasting importance. Some of them are terribly written and all of them have proofreading errors due to the way they're scanned in. These don't seem like ideal pieces of writing for language students to be learning from, and the commercial nature of many of them makes one wonder about what DuoLingo's true goals are.

Perhaps the investment of time and energy in creating DuoLingo needs to be paid for by doing some translations for money, although with some of the translations I've seen marked as 100% done, I wonder if they can really be sold. In any case, if there is some sort of commercial transaction going on, I think the users of the site should be told this. If there isn't, I can't imagine why we are translating everything from vicious gossip columns to advertisements for Microsoft products instead of translating the many Wikipedia articles Luis von Ahn told us needed to be translated. The quality of writing is probably better on average in them, anyway.

October 15, 2012

21 Comments


https://www.duolingo.com/profile/Luis

Thanks for asking this.

(1) We're not charging anybody for translating articles right now. In the future we may, but when that happens it will be clear which ones are paid for and which ones are not.

(2) We do have Wikipedia articles, but we don't have very many (and it depends on which direction you're working on). We will add more, but:

(3) We want to make sure our final translations are very accurate before we start putting articles back into Wikipedia (or anywhere for that matter). As you have seen, the things that we mark as 100% finished may have some mistakes. We're tweaking the different ways to weigh different parameters (such as the number of votes, or who made the votes, etc.) to reach perfect accuracy. In some language directions, we have pretty much perfect accuracy, but in others we still need to tune the system.


https://www.duolingo.com/profile/kelvinma13

The problem is, there are very few Spanish, French and German Wikipedia articles that need translation, since the English Wikipedia dwarfs all the others. The second biggest, the German Wikipedia, is just one third the size of the English one; the Spanish one is less than a quarter, and that's just counting the number of articles, not the length of each article. For example:

English http://en.wikipedia.org/wiki/Gas_giant

Spanish http://es.wikipedia.org/wiki/Gigante_gaseoso

This is why the few Wikipedia articles here are about Spanish celebrities or TV shows, whose Spanish articles are bigger than the English-language ones. All others are far smaller than the English versions.

Naturally, on the Spanish or Portuguese Duolingo, there is an abundance of good Wikipedia articles, so I wouldn't say that Duolingo is purposely not offering Wikipedia for translation: there is no Wikipedia to be translated into English.


https://www.duolingo.com/profile/Luis

@bf2012: we're actually working on threading of the answers.


https://www.duolingo.com/profile/DonnaMarie

Some of these articles are 3-4 years old. So outdated that they aren't even useful. So I assumed they are collecting data on the process rather than thinking we are turning out a real product.


https://www.duolingo.com/profile/Luis

We do look at all the suggestions. Would you be ok with an automated reply? Just trying to figure out how to handle the fact that we get 5,000+ suggestions per day, and doing a personalized human response to all of them is impossible.

About the translations: we agree with you and are working on fixing this problem.


https://www.duolingo.com/profile/DonnaMarie

Thanks Luis! Knowing that your staff and time are limited, maybe you could do a few more general posts on the blog so we can know some of the things you are working on. That way we aren't just guessing among ourselves about the goals and direction of the project.


https://www.duolingo.com/profile/territurtle

However, there are many, many other sources for good articles. I've several times suggested that users be able to submit articles (conforming to DuoLingo's stated criteria) to be considered for translation. Seems like a win-win situation to me.


https://www.duolingo.com/profile/fuonk

Meanwhile, you have many people working on articles that they have little or no interest in, and also on articles which are in some cases rather poorly written. (I know it is presumptuous to say this about articles written in a language which I don't know well, but in some cases it is quite obvious.) Are you willing to consider territurtle's suggestion, or some of my suggestions for possible sources of articles? Apart from increasing the pleasantness of the task (which increases learning considerably), you will also find that you will get much better translations from people who have some interest in what they're translating, if for no other reason than that they have a motivation to learn the specialized vocabulary each topic requires. (There are other reasons as well.) Also, it is a bad idea to encourage someone to translate one sentence from the middle or end of an article without looking at the rest of the article, which is what the incorporation of the translations into the lessons is doing. Good translations require a certain amount of context; people doing translations should be encouraged to figure out as much as possible about the context, and should be rewarded for doing so. The problem with any system of points or grades is that it tends to encourage students to maximize their grade-earning, often at the expense of doing a good job or learning as much as they could in the process. Please accept this as a fact based on many years of college teaching experience.


https://www.duolingo.com/profile/Luis

We're definitely going to allow people to upload documents soon. The problem is that we have a small engineering team, and we're working on 50 different improvements to the site at once so we don't move as fast as we'd like.

As for your saying we don't give personal feedback: I'm always puzzled when people say this. Compared to most websites, we are actually quite involved in the questions, on reddit, etc. Perhaps we need to make a bigger deal of saying that we work for Duolingo. All of this said, there are ~15 of us, and about 500,000 of yous, so we won't be able to respond to everything :)


https://www.duolingo.com/profile/fuonk

I have submitted numerous suggestions for improving the site, and more alternative translations for sentences in exercises which the site does not have on its lists of "right answers" than I want to think about. I have never once had any feedback on any of this. The only person involved at all with DuoLingo that I have heard from is Prof. Vesselinov, who is of course not involved directly. (I don't know why this box is not paying attention to line feed characters.) By the way, I want to tell you about the worst feature of the grading system, while I have your attention. The translation process is of course seeded with machine translations, so the first translations submitted by humans are compared to those machine translations. If you think about it for a moment, you will realize (and I'm sure you already know this) that if a professional translator submitted the first translation, the site would most likely report that it did not agree enough with already known translations for it to be acknowledged as "correct". The professional translator would of course simply laugh at this, but the student learning the language does not know enough to do so. All he or she knows is that he or she hasn't earned points-- this will motivate him or her to translate in such a way as to be more likely to earn points in the future-- for example, by using cognate words even if they are not especially apt, and mimicking the sentence structure closely. The student unnecessarily loses confidence in what he or she has been doing, and the site has trained the student to do worse translations. The only thing that saves the process as it stands is that enough translations are proposed which are nearly as bad as the machine translations that this gradually paves the way for better translations to be acknowledged. This may be fine as a model for producing better translations, but it has some very bad features as a teaching device.


https://www.duolingo.com/profile/fuonk

Thanks again for the time you put into responding to these questions. :-)


https://www.duolingo.com/profile/fuonk

Here's a very simple suggestion which would mitigate the bad training effect somewhat: After more translations have been submitted for a sentence (perhaps when the translation process reaches a certain "% done" figure) the site should go back and compare translations which were graded as "can't evaluate" to the expanded pool of translations. When they "agree" to a certain extent with this wider pool (perhaps 50%, as with the first evaluations, but a higher percentage requirement would also be OK), the person who submitted the translation should be sent an automated acknowledgement of this, and awarded at least the difference in points between a "can't evaluate" submission and an "agrees" submission. This is of course not as good as more immediate feedback, but it is a whole lot better than simply being penalized for submitting a good translation too early in the process, and that being the end of it.
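A minimal sketch of the re-evaluation idea above, in Python. Everything concrete here is an assumption made for illustration: the point values, the `regrade` and `agreement` helper names, and especially the use of `difflib.SequenceMatcher` as a stand-in for however the site actually measures "agreement" between two translations.

```python
from difflib import SequenceMatcher

AGREE_THRESHOLD = 0.5  # the "perhaps 50%" figure from the suggestion
POINTS = {"cant_evaluate": 1, "agrees": 5}  # hypothetical point values


def agreement(candidate, others):
    """Best similarity (0.0 to 1.0) between a candidate and any other translation.

    SequenceMatcher is only a placeholder for the site's real agreement measure.
    """
    if not others:
        return 0.0
    return max(
        SequenceMatcher(None, candidate.lower(), other.lower()).ratio()
        for other in others
    )


def regrade(early_submissions, pool, threshold=AGREE_THRESHOLD):
    """Re-check "can't evaluate" submissions against the expanded pool.

    early_submissions: list of dicts with "user", "text" and "grade" keys.
    pool: list of (user, text) pairs collected since the first grading.
    Returns (user, extra_points) pairs for submissions that now agree.
    """
    awards = []
    for sub in early_submissions:
        if sub["grade"] != "cant_evaluate":
            continue
        # Never compare a translation with its author's own entries.
        others = [text for user, text in pool if user != sub["user"]]
        if agreement(sub["text"], others) >= threshold:
            sub["grade"] = "agrees"
            # Award the difference between the two grades.
            awards.append((sub["user"], POINTS["agrees"] - POINTS["cant_evaluate"]))
    return awards
```

Each pair returned by `regrade` would be the trigger for the automated acknowledgement, which matters more than the points themselves.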


https://www.duolingo.com/profile/fuonk

In addition, when a sentence is evaluated as 100% complete, all translations which agree with the final translation by at least X% should be awarded a few extra points. X should be reasonably high, but its exact value would have to be determined experimentally.

Both of these suggestions would be very easy to implement, I think, with the tools that you already have on hand.
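A minimal sketch of the completion bonus described above, in Python. The threshold `x`, the bonus size, the `final_bonus` name, and the use of `difflib.SequenceMatcher` as the agreement measure are all assumptions for illustration, not anything the site is known to use.

```python
from difflib import SequenceMatcher


def final_bonus(final_text, submissions, x=0.8, bonus=2):
    """Give a small bonus to every translation close enough to the final one.

    final_text: the translation chosen when the sentence reached 100%.
    submissions: list of (user, text) pairs submitted for that sentence.
    x: minimum similarity (0.0 to 1.0); a placeholder to be tuned experimentally.
    """
    awards = {}
    for user, text in submissions:
        ratio = SequenceMatcher(None, text.lower(), final_text.lower()).ratio()
        if ratio >= x:
            awards[user] = bonus
    return awards
```

Raising `x` rewards only near-identical translations, while lowering it spreads the acknowledgement more widely; the right balance would have to come from experiment.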


https://www.duolingo.com/profile/fuonk

The important part here is not the extra points, but the learner receiving explicit acknowledgement from the site that his or her translations were better than the site "realized" when they were submitted.


https://www.duolingo.com/profile/bf2010

@fuonk, you wrote yesterday (16th October 2012) "I came to DuoLingo and chose to participate in the study..."; sorry to start my answer that way, but I am always trying to find a starting point for the discussion. I find the ordering system of our answers highly confusing, and most of the time I am only guessing who gives which answer to whom and to what question. Does anyone else have the same problem or is it only me?

1. Translation: I agree with most things said about the problems of the selection and translations in Duolingo; finding meaningful articles is an issue in Duolingo (not to speak of the web in general), but pleasing all the people is even harder... While the translation issue has been "raging" throughout Duolingo (including its blog and reddit), I find it next to impossible to find out what people are saying AND make meaningful contributions AND get feedback from Duolingo (and Luis, who seems to have to work the hardest to give feedback). Instead I spend my time writing in blogs and such and not learning the language I want to learn.

2. Therefore I am suggesting (not for the first time) that Duolingo implement a STRUCTURED feedback mechanism, including a search function with keywords etc.: a kind of reddit (with admin supervision) OR, if that uses too much of Duolingo's resources, a sub-stream of reddit for the whole question-section in Duolingo, i.e. outsourcing the whole question-section. This would avoid the ever growing redundancy of questions and answers and stop wasting the valuable time and ideas of Duolingo's participants.

Thanks for taking the time to read this post...:-)


https://www.duolingo.com/profile/bf2010

@luis: thanks for your quick answer; threading would really be great; and maybe this could also lead to one source of information about Duolingo (AMAs etc) ...? Great job, have a nice day


https://www.duolingo.com/profile/Luis

@kcin: that's how we do things here :)


https://www.duolingo.com/profile/fuonk

By the way, thanks very much for answering this question personally. One of the things I have liked least about DuoLingo has been the total lack of feedback from anyone involved in the project, so it is very reassuring to hear from the person who came up with the idea for the project.


https://www.duolingo.com/profile/kcin

@Luis "we're actually working on threading of the answers"

It's really strange that you are spending time developing this forum instead of using some turnkey solution for discussion (e.g. Google Groups, or vBulletin if you want to run your own) and spending that time on improving Duolingo.


https://www.duolingo.com/profile/fuonk

Surely there are many reviews of Spanish language films and books written in Spanish which have not been translated into English; not to mention articles on contemporary Latin American political and economic issues, for just a couple of ideas. Translating these might have some lasting importance. If the articles which are turning up on the DuoLingo translation list from Spanish to English are the best the DuoLingo people can come up with, and there are no commercial reasons for choosing them, then I have to say they aren't a very imaginative bunch of people. I think territurtle's suggestion is very sensible. It might be very educational in a number of ways, for example, to translate articles about how people in various Spanish-speaking countries view US immigration issues or the US involvement in the "war on drugs".
