

Improvements to translation rating system

If the aim of Duolingo is to provide good quality translations through crowd-sourcing, then the translation rating system needs some improvement.

I sometimes revisit old translations to see how “false-friends” are developing, and rate subsequent translations accordingly. It appals me to see some beautifully-crafted English translations, which have grasped the sense perfectly, rated: “Good but not perfect”, whilst other poorly-understood, amateurish, word-for-word translations have been rated: “Very good”.

Improvements I would suggest:

Do not encourage, or even allow, Duolinguists to rate other translations, until they have completed the entire article themselves to a satisfactory standard. There are two reasons for this suggestion. 1) Until you have completed the entire article, it is not apparent that your initial sentences were off track, or that a better translation might be more appropriate than the one you first came up with. 2) By assessing the score for the entire article, Duolingo could then encourage you to rate the translations of others, or not, if you have done badly yourself.

Provide more of an incentive to revisit old translations – better than the occasional (+1), once your own skill rating has improved. Duolingo could post to your stream, reminding you to return when the number of contributors has grown significantly.

Provide a much better incentive (+5) to suggest edits for other translations. Award bonus points (+6) to (+15) according to the original difficulty level, if the suggested edit is accepted by the original translator. You are doing them a big favour by pointing out their mistakes – which hopefully they will learn from - and Duolingo by improving the average quality of the translation. So the award incentive should be at least commensurate with providing a new translated sentence.

Finally, you should be able to see your [own] translation rating. How are you supposed to use that as a model for rating others, if everyone else has spotted something wrong that you haven’t? And maybe Duolingo should not allow you to rate the sentences of others, at least not without providing an edit and explanation, if your own contribution was bad.

July 8, 2012



@supersnuggles - I’m afraid I don’t share your unqualified faith in crowdsourcing models. Firstly, the Millionaire example is not comparable. In that scenario there are 4, and only 4, possible answers to a question. If the question has been designed so that all possible answers are equally probable, an idiot audience will select answers at random. It is only necessary to have a few experts in the audience for their consensus to become apparent in the unequal distribution of votes, thus revealing both the “correct” answer and the confidence we may place in that answer, through simple statistical methods. So whilst the majority may have come up with the answer, the majority of those that are “correct” can still be idiots and the method will still work.

In the Duolingo model, the set of possible answers is growing and can exceed the number of people voting, so you cannot easily apply simple statistical methods to select the best answer. To my way of thinking, Duolingo is more like a mathematical iterative method (e.g. finding the roots of an equation). The DuoBot provides the initial guess, and subsequent translations should provide the iterations that will hopefully converge towards the perfect translation (the “limit”, in mathematical terms). My point is that Duolingo’s crowdsourcing algorithm is sub-optimal and more likely to diverge, or at best converge very slowly towards the goal (which can be frustrating for experts who recognize a good answer when they see one).

To put it in less scientific terms, if a million monkeys on a million typewriters can come up with the translated works of Shakespeare in a million years, but few of them can read or understand Shakespeare – how will they ever know when they have finished? By voting? As a monkey (a role I will happily adopt for the purpose of this exercise) and in the absence of any other information, I will of course vote for my own translation in the hope of a banana. Better still, I will copy the translation of another monkey who has already been awarded a banana for their translation, in the near certainty that I will receive one too! I would then need at least TWO bananas to risk voting for an untested translation by another monkey or myself.

And that is my point. The DuoBot handing out the bananas is not sufficiently encouraging towards behaviors that will tend to converge towards the best solution. (N.B. Offence to any other monkeys who have signed up to Duolingo is neither intended, nor implied by the use of this analogy!)
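
For anyone who wants to see the Millionaire argument in numbers, here is a rough Python sketch. It is only an illustration under assumptions of my own: the audience size, the number of experts, the fixed seed and the function name `audience_vote` are all made up for the example, and every non-expert is assumed to guess uniformly at random.

```python
import random
from collections import Counter


def audience_vote(n_voters, n_experts, n_options, correct, rng):
    """Tally a poll in which experts pick the correct option and everyone else guesses."""
    votes = Counter()
    for voter in range(n_voters):
        if voter < n_experts:
            votes[correct] += 1  # experts know the answer
        else:
            votes[rng.randrange(n_options)] += 1  # everyone else picks at random
    return votes


rng = random.Random(2012)  # fixed seed so the run is repeatable
tally = audience_vote(n_voters=1000, n_experts=100, n_options=4, correct=2, rng=rng)
winner, _ = tally.most_common(1)[0]
print(tally, "winner:", winner)
```

With four fixed options the random guesses spread roughly evenly, so the experts' 100 extra votes stand out. The sketch does not model the Duolingo case, where the set of candidate answers keeps growing and the guesses no longer pile up on a small fixed set.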


@mariho and daphne – I’m with Daphne on this. I don’t actually care too much how many skill points I get, which is a good job as I worked out pretty quickly what was going on. Again, mariho’s response reinforces my point beautifully; people will play the system and will not produce the best translations in the quest for more bananas, sorry, skill points and the translations will then take longer to achieve optimum quality, if they ever do ... . I can see good translators abandoning Duolingo comme un chaud pomme de terre when they realise they are just flagellation un mort cheval. (Er, ... What do you mean that is not good French? I looked up every word in my Collins Robert!)


I don't think the software is nearly advanced enough to do the final editing. Human input is definitely needed, but it's clear from what I've seen here that inexperienced translators err on the side of slavishness to the structures of the original language.


I think the underlying issue here is not getting good translations of individual sentences, but that of translating entire documents.

Currently, I think Duolingo is a terrible system for translating an entire document because of the focus on translating one sentence at a time, possibly without the context of the paragraph or document from which the sentence came (unless Duolingo parses the document incorrectly). As such, the current system seems unlikely to converge to a good translation of an individual sentence. Even if we were able to find the optimal translation for each individual sentence, it is likely that the translated document that results from concatenating these sentences would be terrible.

I would suggest that Duolingo create a new task, called something like "stitching" or "tacking" or "basting" (I'm translating from the Spanish hilvanar), in which a human puts translated sentences together to create a cohesive document. They could be shown, as suggestions, the highest-rated sentences and those on which an edit has been accepted, but with the freedom to make further edits to the sentences as they put them together to make a translation of the document. This could be a task to which Duolingo invites people who appear to have the ability and interest to create a good translation, perhaps because they translated some threshold percentage (call it α%) of the points of the article, or they suggested lots of edits when they translated, or they rated many translations, or they have a high level, or because their individual sentences are highly rated. After the "stitching," the entire document could be presented to the community of all those who had translated substantial sections of it for edits to be suggested, and then a final "Duolingo" version of the document could be established, and the document removed from those available for translation. This seems to combine the advantages of crowdsourcing (a set of medium- or high-quality translations of individual sentences) with the perspective of a motivated, skilled human aware of the larger context.

In his original post, 1km makes a lot of suggestions about how to incentivize certain practices by awarding more points. I don't think there's a need to make Duolingo "pointier." People that do those good practices could be rewarded by being invited to do this extra task - which, for those people, is far more interesting than points. Sometimes, I find it frustrating that on Duolingo there's no way for people who really care about making good, context-aware translations to comment on one another's work. Creating this new type of task could do that.


I had been assuming that users would be learning at least 3 skills here: the target language, translating skills, and rating translations. Is the third not happening, not going to happen?

Also, I find that the mechanistic view of translation is even evident in the lessons themselves. Some translations are unidiomatic and a few are wrong--to be expected in the French beta version--but I'm starting to see it in the German, too. And so sometimes I enter the one or the other accepted and safe, but sometimes funky answer. It causes me to do some second-guessing about what the program wants. Real-world context is everything. One huge can of worms is differences in tense usage, e.g. the tendency both in German and French to use a present tense structure in situations where English speakers would use a future. (This observation comes from 27 years of living and working in Germany and France.) Ignore this and you're missing out on tons of real-world equivalencies between the languages and frankly you're being conditioned somewhat to speak Mr. Duobot's dialect. That said, I have a lot of respect and appreciation for the efforts of the Duolingo developers--it's a formidable task, and there are many good ideas here too.


If you are the first person to translate a sentence, you have to "err on the side of slavishness to the structures of the original language" if you want to score the points! As the first translation can only be compared, word by word, with the dictionary, you will find yourself with a very low 'agreement percentage' if you translate idiomatically, even if the idiomatic translation is much better English.
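
A toy Python sketch of why that happens, assuming the 'agreement percentage' is something like a word-overlap score against a dictionary gloss. Duolingo's real formula is not known to me; the `agreement` function, the Jaccard overlap it uses and the example sentences are all hypothetical.

```python
def agreement(candidate, reference):
    """Share of distinct words the two sentences have in common (Jaccard overlap)."""
    cand = set(candidate.lower().split())
    ref = set(reference.lower().split())
    if not cand and not ref:
        return 1.0
    return len(cand & ref) / len(cand | ref)


# Hypothetical word-by-word dictionary gloss of the French "Il pleut des cordes".
gloss = "it rains ropes"
literal = "it rains ropes"
idiomatic = "it is raining cats and dogs"

print(agreement(literal, gloss))    # the slavish version matches the gloss exactly
print(agreement(idiomatic, gloss))  # the idiomatic version shares only "it"
```

Under any scoring of this kind, the idiomatic sentence is penalised simply for not reusing the dictionary's words.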


@geometry - Thank you for your excellent contribution to the debate. One can, of course, always "View original ... " to see and read sentences in context, as I am sure you do before starting a translation. I personally dread to think how Duolingo intend to stitch these fragments together, with different standards of punctuation, capitalisation, back translation of non-native language, etc. Given I am neither responsible for, nor going to have much impact on changing the Duolingo translation paradigm, I thought I had better not be too challenging about the approach they are taking and just suggest a few tweaks. I do agree people need "stroking" to work for free, whether in the form of points or otherwise. In fact, I think your suggestions are worthy of an Insight in their own right, so they don't sink along with this one ... .


If mariho's comment is correct, then I'm not sure why I am on this site. I'm interested in learning to translate correctly, yes, but idiomatically too. So that the translation reads appropriately and fluently in (in my case) English. I don't actually care whether I get 3 points or 300.


Further evidence of the rating system being defective. I have just translated (i.e. COPIED) the sentence: "Robert Schumann" from the eponymous translation article included in French Basic. The best translation according to DuoLingo? "Robert Schuman" [sic]! The German composer's name was Schumann - with two "n"s. And this is after 6024 contributions!!! My correct translation barely scraped through, with only 50% agreement. What hope is there, if no one is listening ... ?


The problem is that Duolingo is using a computer to grade the translations. So word-for-word, non-colloquial translations are said to be "correct" and get points while colloquial translations earn nothing or are said to be "incorrect".

And if 2,000 people have translated the same sentence, the 2,001st person isn't going to review them all to edit/correct/comment - no matter how bad those 2,000 translations are.

I think the fundamental problem is the assigning of points for the translations based on computer (I assume Google's) translations. This certainly won't achieve the objective of "translating the web" since, of course, we can already use Google to translate any page. The problem is that Google's translations range from good to horrible.

Perhaps it would be better to simply let people translate and let others comment - no points at all. Also, I think the site should make it clear that it is looking for colloquial translations.

As for the editing: I got a comment re an edit for one of my translations in the Spanish section. It was a perfectly acceptable alternative translation - but the only option I had was to "accept" it (i.e., replace my translation). There definitely need to be two options: accept the change or indicate the suggested edit is an acceptable alternate.

I'm also not sure what point there is in giving a beginner, Lesson 1 of 1, anything to translate - unless Duolingo simply expects that novices will limit themselves to simple, single sentences. In which case I'm not sure what they can contribute to the project's goal. Google does pretty well translating, say, Je vous aime.

As somebody who knows a little French (and more Spanish) - solely self-taught - my hope was to improve my vocabulary and grammar, get practice actually writing the language and, yes, improving my ability to translate colloquially. But if I am graded by a computer, well, there are severe limits being placed on what I can actually learn. It's still useful practice and I am learning - but I am revising my goals downward.

Lastly, I know people who make a living translating. It is a highly demanding profession. I have my doubts that this crowd-sourcing method, esp. with the point system, is capable of producing anything even remotely resembling colloquial translations.


I've been thinking about the problem of the abysmal (machine-like) translations and was wondering if, perhaps, Duolingo needs to consider a "translator test".

Sort of thinking out loud here but what if Duolingo created a bunch of sentences of varying complexity, presented them for translation, then, and this is key, had those translations rated by professionals who would then give ratings to the Duolingo translators.

These translators would then get treated differently by the system. For ex., if the translation differed from the computer version, the translator's would be assumed to be the best.

Or they, or some subset of these approved translators, would be monitored by real people to see if their translations were truly among the best or whether the test score was a fluke. So, over time, people could rise or fall in the ratings.

And the ratings of translations by these "approved translators" would carry more weight.
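
Thinking out loud in code, "carry more weight" could be as simple as a weighted average. This is a minimal Python sketch, not anything Duolingo does as far as I know: the function name `weighted_rating`, the 1-to-5 score scale and the weight of 3 for approved translators are assumptions picked for illustration.

```python
def weighted_rating(ratings, approved_weight=3.0):
    """Average (score, is_approved) ratings, counting approved raters more heavily."""
    total = 0.0
    weight_sum = 0.0
    for score, is_approved in ratings:
        weight = approved_weight if is_approved else 1.0
        total += weight * score
        weight_sum += weight
    if weight_sum == 0:
        raise ValueError("no ratings to average")
    return total / weight_sum


# One approved translator rates the sentence 5; two ordinary users rate it 2.
ratings = [(5, True), (2, False), (2, False)]
print(weighted_rating(ratings))                       # approved rater counts three times
print(weighted_rating(ratings, approved_weight=1.0))  # plain average for comparison
```

The weight itself could rise or fall over time as an approved translator's record is monitored.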

I realize that this requires a greater investment on the part of the Duolingo staff and would mean that some feedback could not be made in real time, but I simply don't believe that the crowd-sourcing, in its current structure, is going to work well.

Essentially, I am saying that Duolingo needs to insert some human experts into the system at some point early in the process in order to get better translations.

Also, perhaps we need a more sophisticated rating system for the translations. A translation can fail or succeed on two grounds: accuracy and grammar.

A translation can be accurate (i.e., the words are translated correctly) but ungrammatical or simply not colloquial. OTOH, a translation may be written well but be inaccurate because one or more words have been mistranslated or the verb tense is completely wrong (past vs. future).

So, we need some way to say "yes, you got the words right but the English is awful", or "Your English is wonderful but you completely missed the point or the subject of the sentence."
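
As a rough sketch of what such a two-part rating might look like in Python (the class name `TranslationRating`, its two yes/no fields and the feedback wording are hypothetical, just one way of recording the two judgments separately):

```python
from dataclasses import dataclass


@dataclass
class TranslationRating:
    accurate: bool  # were the words and tenses translated correctly?
    fluent: bool    # does it read as natural, grammatical English?

    def feedback(self):
        """Turn the two separate judgments into one message for the translator."""
        if self.accurate and self.fluent:
            return "Accurate and well written."
        if self.accurate:
            return "You got the words right, but the English is awkward."
        if self.fluent:
            return "The English reads well, but the meaning is off."
        return "Both the meaning and the English need work."


print(TranslationRating(accurate=True, fluent=False).feedback())
```

A finer scale on each axis would work the same way; the point is only that the two judgments are stored and reported separately.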


These are great comments and suggestions, I concur completely.


I agree, unless you are the first person to translate an item and you are compared to the dictionaries. In that case, I think it is appropriate to rate the "machine" translation.


These are all excellent suggestions. Your second paragraph aligns with my own experience and echoes many other Insights that I have seen posted here.


mmSarre: I find myself doing the same kind of guessing. Will Duolingo accept something colloquial for this one? Or only some word-for-word translation? If I've already lost a heart or two, I tend to go for the latter.

I do agree that the developers have set themselves a difficult task but I still think they are placing way too much faith in computer-translation comparisons and crowd-sourcing.

Tenses: definitely a problem. It seems to be getting better at accepting "I am doing" for "I do" but I still see a lot of situations where in English we would tend to go for a simple future rather than a present tense.

Can't imagine the problems that may occur with the German. (I took a little German way back in high school.)


( pardon my bad English ) ... translating is not easy - ... time will tell - who holds the line to DuoLingo - ... edit and edit and edit and edit ... all attempts to translate ... I am very pleased about all this discussion here ... Gruß Dieter


Thanks mm and Remy for your recent contributions. I've just noticed these whilst digging out the banana algorithm for another post. Now that the translation leader board is back, it's apparent that this is just encouraging quantity over quality, judging by the work of some of our "top" translators.
