

You can now read _%. Of what?

Just curious about how this number is calculated. Is it supposed to be a percentage of the articles that have been uploaded to Duolingo for translation? A percentage of the articles on the internet (based on web crawling)? Or a theoretical percentage of all articles in the language, based on some calculation of word frequency or the like?

I also notice that after this is displayed it typically suggests I go translate an article, which usually ends up being full of words I don't know. So apparently I can't read 20% of all articles, but rather 20% of any given typical article.

Can someone explain? Thanks.

July 22, 2013



I believe that number is a bit exaggerated. Check this article: http://howlearnspanish.com/2010/08/how-many-words-do-you-need-to-know

A quotation from the article: "Learning the first 1000 most frequently used words in the entire language will allow you to understand 76.0% of all non-fiction writing, 79.6% of all fiction writing, and an astounding 87.8% of all oral speech. Learning the top 2000 most frequently used words will get you to 84% for non-fiction, 86.1% for fiction, and 92.7% for oral speech. And learning the top 3000 most frequently used words will get you to 88.2% for non-fiction, 89.6% for fiction, and 94.0% for oral speech."

Yet Duolingo says a user who knows approximately 1500 words can understand 95% of the articles. I don't think this is true.


I read that article before I created this discussion, actually; it's part of what made me curious about how this calculation was done. Thanks for linking to it, as I think everyone learning Spanish here will benefit from reading it. I too get the impression that Duolingo is probably using a less rigorous, or at least different, measure here than the study cited in the article. I'd like to know what it is.


It would be nice to know what this is. From how it behaves, it seems like it is not based on anything: it rises very quickly up to 95% and after that very slowly, as if this were some kind of cutoff point. It seems like an arbitrary number given just to motivate you to translate.


They probably just use Zipf's law for these calculations.

Short summary: https://simple.wikipedia.org/wiki/Zipf%27s_law

More statistics: https://en.wikipedia.org/wiki/Zipf%27s_law
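
For a sense of what a Zipf-based estimate might look like, here is a rough sketch. It assumes an idealized Zipf distribution (the word of rank r has frequency proportional to 1/r^s) over a fixed vocabulary. The function names, the exponent `s = 1`, and the vocabulary size of 100,000 are assumptions made for illustration, not anything Duolingo has confirmed.

```python
def harmonic(n: int, s: float = 1.0) -> float:
    """Generalized harmonic number: sum of 1/r^s for r = 1..n."""
    return sum(1.0 / (r ** s) for r in range(1, n + 1))


def zipf_coverage(known: int, vocab_size: int, s: float = 1.0) -> float:
    """Share of running text covered by the `known` most frequent words,
    under an idealized Zipf distribution over `vocab_size` words."""
    if not 0 <= known <= vocab_size:
        raise ValueError("known must be between 0 and vocab_size")
    return harmonic(known, s) / harmonic(vocab_size, s)


# Hypothetical vocabulary size, chosen only for illustration.
VOCAB_SIZE = 100_000

for k in (500, 1000, 1500, 2000, 3000):
    print(f"top {k} words: {zipf_coverage(k, VOCAB_SIZE):.1%}")
```

With these assumed parameters the top 1000 words cover roughly 62% of running text, and each further thousand adds less, which is the kind of fast-then-slow rise a frequency-based estimate would show. Real text deviates from the idealized law, so the output should not be expected to match the percentages quoted from the article.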


I'm guessing that when it says you can read 20% of all articles, it means that for an average article (and I have no idea what set of articles is being averaged over), you know 20% of the words in it, i.e. if it is 1000 words long then you know 200 of the words. But I may be wrong.
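
If that reading is right, the number for a single article would be computed roughly as in the sketch below. This is only a guess at the measure: the function name `token_coverage`, the simple letter-based tokenizer, and the exact-match comparison are all assumptions.

```python
import re


def token_coverage(text: str, known_words: set[str]) -> float:
    """Fraction of the words in `text` (counting repeats) that appear
    in `known_words`. Matching is exact and case-insensitive."""
    # Runs of letters only; digits, underscores, and punctuation are skipped.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in known_words)
    return known / len(tokens)


article = "El gato come. El perro duerme."
print(token_coverage(article, {"el", "gato"}))  # 3 of 6 words known -> 0.5
```

Note that this counts every occurrence of a word, so knowing a handful of very common words ("el", "de", "que") raises the figure quickly even if the article is still unreadable.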


I thought that may or may not be the case, as I tried to emphasize in my post, but the phrase they use is very ambiguous. It certainly seems as though that is what they mean; I can't imagine that one out of 5 articles (in any sample group besides first-grade story books) uses fewer than 500 words :-) Still, I would like a little more info on this, so I hope one of the moderators will chime in at some point.


Where do you see that? I don't see that in my version.


It is apparently in a testing phase right now, and from what I understand only some users have it. But you will probably see it soon.


I can only offer a guess, but it should be based on the vocabulary of uploaded articles. It does, however, seem like it doesn't pay attention to word forms (e.g. if you know the infinitive of a verb, it counts you as knowing all of its forms).


When someone uploads an article, Duolingo automatically goes through it, analyzes each word, and puts all of its words into a database. When you complete lessons, the percentage it shows you means that Duolingo has looked through all the articles in the database and compared the words in each to the words you know. Therefore it comes out with the percentage of an article you can probably read. Either that, or it tells you the percentage of all articles that you can understand, but I think it is the former. Note that I am just guessing at how Duolingo comes up with the percentages; this is only how I think they do it.
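
To make the two readings concrete, here is a small sketch that computes both over a toy set of articles. Everything in it is hypothetical: the function names, the tokenizer, and especially the 95% threshold for calling an article "readable" are assumptions, not Duolingo's actual method.

```python
import re


def coverage(text: str, known_words: set[str]) -> float:
    """Fraction of the words in one article that the learner knows."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in known_words) / len(tokens)


def average_coverage(articles: list[str], known_words: set[str]) -> float:
    """Reading 1: the share of a typical article you know, averaged
    over all articles."""
    if not articles:
        return 0.0
    return sum(coverage(a, known_words) for a in articles) / len(articles)


def share_readable(articles: list[str], known_words: set[str],
                   threshold: float = 0.95) -> float:
    """Reading 2: the share of articles whose coverage reaches
    `threshold` (an assumed cutoff for 'readable')."""
    if not articles:
        return 0.0
    readable = sum(1 for a in articles if coverage(a, known_words) >= threshold)
    return readable / len(articles)


articles = ["el gato", "el perro come", "la casa"]
known = {"el", "gato"}
print(f"average coverage: {average_coverage(articles, known):.0%}")
print(f"share readable:   {share_readable(articles, known):.0%}")
```

The two measures can differ a lot on the same data, which is why the wording "you can now read _% of all articles" is ambiguous.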
