# "You can now read X% of all real articles" - What does this mean statistically?

Does "You can now read 40% of all real Spanish articles" mean:

(1) "Given 100 articles written in Spanish, I can read 40 of them at 100% comprehension"

or does it mean

(2) "Given 100 articles written in Spanish, I can read 100 of them at 40% comprehension"

?

9/6/2013, 8:01:44 PM

It's intended to mean that, given 100 real Spanish articles, there are 40 that you can read and understand with a little bit of help (such as looking up certain words in the dictionary, or using the hints displayed on Duolingo when you hover over a word).

9/7/2013, 4:04:38 PM

Doesn't that depend on which 100 articles you start with? How do you decide when a person can read 40%?

9/7/2013, 4:28:12 PM

Of course! I meant that if you drew 100 articles uniformly at random from our database, you'd be able to read 40 on average.

Loosely speaking, we estimate whether you can read a given article based on the fraction of words in the article that you've seen in lessons. But it's not quite as simple as that, because not all words are treated equally. For example, if the article is about Google, the fact that you haven't seen "Google" in lessons doesn't make the article any harder to understand.
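To make that concrete, here is a rough Python sketch of the idea, not the actual Duolingo code. The function names, the `ignored` set for names like "Google", and the 0.9 cutoff are all assumptions for illustration; the real weighting and threshold are not stated in this thread.

```python
import re

def tokenize(text):
    # Lowercased word tokens; the pattern keeps accented letters.
    return re.findall(r"[^\W\d_]+", text.lower())

def coverage(article, known_words, ignored=frozenset()):
    """Fraction of word occurrences in `article` that are in `known_words`.

    Tokens in `ignored` (hypothetical: names such as "google") are skipped,
    so they neither help nor hurt the score.
    """
    tokens = [t for t in tokenize(article) if t not in ignored]
    if not tokens:
        return 0.0
    return sum(t in known_words for t in tokens) / len(tokens)

def readable_share(articles, known_words, ignored=frozenset(), threshold=0.9):
    """Share of articles whose coverage reaches `threshold`.

    The 0.9 default is an assumed cutoff, chosen only for this example.
    """
    if not articles:
        return 0.0
    readable = sum(
        coverage(a, known_words, ignored) >= threshold for a in articles
    )
    return readable / len(articles)

known = {"el", "gato", "come", "la", "casa", "es", "grande"}
articles = [
    "El gato come.",
    "Google es grande.",
    "La economía mundial crece.",
]
share = readable_share(articles, known, ignored={"google"})
print(f"Readable: {share:.1%}")
```

With this toy data, the first two articles count as readable and the third does not, so the "you can read X%" figure would be two out of three. Averaging over articles drawn at random from a larger database works the same way.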

9/7/2013, 10:38:12 PM

So the assumptions are (1) that your database is representative of published articles in general (fiction? philosophy? biology? news?) and (2) that you "can read" a certain article if you recognize a certain fraction of the words (what fraction? how tested?). Then you announce rather optimistic results (like 96.1%) with a surprising level of precision.

9/7/2013, 11:31:05 PM

There are more possibilities: (3) You can read 40% of the distinct words that appear in written Spanish. (4) You can read 40% of the word occurrences that appear in Spanish. (So every "the" on a page counts separately, for example.) If I had to guess what is meant, I would pick (4).
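The difference between (3) and (4) is easy to see on a toy example. This is only an illustrative sketch; the function name and the sample sentence are made up.

```python
from collections import Counter

def type_and_token_coverage(corpus_tokens, known_words):
    """Return (share of distinct words known, share of word occurrences known)."""
    counts = Counter(corpus_tokens)
    if not counts:
        return 0.0, 0.0
    # Interpretation (3): each distinct word counts once.
    known_types = sum(1 for w in counts if w in known_words)
    # Interpretation (4): every occurrence counts separately.
    known_tokens = sum(c for w, c in counts.items() if w in known_words)
    return known_types / len(counts), known_tokens / sum(counts.values())

tokens = "the cat sat on the mat and the dog sat".split()
by_type, by_token = type_and_token_coverage(tokens, {"the", "sat"})
print(f"(3) distinct words: {by_type:.0%}, (4) occurrences: {by_token:.0%}")
```

Knowing just two frequent words covers 2 of the 7 distinct words here but 5 of the 10 occurrences, which is why (4) climbs much faster than (3) as you learn common words.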

9/7/2013, 3:07:26 AM

I have been assuming it is (4) and definitely not (3), since (3) would imply there is a very small number of words. If it were (3), I would reach "100%" of French with 1500 words (I have 40% and 600).

9/7/2013, 5:08:54 AM

You can read more about it here: http://www.lingholic.com/how-many-words-do-i-need-to-know/. It's not a scientific article, but it is a very clear explanation of the Pareto principle. You really don't have to know tens of thousands of words to read a language :-) However, it is different if you learn a language from a completely different culture. The languages on Duolingo at this point are Romance and Germanic languages, so that would not be a problem.

9/7/2013, 8:47:45 AM

I think it means that 40% of all words in articles are also on your vocabulary page. However, it says I can read 95.8%, which I don't believe.

9/7/2013, 9:49:53 AM

Duolingo has never shown me this screen... Is it some kind of A/B testing beta roll-out? Or do I have to do something special to unlock this?

9/8/2013, 6:55:33 AM

You don't see the screen until there are unfinished translations that we think you're ready to work on. You should see it once you get further down the tree (exactly how far depends on what translations are unfinished at the time).

9/8/2013, 3:32:49 PM

Cool, thanks.

9/8/2013, 3:39:32 PM

I think you also only see it if you are using the website rather than one of the apps.

9/8/2013, 9:54:51 PM

I think there's a third option. You can read 40% of all real articles when you dump all of the real articles into one pile.

So, it's probably closer to your option (2).

9/6/2013, 9:53:03 PM

Commonly asked question, Duo gives no answers, apparently. One would think it would be a simple question for Duo to answer, if there actually is a basis for what they say.

9/6/2013, 11:15:16 PM

You can test here how many words you probably know in English :-) http://testyourvocab.com/. How many articles you can read depends not only on the words you know; other factors are involved too. Is it a language similar to yours? How many languages do you know?

9/7/2013, 12:03:49 PM

I would say you understand 40 percent of the lessons, including what you can figure out from the context. My percentage is close to 100. This may be so for the Duolingo lessons so far, but certainly not for the Spanish text in the real-world Spanish section.

9/7/2013, 12:49:21 PM

Being able to read an article depends on word recognition, but also on knowing grammar, sentence structure, and the subject matter. I wouldn't be able to read 96% of all articles in English.

9/8/2013, 4:57:14 PM

I'm a bit confused about this. I jumped from the upper 60s (as far as what percent of articles Duo thinks I can read) to the mid 80s, and now suddenly to over 90%. All within a week. I am nowhere near done with the Spanish sequence... Doesn't this seem... wrong?

9/17/2013, 6:10:58 AM

Yes, I think the evaluator for this broke recently. It happened once before, as I remember. It will probably get fixed soon.

9/17/2013, 6:36:50 AM