
https://www.duolingo.com/profile/CarlosLM.

Current vocabulary sizes of most courses in Duolingo (Update after 8 months)

This is an update of this post:

https://www.duolingo.com/comment/24527279

First of all, I want to give special thanks to FieryCat, who gave me all the current data. Secondly, this post is still relevant because what duome.eu calls lexemes are not real lexemes; they are words, and there might be different words for the same lexeme. Therefore, their "lexeme" tally will always be bigger than the lexeme tally shown here. Anyway, their stats are still very useful, either to gauge the size of a course or to keep track of how far you have advanced in a tree, so take advantage of them.

Top vocabulary sizes (in lexemes), compared with the data 8 months ago:

From English (en)

  • Norwegian(nb): 3265 (=)
  • Hebrew(he): 2676 (=)
  • Dutch(dn): 2630 (=)
  • Romanian(ro): 2342 (=)
  • French(fr): 2296 (1935)
  • Swedish(sv): 2228 (=)
  • German(de): 2137 (=)
  • Welsh(cy): 2133 (2051)
  • Spanish(es): 2128 (1603)
  • Danish(da): 2120 (=)
  • Russian(ru): 2109 (=)
  • Hungarian(hu): 2060 (=)
  • Portuguese(pt): 1950 (=)
  • Greek(el): 1949 (=)
  • Czech(cs): 1911 (=)
  • Esperanto(eo): 1878 (=)
  • Italian(it): 1808 (=)
  • Polish(pl): 1769 (=)
  • Chinese(zs): 1739 (=)
  • Korean(ko): 1708 (=)
  • Irish(ga): 1644 (=)
  • Vietnamese(vi): 1643 (=)
  • Turkish(tr): 1423 (=)
  • Klingon (tlh): 1280 (NEW)
  • Swahili(sw): 1205 (=)
  • Ukrainian(uk): 1127 (=)
  • Japanese(ja): 1090 (=)
  • High Valyrian(hv): 589 (=)
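
For the three From-English courses whose tallies changed, the growth can be worked out with a few lines of Python. This is only a sketch using the figures listed above; the helper name `growth` is made up for illustration.

```python
# (current, 8 months ago) lexeme tallies for the From-English courses that changed.
changed = {
    "French": (2296, 1935),
    "Welsh": (2133, 2051),
    "Spanish": (2128, 1603),
}


def growth(current: int, previous: int) -> tuple[int, float]:
    """Return the absolute change and the percentage change between two tallies."""
    delta = current - previous
    return delta, 100.0 * delta / previous


for course, (current, previous) in changed.items():
    delta, pct = growth(current, previous)
    print(f"{course}: +{delta} lexemes ({pct:.1f}%)")
```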

From Spanish (es)

  • English (en): 2180
  • Portuguese (pt): 1950
  • French (fr): 1913
  • Esperanto (eo): 1859
  • Catalan (ca): 1795 (=)
  • Italian (it): 1791
  • German (de): 1710
  • Guarani (Jopará) (gn): 1579 (=)

From Portuguese (pt)

  • French (fr): 1851
  • Italian (it): 1800
  • German (de): 1711
  • Spanish (es): 1588
  • English (en): 1440
  • Esperanto (eo): 1056 (NEW, NOT FINISHED TREE)

From French (fr)

  • German (de): 2138
  • Portuguese (pt): 1957
  • Italian (it): 1808
  • Spanish (es): 1597
  • English (en): 1422

From Russian (ru)

  • French (fr): 1913
  • German (de): 1710
  • Spanish (es): 1571
  • English (en): 1397

From Italian (it)

  • German (de): 2137
  • French (fr): 1913
  • English (en): 1422

From German (de)

  • French (fr): 1848
  • Spanish (es): 1574
  • English (en): 1422

From Turkish (tr)

  • German (de): 1570
  • English (en): 1397
  • Russian (ru): 134 (NOT FINISHED TREE)

From Chinese (zh)

  • English (en): 1448
  • Spanish (es): 756 (NOT FINISHED TREE)

From Hungarian (hu)

  • English (en): 1397

From Czech (cs)

  • English (en): 1397

From Japanese (ja)

  • English (en): 1388

From Polish (pl)

  • English (en): 1397

From Indonesian (id)

  • English (en): 1397

Note: If a total is marked "NOT FINISHED TREE", the lexeme total is only partial, so please report the correct total if you happen to have completed that tree. To do this, use this link (*), changing my user_id to your user_id. For further directions, please go to the previous discussion: Current vocabulary sizes of most courses in Duolingo

(*) https://www.duolingo.com/vocabularies/size?user_id=24953571&language=all
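
If you would rather build the link programmatically, here is a minimal Python sketch. It only assembles the URL quoted above and makes no request; the helper name `vocab_size_url` is hypothetical, and whether the endpoint still answers (or requires a logged-in session) is an assumption that is not verified here.

```python
from urllib.parse import urlencode

# Endpoint exactly as quoted in the post; its current availability is an assumption.
BASE_URL = "https://www.duolingo.com/vocabularies/size"


def vocab_size_url(user_id: int, language: str = "all") -> str:
    """Build the vocabulary-size link for a given user_id (hypothetical helper)."""
    query = urlencode({"user_id": user_id, "language": language})
    return f"{BASE_URL}?{query}"


if __name__ == "__main__":
    # 24953571 is the user_id from the post; replace it with your own.
    print(vocab_size_url(24953571))
```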

If you find any mistake, incomplete info or whatever, please tell me and I'll update this post. Thanks for reading, and good luck with your languages! ;-))

May 28, 2018

59 Comments


https://www.duolingo.com/profile/svrsheque

there might be different words for the same lexeme.

so you are using what duolingo returns as the number of lexemes learned by users and assuming that that count must be the true number of lexemes taught.

i call unwarranted assumption.

without grammar tags to go by, duo does not have a clue how the "words" being taught in not in-house languages relate to actual lexemes or even dictionary entries. duolingo does return something, and that's about all we can say for not in-house languages.

even just assuming that the duo words learned count is proportional to the actual lexemes learned in not in-house languages is dangerous.

our partially completed tree inventory for czech suggests our tree teaches about 1,200 dictionary entries, far below both the 1,911 figure you show and the official incubator word count at 2,362.

the dictionary:incubator ratios will depend on the forms richness of the language as taught and on how the team used or abused the incubator tools. the ratios involving the user learned words additionally reflect what duo put in that particular sausage, which i suspect nobody will tell us any time soon.


https://www.duolingo.com/profile/CarlosLM.

So the problem is with the languages that are not in-house. That's good to know; in that case the figures would overestimate the lexemes in these types of languages. Thanks for your info!


https://www.duolingo.com/profile/CarlosLM.

I tested it personally for the first skills of French, and the lexeme count was accurate. Obviously I can't test this for all the languages, so the system might have its holes. You could try this with the Czech tree: just patiently note down all the words of the basics, pick out the lexemes, and compare with the number given by the link (with your user_id): https://www.duolingo.com/vocabularies/size?user_id=24953571&language=all

In any case, this number will always be less than the total of words shown in the lessons, even with an excess number of lexemes, as you seem to be reporting.


https://www.duolingo.com/profile/svrsheque

i'm sorry--if the idea of this testing is to prove or disprove my assertion that the 1911 count is grossly off, it is a waste of time i cannot afford. i will publish the tally when i am done. at 80% inventoried, the projected total dictionary entry count of the course is 1160.


https://www.duolingo.com/profile/CarlosLM.

Thanks for your info; you obviously have knowledge of the internal workings of trees that I lack. Next time, if I have the data, I'll add the duome.eu data as well, and I'll try to do some comparison between in-house courses and the rest. OTOH, I'm looking forward to seeing your tally when you are done. So please let me know when you have these data available.


https://www.duolingo.com/profile/nueby

In case you still look at this stuff. We finally have the tally for the Czech for English speakers course: 1235 dictionary entries.

This matches nothing in particular. Less than the 2362 official Duo count and the 1911 in your thread, more than our older projection of 1160.
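
To put the three Czech figures side by side, here is a small Python sketch that expresses the two Duolingo counts as multiples of the dictionary-entry tally. It uses only the numbers quoted in this thread; the variable and function names are invented for illustration.

```python
# Figures quoted in this thread for Czech from English.
dictionary_entries = 1235  # tally reported above
api_count = 1911           # figure from the original post
incubator_count = 2362     # official Duo count


def ratio(count: int, baseline: int) -> float:
    """How many counted items there are per dictionary entry."""
    return count / baseline


print(f"API count / dictionary entries: {ratio(api_count, dictionary_entries):.2f}")
print(f"Official count / dictionary entries: {ratio(incubator_count, dictionary_entries):.2f}")
```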


https://www.duolingo.com/profile/svrsheque

just to let you know: not done yet, but will get back to you!


https://www.duolingo.com/profile/CarlosLM.

Well, Esperanto from Portuguese is the biggest EO course so far. Has anyone completed it, so we can get the current data?


https://www.duolingo.com/profile/DestinyCall

Poor Japanese. At least we beat High Valyrian.


https://www.duolingo.com/profile/CarlosLM.

Yes, the Japanese tree is a bit short. I hope that someday Duolingo will make a big update.


https://www.duolingo.com/profile/Thomas.Heiss

Hi Carlos, Hello everybody,

There is a new English-Portuguese (from English) tree under A/B test (already rolled-out): https://forum.duolingo.com/comment/27497456


https://www.duolingo.com/profile/CarlosLM.

Hello Thomas, good to see you here! :-)

Does the tree have a bonus? The lexeme counts always include the bonus skills, if there are any. By the way, my LC of Portuguese from Spanish is 1942, so the trees are remarkably similar.


https://www.duolingo.com/profile/Thomas.Heiss

EN-PT course has three bonus skills:

  • Christmas (2 lessons)
  • Flirting (2 lessons)
  • Idioms (3 lessons)

I have completed Christmas + Flirting + 1/3 lessons of Idioms.

This was also the case when I completed my old tree: I have not changed anything since, and I never completed the last two lessons of the Idioms skill.

I am more than 85% sure that I once saw the higher 1950 lexeme count number in my EN-PT tree (even with the last two bonus lessons unfinished)?!?


https://www.duolingo.com/profile/CarlosLM.

The 1950 tally is bound to be correct, because it's based on hundreds of users.


https://www.duolingo.com/profile/Thomas.Heiss

Interestingly, I have now finished my two remaining bonus lessons of the Idioms skill and this is what I have observed:

  • Second lesson added +1 to the lexemes count
  • Third lesson added no new lexemes
  • Finishing three bonus skills indeed adds a "little bit" to the overall lexemes count

Personally, I would have expected MANY more unique words (phrases)/lexemes to be added, especially for the last Idioms skill?!

Sorry, I had not looked at my lexemes number before I finished the first lesson of the Idioms bonus skill.


https://www.duolingo.com/profile/CarlosLM.

Bonus skills are usually too measly regarding new lexemes: you pay some lingots to expand your tree, and then you get around 20 lexemes. Is it worth it? I don't know.


https://www.duolingo.com/profile/Thomas.Heiss

"Did this tree update add new vocab?": https://www.duolingo.com/comment/27589606

I focused on "skill test-outs" over the last week.
Lexemes count is now indeed growing over the old 1950 mark.
I am still not finished with my new tree....


I had heard that the Incubator requires that a word be mapped to at least three sentences!?

As we all know, the 1-10 lessons of a skill on the old Duo system never showed ALL words immediately, which was weird, but additional "strengthen skill" exercises often did, even when single words had initially been missing from those 1-10 lessons.

Why? Because many of the sentences teaching those words had probably been suspended because of a higher error rate.
And now these sentences (with words?) are back!

Personally, I expect that we may see more "words/lexemes" over time when we reach the higher crown levels L4+L5, as those levels might re-activate older sentences which were used to introduce NEW words?!?

Maybe not....
I could be wrong....

Will keep you updated...

Currently I do not have a single crown L3 (or L4/L5) skill in my tree after the crown conversion (simple conversion rule).
It will probably take me at least 7+ months to level up ALL of my more difficult verb tense skills to the higher crown levels.

Greetings from Germany


https://www.duolingo.com/profile/CarlosLM.

Hi Thomas! :-) The new sentences will correspond to different variations of the same lexemes, so they might be interesting, especially with verb forms. According to Duolingo, 50% of the content was hidden due to the error rate you mentioned, but I don't know if I'll have the patience to go beyond level 3 in my trees.

And about your link: apparently there's a big difference between in-house courses and the rest, so the lexeme count in the Norwegian or Hebrew courses might be severely exaggerated. (Look for the posts of svrsheque in this thread.)


https://www.duolingo.com/profile/Thomas.Heiss

I finished the new EN-PT tree (91+3 skills, 463+7 lessons) with a total of 2000 lexemes according to the vocabulary API, like four other users: https://duome.eu/en/pt

DuoLingo Lexemes Ids: 2919+48


https://www.duolingo.com/profile/Thomas.Heiss

@CarlosLM @Fierycat

Hi Carlos, Hi Fierycat

Do we already know why the vocabulary API has increased the lexeme number from 2000 to 2940 for the English->Portuguese course?

It happened around spring/mid-2019 or the end of 2018.

I also see other users on www.duome.eu/tips/en/pt who have their "vocab" (words/lexemes) column way over 3000.

When I check my own account, the Duolingo API now comes back with that much higher number (I always got 2000 before, after I finished my EN->PT tree!).

For newer CEFR trees like French or Spanish from English it is also a huge mess
(comparing the lexemes/words number given by the API = vocab column with the "Duolingo lexemes ids" column).

How can it be that, when the "Duolingo lexemes ids" number (including duplicates) is 4466+49 (max), some Spanish users actually get higher vocab (real lexemes) numbers of 4484-4743?

The vocab/words (real lexemes) number should be lower than the "DL internal lexemes ids" number, as the latter contains duplicates according to the "?" explanation on Duome, shouldn't it?

Have a great weekend!

Greetings from Germany


https://www.duolingo.com/profile/CarlosLM.

@Thomas.Heiss

Hi Thomas, nice to see you again. :-) I think Duolingo is not counting real lexemes anymore, only words. This would explain why, according to Duolingo, my Portuguese vocab is 2889, and according to Duome it is 2919(+34).

Anyway, I trust Duo stats to compare similar courses, for example Russian for Spanish, or Russian for English. However, this method fails if I want to compare totally different languages, because now I'm not sure what Duolingo is counting: lexemes, words, something in between?

Have a nice week! Greetings back to you from Spain.


https://www.duolingo.com/profile/CarlosLM.

P.S. From your link, the Vocab column is what the Duolingo API returns, and the Lexeme column is what Duome shows in the stats, but the 2 numbers are so similar that I don't think it's worth making a distinction...


https://www.duolingo.com/profile/CarlosLM.

Congratulations!!


https://www.duolingo.com/profile/Thomas.Heiss

@CarlosLM

Obrigado Carlos,

Eu já vi e li seu comentário no meu outro tópico :-)

Se você não se importa que eu esteja postando o URL aqui para os usuários inscritos e que possam estar interessados em lê-lo ou encontrar outros tópicos do fórum vinculados?


Quizfrage / Romance grammar question

I) Is it either:

  • 1) Se você não se importa que eu esteja postando o URL

  • 2) Se você não se importa em me postar o URL

  • 3) Se você não se importa de postar o URL

for the English translation:
"If you don't mind (me) linking it here" / "If you don't mind that I link it here"

II) Is it either:

  • para usuários que estão inscritos e podem ter interesse

  • para os usuários inscritos e que possam estar interessados

for the English translation "for those users who are subscribed and might be eventually interested"?

Full English text:
"Thanks Carlos

I have already seen and read your comment on my other thread :-)

If you do not mind that I am posting the URL here for those users who are subscribed and might be eventually interested in reading it or finding other linked forum topics?"

III) Where to put "eventualmente" in the above sentence? :-)

Estou feliz por estar em contato com você.

Muitas saudações


https://www.duolingo.com/profile/CarlosLM.

Hello Thomas, you can post all the URLs you consider convenient, no problem!! Cheers! :-)


https://www.duolingo.com/profile/JulesF.

Thank you so much Carlos! I often need this! Have a lingot!


https://www.duolingo.com/profile/CarlosLM.

The data are very powerful, because if you want to do a course from other languages, you might choose one based on size. For example, the German courses from Italian, French or English are much more complete than those from other languages.
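
As a quick illustration of choosing by size, this Python sketch ranks the German courses by base language, using the tallies from the lists in the original post. The dictionary name is arbitrary, and the numbers are only as reliable as the counts discussed in this thread.

```python
# Lexeme tallies for German, keyed by the base language of the course
# (values copied from the lists in the original post).
german_course_sizes = {
    "English": 2137,
    "Spanish": 1710,
    "Portuguese": 1711,
    "French": 2138,
    "Russian": 1710,
    "Italian": 2137,
    "Turkish": 1570,
}

# Largest course first.
ranked = sorted(german_course_sizes.items(), key=lambda item: item[1], reverse=True)

for base_language, size in ranked:
    print(f"German from {base_language}: {size}")
```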


https://www.duolingo.com/profile/FrenchCamille

Thank you so much Carlos! This post will live in my followed list now (:


https://www.duolingo.com/profile/CarlosLM.

You are welcome!! ^_^


https://www.duolingo.com/profile/Taldust

Do you have the "Learning Insights" feature in your Android Duolingo app? (It might be restricted to "plus" users, i.e. paying users.) It reports (among other things) the number of "Swedish Words Learned" (Swedish is the only language I'm currently learning). It tells me I have learned 3233 words. (That number does not increase through practice, only when I do lessons that I haven't done before.) And I am only about 2/3 through the tree. That would be inconsistent with the numbers of Duome and yours. But I have the feeling that feature ("Learning Insights") is still buggy and I don't trust its numbers. But I like that feature anyway. And one day it might even work correctly. ;-)

Your way to get the number of words that I have learned returns "1510", btw.


https://www.duolingo.com/profile/CarlosLM.

Words are not lexemes. Lexemes are the entries in a dictionary, and words are all the terms derived from a root word.


https://www.duolingo.com/profile/ngraner42

I get 1904 lexemes for 2922 words on the old French tree, a ratio of about 1.5. It is interesting that the ratio for Swedish is over 2. I wonder what it would be for an English tree.
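
The two ratios can be checked with a short Python sketch. The French pair is the one given here; the Swedish pair (3233 words, 1510 lexemes) is taken from Taldust's post above, which comes from a tree only about 2/3 done and a feature its author does not fully trust, so treat it as rough.

```python
# (words, lexemes) pairs reported in this thread.
reports = {
    "French (old tree)": (2922, 1904),
    "Swedish (partial tree)": (3233, 1510),
}


def words_per_lexeme(words: int, lexemes: int) -> float:
    """Average number of counted word forms per lexeme."""
    return words / lexemes


for course, (words, lexemes) in reports.items():
    print(f"{course}: {words_per_lexeme(words, lexemes):.2f} words per lexeme")
```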


https://www.duolingo.com/profile/Thomas.Heiss

Corrected number for the English-Spanish (es) course (from English), previously 2128:

Hard facts

  • Skills: 113+3
  • Lessons: 520+8
  • Duolingo Lexemes (lexemes_ids): 3309+49
  • Lexemes (Vocab API): 2264 (136 more than the previous 2128 number)

Source

76 users with a conquered tree (they reached the max skills, lessons and Duo lexemes_ids): https://duome.eu/en/es

All users list "113+3" skills (three bonus skills).

Q: Are old (e.g. suspended) words from old skills of the old EN-ES course, which are still on the user's learned-words list, "counted in" by the "Vocabulary API", so that the number of 2264 is now higher than the 2128 number from the first post?


https://www.duolingo.com/profile/Thomas.Heiss

New English->Spanish (from English) A/B tree: https://forum.duolingo.com/comment/30982094

Hard facts

  • Skills: 159+2
  • Lessons: 678+5
  • Crowns: 797
  • Duolingo Lexemes (lexemes_ids): 4466+32

  • Lexemes (Vocab API): ? (the tree first needs to be completed at crown level L1 to be able to check the total number)


https://www.duolingo.com/profile/brian968059

Any update on the total Lexemes for the newest Spanish course yet? I'm expecting about 3000 lexemes but would love some confirmation.


https://www.duolingo.com/profile/dogomolo

Can someone with artistic talent make a chart of these? It would be nice to show off to friends.


https://www.duolingo.com/profile/CarlosLM.

I could have put the table in a Google Docs spreadsheet, but that's not very artistic! OTOH, I tried to use the Markdown formatting code as best I could, but it has a lot of holes. Well, I think in this case the contents are more important than the aesthetics...


https://www.duolingo.com/profile/jonasrre

Thank you very much for that. :)


https://www.duolingo.com/profile/JimBernhar

Thank you so much for this. Are the specific words being checked against general frequency lists?


https://www.duolingo.com/profile/ngraner42

Duolingo has some words that go pretty deep. For example, "pomme - apple", the most frequently seen word in Duolingo French, shows up at about 4000 on a subtitles frequency list. I think Duolingo leans toward familiarity rather than focusing on frequency.


https://www.duolingo.com/profile/DestinyCall

Also DuoLingo REALLY likes apples.

A lot.


https://www.duolingo.com/profile/WillowsofXihu

Duo being an owl, we should be thanking our lucky stars he prefers a fruitarian lifestyle, otherwise we might have been wading through endless sentences on eating dead mice and rabbits...


https://www.duolingo.com/profile/LaurianaB

I could not agree more. ))))


https://www.duolingo.com/profile/piguy3

¡guau! English from Spanish now showing 3455 lexemes after update!


https://www.duolingo.com/profile/Thomas.Heiss

"The Reverse Tree just Expanded Too": https://forum.duolingo.com/comment/28148305

Aren't those the "Duolingo lexemes" (lexemes_ids) which are not true lexemes?
If you click the "?" in your duome.eu profile you will get the explanation (you probably know it already):

Hard facts

  • Skills: 151+3 (not +2)
  • Lessons: 571+8 (not +5)
  • Lexemes: 2764 (last updated 09/20/2018)
  • Duolingo Lexemes (lexemes_ids): 3455+60

According to https://duome.eu/es/en 19 users are now listed with a finished tree.

Around 2764 lexemes. New source: the top three users with finished trees.
More users need to finish their tree with ALL skills tested out (most important: no reset "lessons_missing" and "num_missing" variables because of the tree update/skill conversion!!).

Current top three users

  • NAME | LEXEMES | DUO_IDS | LESSONS | SKILLS
  • jotor99 | 2764 | 3455+60 | 571+8 | 151+3
  • ragnar_ramos | 2764 | 3455+60 | 571+8 | 151+3
  • EricDeMelo1 | 2764 | 3455+60 | 571+8 | 151+3

Old user list:

  • Ruben290872 | 2747 | 3453+37 | 569+5 | 151+2
  • dan1ell | 2749 | 3427+37 | 557+5 | 151+2
  • Oscar738 | 2703 | 3396+37 | 543+5 | 151+2
  • Andrew214219 | 2707 | 3395+37 | 541+5 | 151+2
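
If anyone wants to collect rows like these in a script, here is a minimal Python sketch that splits a pipe-separated row into numbers and separates the "+bonus" part. The `TreeStats` type and both helper functions are invented for illustration, and the row format is simply the one used in this post, not an official duome.eu export.

```python
from typing import NamedTuple


class TreeStats(NamedTuple):
    name: str
    lexemes: int
    duo_ids: tuple[int, int]  # (base, bonus)
    lessons: tuple[int, int]  # (base, bonus)
    skills: tuple[int, int]   # (base, bonus)


def split_bonus(field: str) -> tuple[int, int]:
    """Turn '3455+60' into (3455, 60); a bare '3455' becomes (3455, 0)."""
    base, _, bonus = field.partition("+")
    return int(base), int(bonus or 0)


def parse_row(row: str) -> TreeStats:
    """Parse one 'NAME | LEXEMES | DUO_IDS | LESSONS | SKILLS' row."""
    name, lexemes, duo_ids, lessons, skills = (part.strip() for part in row.split("|"))
    return TreeStats(name, int(lexemes), split_bonus(duo_ids),
                     split_bonus(lessons), split_bonus(skills))


rows = [
    "jotor99 | 2764|3455+60|571+8|151+3",
    "Ruben290872 | 2747|3453+37|569+5|151+2",
]

for stats in map(parse_row, rows):
    print(stats.name, stats.lexemes, stats.duo_ids, stats.lessons, stats.skills)
```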

https://www.duolingo.com/profile/piguy3

Assuming that duome.eu is pulling from the same system (which I presume since the figures by and large overlap, and exceptions can be explained by vanished bonus skills), then the figure for Russian for Turkish is 1483 and Esperanto for Portuguese is 2204.

Incidentally, Russian for Spanish is 1705, and Hindi for English is 643


https://www.duolingo.com/profile/Elizabeth528513

On the Duolingo Words page, www.Duolingo.com/words, I've got 3401 words (German from English). However, I believe that page is corrupted, as it frequently shows words practiced as "just now" when I haven't seen them in months. I know that page is steadily increasing and no longer corresponds to the lexemes on Duome.EU, where my lexemes have been unchanging (2864+48). And at this point, the two "words" totals on Duome don't equal each other either (3235 of 3252 at the bottom of the list, 2962 on the profile page). So, I've got four different counts at the moment.


https://www.duolingo.com/profile/piguy3

I wouldn't pay attention to the word count total from the bottom of the word list on Duome. For a number of languages, that feature isn't even set-up yet, so it might well not be finished or reliable for any of them.

The Words tab shows different forms of the same word and counts them separately, which it's safe to say the word and lexeme counts do less of (even though they both probably do some).


https://www.duolingo.com/profile/jecobacalzo

I finished the Dutch language tree about 1 month ago. The link says I learned 3309 words, more than the 2630 you posted here.


https://www.duolingo.com/profile/Thomas.Heiss

@Jecobacalzo

Quote: I finished the Dutch language tree about 1 month ago. The link says I learned 3309 words, more than the 2630 you posted here.

The code behind the "vocabulary API" was suddenly changed by staff, and the numbers it returns have increased dramatically.

See my other Portuguese comment about the sudden increased "(real) lexemes" numbers and the quite low 1,3 factor: https://forum.duolingo.com/comment/27444256?comment_id=35448180


https://www.duolingo.com/profile/Ereitz1

What's up with the word lists? I would like to have some sort of overview of the vocabulary I've learned and how much vocabulary in total Duolingo covers. As a learning tool Duolingo doesn't seem to be built well to get insight into your own process statistically or Duolingo's methodology.

I love the game aspect of language learning and I think Duolingo has some super successful aspects but in the long run, just being addictive and teaching random vocabulary isn't enough...

... or in other words, I have the feeling Duolingo has the potential to be so much more...


https://www.duolingo.com/profile/CarlosLM.

@jecobacalzo

I think your Dutch tree hasn't changed at all since the last stats posted here, as you still get the same 2630 lexemes from your duome page. Just go to your duome page, and hover your mouse over the 3309 number here: L 25 W 3309 XP 30299, until you see the tooltip. The relevant number is the 2nd one.

@Thomas.Heiss

Apparently Duolingo doesn't count lexemes anymore, only words, everywhere. However, if you hover the mouse over the word counts under the Languages headline, you can see 2 numbers, and the 2nd number is very close to, or coincides exactly with, the old lexeme count. So this would explain why the lexeme counts have increased so much nowadays. :-S


https://www.duolingo.com/profile/jecobacalzo

What about the vocabulary size of Hindi? I don't see it on the list?


https://www.duolingo.com/profile/Thomas.Heiss

@jecobacalzo

The Hindi course is quite short (32 skills, 133 lessons) and only has a few lexemes (~656 lexemes/Duolingo lexemes, whatever).

Ranking table for Hindi from English: http://www.duome.eu/en/hi

..(...)..

Please note:

The "vocabulary API" has been changed by staff recently and now gives quite different (real) lexemes numbers, which are recorded in the "vocab" table column or in your www.duome.eu/USERNAME "words" (not words but lexemes) at the bottom below your "Languages" headline.

See my other Portuguese comment about the sudden increased (real) lexemes numbers: https://forum.duolingo.com/comment/27444256?comment_id=35448180

The factor (1,3) is IMHO quite (too?) low to be able to compare words in your www.duolingo.com/words list (real words with all duplicates) to (real) lexemes or "base words".


https://www.duolingo.com/profile/jecobacalzo

Oh my, that may not be good enough for me. I need to reach ~A2 level by October because I'm traveling to India. I may need to use other apps.


https://www.duolingo.com/profile/CarlosLM.

@Thomas.Heiss

What do you mean by "factor (1,3)"? Maybe that there are 3 different sentences to practice for each lexeme in Duolingo, or something like that? Or maybe you meant 1.3, so the new count would be a number 30% bigger than the old count?

Btw, Happy New Year, Thomas!! :-D
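
For what the 1.3 reading would look like in practice, here is a rough Python check against the only two before/after pairs mentioned in this thread (Dutch 2630 to 3309, and English->Portuguese 2000 to 2940). It is just arithmetic on those reported numbers, not a statement about how the API actually counts.

```python
# (old lexeme count, new count returned by the API), as reported in this thread.
before_after = {
    "Dutch from English": (2630, 3309),
    "Portuguese from English": (2000, 2940),
}


def factor(old: int, new: int) -> float:
    """Ratio of the new count to the old one; 1.3 would mean 30% bigger."""
    return new / old


for course, (old, new) in before_after.items():
    print(f"{course}: factor {factor(old, new):.2f}")
```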


https://www.duolingo.com/profile/frankenstein724

I have something like 1464 for Turkish from English. I’ve gone through the entire tree, but not all skills are level 5 (all are level 3 or above). Don’t know if that means I’ve got a few more words coming my way.


https://www.duolingo.com/profile/SuperSapir

I believe that those numbers are not correct for some of the languages. For example, in the German course there are 2568 words, and in Dutch there are far more than the number you have stated. It may be that the vocabulary has grown over the years.
