
https://www.duolingo.com/profile/CarlosLM.

Current vocabulary sizes of most courses in Duolingo

I was curious to know how many words each tree has, to get an idea of how long a course is. So I wrote this post:

https://www.duolingo.com/comment/24435667

There I learnt that you should make a distinction between unique words (what you get from the Words tab) and base words or lexemes, which are the different entries in a dictionary. This second count is the most important, because it is independent of the different declensions or conjugations a language might have. Fortunately, Duolingo has this number available for you, and you can get it with this trick I got thanks to FieryCat:

https://www.duolingo.com/vocabularies/size?user_id=24953571 (using my current id in this example)

However, this trick had its limitations. First of all, you needed a user_id (you can get one by opening www.duolingo.com/users/"your Duolingo user name" and looking for the last "id" or "avatar" words), and you only got a partial count if your tree was not complete. You could change your language to get the other counts, but you couldn't change the base language of another user (that's good, because otherwise it would be hacking).

So I worked around this problem by trying different parameters, and I got very lucky to find out that by adding &language=language_code you could access all the available counts without changing anything else. Yoohoo!!! :D

So now, if I am studying French from Spanish, you can get my Portuguese from Spanish count this way:

https://www.duolingo.com/vocabularies/size?user_id=24953571&language=pt (please, copy and paste the whole link, this is the closest working link I can get with an ampersand :-/)
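The link above can also be put together programmatically. This is only a minimal sketch: the function name `vocabulary_size_url` is made up for illustration, and the only things taken from the post are the address and the `user_id` and `language` parameters. It builds the link and does not contact Duolingo.

```python
from typing import Optional
from urllib.parse import urlencode

# Address taken from the post; everything else here is an illustrative helper.
BASE_URL = "https://www.duolingo.com/vocabularies/size"


def vocabulary_size_url(user_id: int, language: Optional[str] = None) -> str:
    """Build the vocabulary-size link for a user, optionally for one language code."""
    params = {"user_id": user_id}
    if language is not None:
        # Language code as used in the post, e.g. "pt" for Portuguese.
        params["language"] = language
    return f"{BASE_URL}?{urlencode(params)}"


print(vocabulary_size_url(24953571))
print(vocabulary_size_url(24953571, "pt"))
```

Using `urlencode` joins the parameters with a plain ampersand, which avoids the copy-and-paste problem with the link.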

However, to get my English course counts I think I should change my base language to English. The same goes for any other user_id, so you could tell more or less whether a user with millions of XP is a genuine language lover or simply a gamer (Tip: look for difficult languages; almost every heavy user completes the Spanish or French trees, but difficult or obscure languages like Hungarian, Hebrew, Korean, Swahili, Vietnamese or Guarani are another story).

Without further ado, these are the top vocabulary sizes I have found:

  • Norwegian(nb): 3265
  • Hebrew(he): 2676
  • Dutch(dn): 2630
  • Romanian(ro): 2342
  • Swedish(sv): 2228
  • English(en): 1388 (ja), 1397(dn, ru, uk, pl, tr, cs, ro), 1422(fr, de, it), 1440(pt), 2180(es)
  • German(de): 2137
  • Danish(da): 2120
  • Russian(ru): 2109
  • Hungarian(hu): 2060
  • Welsh(cy): 2051
  • Portuguese(pt): 1950
  • Greek(el): 1949
  • French(fr): 1935
  • Czech(cs): 1911
  • Esperanto(eo): 1878
  • Italian(it): 1808
  • Catalan(ca): 1795(es)
  • Polish(pl): 1769
  • Chinese(zs): 1739
  • Korean(ko): 1708
  • Irish(ga): 1644
  • Vietnamese(vi): 1643
  • Spanish(es): 1603
  • Guarani(gn): 1579(es)
  • Turkish(tr): 1393, 1423*
  • Swahili(sw): 1205
  • Ukrainian(uk): 1127
  • Japanese(ja): 1090
  • High Valyrian(hv): 589

(*) I've only found 1 user with such a high vocabulary size.

Please take into account that some trees have A/B tests, so maybe you cannot get the full count at the moment. OTOH, maybe you have to buy the bonus skills to get the full vocabulary of the course. If somebody knows, please tell me.

If I made a mistake or typo, or another language or course is missing, please tell me too so I can add or correct the info.

Cheers and happy learning!!


P.S. Update Great news, now you can get all the languages you are doing from your current base language, with the corresponding vocabulary sizes and even a total tally, using this link (copy and paste the whole link, and then change my user_id with yours):

https://www.duolingo.com/vocabularies/size?user_id=24953571&language=all

Thanks again to FieryCat for this new trick.

Update2 This is a mini "Hall of Fame" based on total vocabularies of people I follow (+10,000 lexemes):

Hall of Fame of Duolingo

Update3 English course vocabulary sizes. For more information about courses, see the latest post of FieryCat

September 20, 2017

87 Comments


https://www.duolingo.com/profile/FrankKool

Thanks for sharing.

Also, isn't it odd that the most widely used course (Spanish, now counting over 100 million registered users) has such a small vocabulary?


https://www.duolingo.com/profile/CarlosLM.

Yes, the Spanish vocabulary was set in stone a long time ago, and I don't see any interest from Duolingo in improving the course. The only difference between users is that some have done the bonus lessons and some have not, and this only accounts for about 15 lexemes.


https://www.duolingo.com/profile/Ontalor

There have actually been two dramatic updates to the course since Spanish was originally released years ago. The updates come every so often.


https://www.duolingo.com/profile/CarlosLM.

Thanks for your interesting info! Keep in mind that I cannot know about very old vocabulary changes, because I've only tracked very active users (plus a few users that have finished a tree recently) and they tend to keep the popular languages updated.


https://www.duolingo.com/profile/FieryCat

I wrote and ran a small script to analyze the maximum vocabulary size (base language English) for the users from this list (994 users): Streak Hall of Fame Sign Ups. As soon as the data is ready, I will publish it for comparison.
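The idea of taking the maximum vocabulary size over many users can be sketched as follows. This is not the actual script, only a minimal illustration that assumes the per-user counts have already been collected as (language code, count) pairs; the function name and the sample pairing are hypothetical, although the numbers themselves are counts mentioned in this thread.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def max_vocabulary_sizes(observations: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Keep the largest count seen for each language code."""
    best: Dict[str, int] = defaultdict(int)
    for language, size in observations:
        if size > best[language]:
            best[language] = size
    return dict(best)


# Illustrative input: one (language, count) pair per user and course.
sample = [("ko", 582), ("ko", 1708), ("sw", 590), ("sw", 1205), ("hv", 589)]
print(max_vocabulary_sizes(sample))
```

A maximum taken this way is only a lower bound on the real tree size: it is exact only if at least one of the checked users has finished the whole tree.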


https://www.duolingo.com/profile/CarlosLM.

Wow, this is amazing!!! Could you select users with other base languages? I'm especially interested in getting the vocabulary sizes of the different Duolingo courses that teach English, but it's sooo hard to get this data!!


https://www.duolingo.com/profile/FieryCat

I improved my script a bit and got this:

From English (en)

  • Norwegian (Bokmål) (no-BO): 3265, finished: 100.0%
  • Hebrew (he): 2676, finished: 100.0%
  • Dutch (nl-NL): 2630, finished: 100.0%
  • Romanian (ro): 2342, finished: 100.0%
  • Swedish (sv): 2228, finished: 100.0%
  • German (de): 2137, finished: 100.0%
  • Danish (da): 2120, finished: 100.0%
  • Russian (ru): 2109, finished: 100.0%
  • Hungarian (hu): 2060, finished: 100.0%
  • Welsh (cy): 2051, finished: 100.0%
  • Portuguese (pt): 1950, finished: 100.0%
  • Greek (el): 1949, finished: 100.0%
  • French (fr): 1935, finished: 100.0%
  • Czech (cs): 1911, finished: 100.0%
  • Esperanto (eo): 1878, finished: 100.0%
  • Italian (it): 1808, finished: 100.0%
  • Polish (pl): 1769, finished: 100.0%
  • Irish (ga): 1644, finished: 100.0%
  • Vietnamese (vi): 1643, finished: 100.0%
  • Spanish (es): 1603, finished: 100.0%
  • Korean (ko): 1532, finished: unknown
  • Turkish (tr): 1423, finished: unknown
  • Swahili (sw): 1205, finished: unknown
  • Ukrainian (uk): 1127, finished: unknown
  • Japanese (ja): 1090, finished: 100.0%
  • High Valyrian (hv): 589, finished: 100.0%

From Spanish (es)

  • English (en): 2180, finished: 100.0%
  • Portuguese (pt): 1942, finished: 100.0%
  • Esperanto (eo): 1859, finished: unknown
  • French (fr): 1848, finished: 100.0%
  • Catalan (ca): 1795, finished: 100.0%
  • Italian (it): 1791, finished: 100.0%
  • German (de): 1710, finished: 100.0%
  • Guarani (Jopará) (gn): 526, finished: unknown

From French (fr)

  • German (de): 2138, finished: 100.0%
  • Portuguese (pt): 1957, finished: 100.0%
  • Italian (it): 1791, finished: 100.0%
  • Spanish (es): 1571, finished: 100.0%
  • English (en): 1422, finished: 100.0%

From Portuguese (pt)

  • French (fr): 1851, finished: 100.0%
  • Italian (it): 1800, finished: 100.0%
  • German (de): 1711, finished: 100.0%
  • Spanish (es): 1571, finished: 100.0%
  • English (en): 1440, finished: 100.0%

From Russian (ru)

  • French (fr): 1913, finished: 100.0%
  • German (de): 1710, finished: 100.0%
  • Spanish (es): 1571, finished: 100.0%
  • English (en): 1397, finished: 100.0%

From German (de)

  • French (fr): 1848, finished: 100.0%
  • Spanish (es): 1574, finished: 100.0%
  • English (en): 1422, finished: 100.0%

From Italian (it)

  • French (fr): 1913, finished: 100.0%
  • German (de): 1841, finished: 100.0%
  • English (en): 1422, finished: 100.0%

From Turkish (tr)

  • German (de): 1570, finished: 100.0%
  • English (en): 1397, finished: 100.0%

From Indonesian (id)

  • English (en): 1242, finished: 83.6%

From Greek (el)

  • English (en): 684, finished: 45.5%

From Japanese (ja)

  • English (en): 1388, finished: 100.0%

From Czech (cs)

  • English (en): 1397, finished: 100.0%

From Romanian (ro)

  • English (en): 1397, finished: 100.0%

From Polish (pl)

  • English (en): 986, finished: 65.5%

From Dutch (nl-NL)

  • English (en): 1397, finished: 100.0%

I hope this information will be useful to you.


https://www.duolingo.com/profile/Olja.

Incredible. Thank you for the information.


https://www.duolingo.com/profile/CarlosLM.

Wow, astonishing data!! Thanks a lot! :D :D However, you added another mysterious number, the finished part. How do you know this number??


https://www.duolingo.com/profile/FieryCat

There is another trick that lets you get a lot of useful data about a user: https://www.duolingo.com/users/user_name_here. This data contains information about the course that the user is learning now.


https://www.duolingo.com/profile/CarlosLM.

Aha, I knew that; that link has very cool info. One piece of hidden information I very much like to see is the individual streaks: you have one for every language... That way you can see what languages a user is currently working on, or you can know how many streak days you have with a certain tree or trees.

Another trick maybe you didn't know. Look at the link of the image of your avatar:

https://duolingo-images.s3.amazonaws.com/avatars/70993956/u-YDValmNM/xlarge

The long number is your user_id, and by hand it's a lot faster to get it from there using the Right Mouse Button (Over the avatar, in Chrome click "Open image in new tab"), than from the usual link https://www.duolingo.com/users/user_name_here.
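Pulling the number out of the avatar link can also be done with a short pattern match. This is a minimal sketch that assumes the link always has the shape shown above, with the user_id right after "/avatars/"; the function name is made up for illustration.

```python
import re
from typing import Optional

# Assumption: the user_id is the run of digits that follows "/avatars/".
AVATAR_ID_PATTERN = re.compile(r"/avatars/(\d+)/")


def user_id_from_avatar_url(url: str) -> Optional[int]:
    """Return the user_id embedded in an avatar link, or None if there is none."""
    match = AVATAR_ID_PATTERN.search(url)
    return int(match.group(1)) if match else None


avatar = "https://duolingo-images.s3.amazonaws.com/avatars/70993956/u-YDValmNM/xlarge"
print(user_id_from_avatar_url(avatar))
```

If a link does not match the assumed shape, the function returns None instead of guessing.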


https://www.duolingo.com/profile/CarlosLM.

OTOH I think you should publish your data in a post yourself (if you haven't done so yet), as it is much more complete than mine. With my manual method I can get my info quite easily from my best users, but only for the English courses. Again, thank you very much for your info. Now maybe I'll choose one course over another based on tree size, and while I'm advancing in my current trees, I'm looking forward to confirming the full vocabulary sizes.


https://www.duolingo.com/profile/FieryCat
  • Norwegian Bokmål(no-BO): 3265
  • Hebrew(he): 2676
  • Dutch(nl-NL): 2630
  • Romanian(ro): 2342
  • Swedish(sv): 2228
  • German(de): 2137
  • Danish(da): 2120
  • Russian(ru): 2109
  • Hungarian(hu): 2060
  • Welsh(cy): 2051
  • Portuguese(pt): 1950
  • Greek(el): 1949
  • French(fr): 1935
  • Esperanto(eo): 1878
  • Italian(it): 1808
  • Polish(pl): 1769
  • Irish(ga): 1644
  • Vietnamese(vi): 1643
  • Spanish(es): 1603
  • Czech(cs): 1540
  • Turkish(tr): 1393
  • Ukrainian(uk): 1107
  • Japanese(ja): 1090
  • Swahili(sw): 590
  • Korean(ko): 582
  • High Valyrian(hv): 286

The list size: 26
Number of users: 793

The script source code: https://pastebin.com/1RD5NN3g


https://www.duolingo.com/profile/CarlosLM.

Great work FieryCat, your script is confirming my data that I got by hand. ;-)

For Korean I had 4 users confirming the high result Korean(ko): 1708****
for Swahili I had 6 users Swahili(sw): 1205******
and for High Valyrian only 2 users: 589**

For these difficult languages, sometimes what I did was to look for messages in the forum of the type: "XXX tree finished". And the rarest language of all Duolingo for me was Guarani, because I could only find one user with a high result: piguy3 with his vocabulary size of 1579:

https://www.duolingo.com/vocabularies/size?user_id=183767404&language=gn


https://www.duolingo.com/profile/FieryCat

Korean(ko): 582

Most likely, "my" user has not finished his tree. I did not check that in my script. It is necessary to add such a check in the script or display the percentage of completion of the tree.

In fact, this information should be collected in all forums and not only in one thread.


https://www.duolingo.com/profile/CarlosLM.

Yeah, this is a very difficult course to have finished because it was released very recently. The kind of users you checked are juggling 20+ trees, so unless they have previous knowledge of Korean, it's really difficult to complete that tree in no time.


https://www.duolingo.com/profile/piguy3

I found another user with a 1579 count for Guaraní. For some reason the method didn't work for a few other people whom I know to have finished the tree, however (ah, b/c I assume their accounts weren't set to Spanish base language at the relevant moment; mine wasn't either just now, but I guess it can be easier to check oneself).


https://www.duolingo.com/profile/CarlosLM.

This is strange. I have gotten Catalan results from users that have their accounts set to English, but maybe it is different for Guarani, and if the account is not set to Spanish, you cannot get the data. Anyway, thanks for your commitment to the Guarani language, I got the coveted Guarani vocabulary size! :D


https://www.duolingo.com/profile/CarlosLM.

I have just given you 20 lingots for your hard work and much appreciated help, and because I don't like your downvotes at all. :-/


https://www.duolingo.com/profile/FieryCat

Thanks. Could you tell me what kind of data you are interested in? In a few days, if I have some spare time, I will update my script and share the data that you need.


https://www.duolingo.com/profile/CarlosLM.

The vocabulary sizes of the most important English courses (e.g. en<-de, en<-ru, en<-fr, en<-pt, en<-Arabic, en<-Chinese, en<-Japanese, en<-Indonesian...), as my data is really scarce. If this is too much trouble, don't worry; maybe you could run your current script again 3 months later to see the vocabulary changes!! ;-)


https://www.duolingo.com/profile/FieryCat

The script is already working, but I can edit it later. It will tell me how many users from that list have English as their base language. I think they will be the majority.


https://www.duolingo.com/profile/CarlosLM.

I don't understand why 2 people have downvoted you! Maybe it's the word "script" that has triggered an irrational response, IMO. If we didn't have scripts, programs, or computer science in general, we wouldn't have Duolingo in the first place, this wonderful tool to help you learn foreign languages.


https://www.duolingo.com/profile/FieryCat

No, it was my personal "fan". It seems he has finally discovered the English branch of the forum for himself :D Never mind.


https://www.duolingo.com/profile/CarlosLM.

Cool! Looking forward to your results... ;)


https://www.duolingo.com/profile/CarlosLM.

Update2 This is a mini Hall of Fame based on total vocabularies of people I follow (+10,000 lexemes):

Hall of Fame of Duolingo


https://www.duolingo.com/profile/FieryCat

Do you know that the parameter "language" can have value "all"?


https://www.duolingo.com/profile/CarlosLM.

Wow, I had no idea!! Now you can do your personal total tally way easier than before!


https://www.duolingo.com/profile/CarlosLM.

I appended an update to the post, including your info.

Btw, duonks is probably the number one polyglot in Duolingo, he has topped almost all the English courses and has a total tally of 46343. What an impressive feat!! :O


https://www.duolingo.com/profile/FieryCat

Take a look at this user: "Olja.", she has a total of 444701 XP.


https://www.duolingo.com/profile/CarlosLM.

Olja. is a really great duolinger, but he or she doesn't enter my top 3, based on total lexeme count (from English):

  • 1: duonks with 46343
  • 2: NohTaebin with 39631
  • 3: garpike with 38232

whereas Olja. has an outstanding total of 34376.
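A ranking like this is just a sort by total lexeme count. Here is a minimal sketch using the four totals quoted above; the function name `rank_by_total` is made up for illustration, and the totals are typed in by hand rather than fetched from Duolingo.

```python
from typing import Dict, List, Optional, Tuple

# Totals quoted in this comment (lexemes from English courses).
totals: Dict[str, int] = {
    "Olja.": 34376,
    "garpike": 38232,
    "duonks": 46343,
    "NohTaebin": 39631,
}


def rank_by_total(counts: Dict[str, int], top: Optional[int] = None) -> List[Tuple[str, int]]:
    """Sort users from the highest to the lowest total, optionally keeping the top N."""
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return ranked if top is None else ranked[:top]


for position, (user, total) in enumerate(rank_by_total(totals, top=3), start=1):
    print(f"{position}: {user} with {total}")
```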

Note that my ranking is not exhaustive, it's only from users I follow.
Another observation is that after doing my vocabulary research, XP has become rather meaningless for me. As far as I know, the best duolingers have around half a million points. Those who are in the millions usually go for very few languages, and I don't know if they are truly learning them. For me it's absurd to constantly review the same 2,000-2,500 lexemes... If you want to be fluent, maybe you need 10,000+ lexemes, and a further immersion in the language that Duolingo doesn't currently provide.


https://www.duolingo.com/profile/piguy3

Those high single-language totals are generally from Immersion; I think that's usually the case for any number substantially above around 40,000.


https://www.duolingo.com/profile/CarlosLM.

Yes, you are right, but if a user has, for example, Spanish, German, French and English, all of them with +500,000XP and no exotic language, I find them boring. And if they have a difficult language, most of them haven't finished the tree!! Whereas I found that users with less than 1,000,000XP usually study their languages. I'm not saying anything about keeping the trees gold; I'm only talking about finishing trees, because I cannot know if they keep them golden, or if they complement Duolingo with other resources.


https://www.duolingo.com/profile/piguy3

How do you use the trick when you have more than one course into a target language? For instance, I have two for French but if my account is set to a third base language, I get a "0" for "fr".

EDIT: I assume you mean courses from English ;) OK


https://www.duolingo.com/profile/CarlosLM.

Aha, if you have for instance these 3 hypothetical courses "fr from en", "fr from es", "fr from it", you should change your base language to English, then to Spanish, and then to Italian, to get the respective tallies. Or easier, I think if you change manually to those courses, you should get the tallies. Play around, and you should get all your tallies from all your base languages. Good luck, and feel free to ask for help if you get into a dead-end.


https://www.duolingo.com/profile/rabbitrah

Yeeeeeees. Bro I love spreadsheets, you're an angel.


https://www.duolingo.com/profile/CarlosLM.

you are welcome


https://www.duolingo.com/profile/OliverBens7

I think it'd be useful to add the size of the Chinese course to your list as well. According to https://www.duolingo.com/vocabularies/size?user_id=24316963&language=zs it's 1739. I found this using RobinCard and frawaradR as users because they both seem to have completed the course according to these: https://www.duolingo.com/comment/25253210 https://www.reddit.com/.../thoughts_no_completing.../. I hope that helps :)


https://www.duolingo.com/profile/CarlosLM.

Thanks, I've added the Chinese count to the list.


https://www.duolingo.com/profile/WulfgarGoodread

If you have a way to check, I currently have my Golden owls in the English for Portuguese Speakers, English for Dutch Speakers, and English for Italian Speakers.

I used to have English for Spanish Speakers but they have extended that tree and I haven't got around to finishing the new lessons. I have a few trees like that; I have finished 20 different trees (some more than once) but some have been expanded and I haven't finished the new material, like Dutch and Norwegian for English Speakers. I also have golden owls for [Portuguese, Catalan and French] for Spanish Speakers and Spanish for Portuguese Speakers.


https://www.duolingo.com/profile/CarlosLM.

This is all the data I can get from your courses right now (with English as your base language). I've also summed up all your vocabularies. You're really an example to follow!! :-)


https://www.duolingo.com/profile/CarlosLM.

Mmm, I can only access one course of English of yours (provided you do NOT have English as your base language). For example, if your primary language is Portuguese, I could get only the vocabulary size of the English from Portuguese course. If your base language is Italian, the number from the English from Italian course and so on. I think I am going to look for your id, and then you yourself can look for all the numbers you want.


https://www.duolingo.com/profile/CarlosLM.

Ok, your link would be

https://www.duolingo.com/vocabularies/size?user_id=32462972

You can add &language=en to the link to change the language. But the easiest way in your case is to switch manually to each course whose current vocabulary size you want to know.


https://www.duolingo.com/profile/WulfgarGoodread

Ok, so playing around with it, here is what I get:

English for Portuguese Speakers - 1415

Spanish for Portuguese Speakers - 1571

English for Dutch Speakers - 1397

English for Italian Speakers - 1397

French for Spanish Speakers - 1848

Portuguese for Spanish Speakers - 1942

One of these days I will re-finish my English for Spanish Speakers tree. I will do several other ladders too, someday.


https://www.duolingo.com/profile/CarlosLM.

Many thanks, I've updated the vocabulary table with your info.


https://www.duolingo.com/profile/CarlosLM.

Ok, your current vocabulary size for Norwegian is 3140, and for your Italian is 1808 (and I could go on and on looking for all your courses from English). For your English I get a 0, because right now you have English as your base language!

PS Your current course when I was checking was Norwegian from English, and I had to use the parameter &language=it to know your Italian vocabulary size. :)


https://www.duolingo.com/profile/jaiirapetjan

Very interesting, Carlos! I got my user I.D.: 3332448. Can you tell me where I would rank in your Hall of Fame? I was pleased to see six of my friends there.


https://www.duolingo.com/profile/Thomas.Heiss

Hi Julia,

of course I cannot tell you how you would rank in Carlos' special top (vocabulary) list, but you can take a look at your ranking on the (S)HOF new vocabulary top user list: www.duolingo.eu/words

You were included with 10920 words in the Bronze 10k+ group.


https://www.duolingo.com/profile/CarlosLM.

I think this is good news. Now, for absolutely everything you can get in Duolingo, there's a special ranking... :-) I was especially motivated by owls and lingots, but now I would also like to increase my global vocabulary (well, actually these are only side motivations; my main one is to improve my main languages, and to explore the rest of them).


https://www.duolingo.com/profile/CarlosLM.

Sorry, but you wouldn't rank at all in my HoF, because it's made by hand using a Google Docs spreadsheet and special links, and tracking over 50 users is no longer fun for me. Right now it would require over 30,000 lexemes from the English courses alone (I can't track the lexemes from the other base languages because it would be too time-consuming).

Anyway, my ranking possibly no longer makes sense, given the more comprehensive http://www.duolingo.eu/words. So my current plan is to publish an update this summer (in June), if there isn't a special ranking for the English courses, and see if people like it ;-)


https://www.duolingo.com/profile/garpike

The HoF words ranking appears to include courses from all base languages, so probably contains a lot of repeated vocabulary, especially from English, Spanish and French-from-XYZ courses. I think your ranking only from English courses is rather more scientific.


https://www.duolingo.com/profile/CarlosLM.

I hadn't thought of that, but I think you're absolutely right. In a sense the two rankings would be complementary... For the moment I'll wait; maybe the author of the page will make separate vocabulary rankings for every major language.


https://www.duolingo.com/profile/garpike

"maybe the author of the page will make separate vocabulary rankings for every major language."

That was quick...!


https://www.duolingo.com/profile/CarlosLM.

Yay, I think every conceivable and dreamed ranking is now there!!!!! Many thanks garpike for pointing this out! ;-)


https://www.duolingo.com/profile/Thomas.Heiss

Carlos's "Current vocabulary sizes of most courses in Duolingo (Update after 8 months)" thread from one month ago: https://forum.duolingo.com/comment/27444256


https://www.duolingo.com/profile/piguy3

Turkish and Ukrainian only quite recently introduced bonus skills. This probably explains the dearth of highest results. For instance, my Ukrainian count is 1095. I have done the Holidays bonus skill but not Idioms (which has two lessons to Holidays's one)


https://www.duolingo.com/profile/CarlosLM.

This is terrific info! So if people want the complete vocabulary for Ukrainian and Turkish, they must buy all the bonus skills!! :-) That explains the tiny difference between some users and others. There are some who usually buy the bonus lessons, and others who do not.


https://www.duolingo.com/profile/piguy3

Having completed all Ukrainian bonus skills, my word count is, indeed, 1127.


https://www.duolingo.com/profile/CarlosLM.

Update3 English course vocabulary sizes. For more information about courses, see the latest post of FieryCat


https://www.duolingo.com/profile/FieryCat

You may add my data directly in your post, if you want.


https://www.duolingo.com/profile/CarlosLM.

IMO it's easy to find your post, and it's very well formatted, so being a little lazy I prefer not to touch my original post any more. Thanks again for your valuable contribution! :-)


https://www.duolingo.com/profile/MyaRexa

This is great! @CarlosLM. Thank you very much for this :)

I've been wondering for some time how many actual word roots are in the courses, as I noticed that declined forms are considered separate words. A good example is Esperanto: there were originally only 900 word roots (the modern vocabulary has been greatly expanded and has already more than doubled), but the word list gives a much higher number.

Just checked my own lexeme count for all the languages (from en, pl and jp) and was greatly surprised... a total of 11095! That's about 138 new words a day. How and when did I learn that?!

There's en-from-pl count missing on your list, so as I completed the course, here's my count of lexemes: 1397 - same as for Russian :)

PS. Do you, by any chance, know some trick for accessing the "word list" in the courses in which it's not available? I know the "switching languages" method, but maybe there's some easier way... it's a bit difficult to do that on mobile, as websites sometimes refresh themselves when I'm short on RAM. ;)


https://www.duolingo.com/profile/CarlosLM.

Thanks for the lexeme count info, I've updated the post. Regarding the "Words tab", I don't know any special trick, apart from trying to build the link by hand: www.duolingo.com/words, but if there wasn't a tab, I think I've always got the 404 error. Anyway, I'm not very interested in learning isolated words, as I'm much more interested in learning the grammatical structures by practicing them. If I forget a word I don't worry; if it's important, eventually I'll learn it. And about mobile Duolingo, sorry, I don't have a smartphone (only an old mobile without internet), so I only use the web version.


https://www.duolingo.com/profile/piguy3

English from Ukrainian also has 1397.

The to-English trees I've done all share this count it seems. I think they're the same words and in the same skills, although the sentences do seem to differ.


https://www.duolingo.com/profile/CarlosLM.

Updated Ukrainian and 2 English trees info.

Yes, I think that the English from Slavic languages trees all share the same vocabulary, which must be very similar to the rest of the English trees, except the English for Spanish speakers tree, which is clearly an outlier.


https://www.duolingo.com/profile/MyaRexa

Thank you for the reply, nonetheless :) I found an easier way to find out a user's id: search the page for "distinct_id"; there should be only one. At least that is the case for my /users/nicknamehere page.


https://www.duolingo.com/profile/CarlosLM.

For me the fastest way to get a user_id is (*):

  • Over the user's avatar image, right click, and then click on "Inspect"

  • Look for the avatar link and copy the long number, that's the user_id

Alternatively you can (*):

  • Over the user's avatar image, right click, and then click on "Open link in new tab" (1st option)

  • Go to the new tab, and again over your avatar image, right click and then click on "Open image in new tab" (1st option)

  • Go to the new tab, and copy the long number from the address bar, that's the user_id

(*) You need Chrome, and the user must have a custom avatar. Use the method that suits you best.


https://www.duolingo.com/profile/MyaRexa

Thanks for the tips! I'm mainly on mobile now, unless I post longer discussions, so I mostly operate on modifying links.

Checked: The avatar method works on mobile too :)

https://www.duolingo.com/vocabularies/size?language=all : this also works for seeing your own lexeme counts for the languages learned from your current display language, without knowing your own id.


https://www.duolingo.com/profile/CarlosLM.

Yes, the default user_id is your own user_id!! ;-)


https://www.duolingo.com/profile/piguy3

Those are cool little tricks! I've normally gone to the users page and then just done Ctrl-f for "user_id". The new tabs one does seem marginally speedier.


https://www.duolingo.com/profile/FieryCat

@CarlosLM.

"I was thinking if it's possible to get the user_name from the user_id."

Yes, it is possible. This link will always point to your account by your user_id.


https://www.duolingo.com/profile/CarlosLM.

This has been my preferred method to use in forums. ;-)) Anyway, I was thinking if it's possible to get the user_name from the user_id. Some people change their names, and you could lose track of them and get a 404 error. However I think that you can't change your user_id (unless you open a new Duolingo account)


https://www.duolingo.com/profile/CarlosLM.

Thanks a lot for your "magic", FieryCat, it works!!! Let me give you 10 lingots!! :D


https://www.duolingo.com/profile/FieryCat

You're welcome and thanks!


https://www.duolingo.com/profile/bookrabbit

What I would most like to know is the vocabulary one can obtain from doing every tree involving a language, in both directions. I have learned lots more Russian and French vocabulary, for example, by getting French from Russian up to level 20. A measure of how many lexemes this has added to the base I got from the from English courses would be very interesting.


https://www.duolingo.com/profile/piguy3

The method here is accessing something from the structure of the tree as laid out in the Incubator. I have a suspicion for myself that the lexemes here might not actually be fully unique, for instance if verb forms are introduced before the point in the tree at which all forms of e.g. present simple regular verbs are considered known based on the introduction of any single form. Maybe e.g. subjunctive also has to be entered as a separate "lexeme" so that just by learning indicative forms the system isn't automatically assuming you actually know subjunctive, too.

Otherwise stated, a lexeme in the target language is a certain kind of structure that contributors have to set up in the Incubator. The system has a way to count them, and that's what's being displayed using this method. The base language for the tree, however, is just whatever is needed to translate sentences put together with these lexemes. I don't think there would be any built-in way to measure the content of the base-language sentences.


https://www.duolingo.com/profile/CarlosLM.

I checked the accuracy of the lexeme count myself for the first words of the French tree and I was very happy with it (the first 20-30 lexemes, more or less). However, as humans are not perfect, there must be a margin of error in every lexeme count, so you could take the Duolingo lexeme counts as approximations of the real lexeme count (on top of that, your current tree may not be the largest version available on Duolingo, as there are many A/B tests).

Anyway I don't mind an error of +-50 lexemes, as my first intent was to have an idea of how large are the different trees in Duolingo. Plus IMO no matter how well you learn the respective vocabularies, it's clear that there will be always thousands of remaining native words to learn if you want to achieve a high level in a language in the real world.


https://www.duolingo.com/profile/CarlosLM.

From the interface I only obtain lexeme counts. If I could obtain the actual lists of lexemes of every tree, it would be easy for me to merge lists, eliminate duplicated lexemes and find out how many new lexemes you get with a reversed tree, or another combination of trees. Maybe this kind of info is only accessible by the staff. Incubators must have access to the lexeme list of their own tree, but I highly doubt they can have direct access to other trees, for security reasons.
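Just to illustrate the merge I have in mind, here is a minimal Python sketch. The two lexeme lists are made up for the example (the real lists are not available from the interface), and the function name `new_lexemes` is only my own label:

```python
def new_lexemes(base, extra):
    """Return the lexemes in `extra` that are not already in `base`."""
    return set(extra) - set(base)

# Hypothetical, hand-written lists standing in for real course data.
french_from_english = {"chat", "chien", "manger", "boire"}
french_from_russian = {"chat", "manger", "lire", "parler"}

# Merging the lists as sets removes the duplicated lexemes automatically.
combined = french_from_english | french_from_russian

# Lexemes the second tree adds on top of the first one.
added = new_lexemes(french_from_english, french_from_russian)

print(len(combined), sorted(added))
```

With real lists, `len(added)` would be the number of new lexemes gained from the second tree.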

So I'm sorry I can't help you with your very interesting question. Maybe FieryCat could help you, as he has advanced programming skills...


https://www.duolingo.com/profile/MyaRexa

I just noticed a curious thing:

(https://i.imgur.com/NvdurqD.jpg)

Look under "known_lexemes": each appears to have its own id assigned. You'd have to use the text from users who actually completed the tree, and it shows only for the language currently in use (checked by switching between Esperanto and Japanese, because I completed those two), but maybe it would be possible to compare those ids and eliminate duplicates to get a more accurate lexeme count? The number of ids is the same as the number of listed lexemes taught.

Also, they may have the same ids across different courses teaching the same language, which would make it possible to eliminate duplicates if someone learns, for example, English from multiple languages.

Edit: checked Ja->En and Pl->En for the "colors" vocab, and they seem to have the same ids in different courses. I don't know if this will be of any use to you, but I'll just leave it posted here ;)


https://www.duolingo.com/profile/piguy3

It looks like the id's are shared across courses teaching the same language at least in certain instances. There are also some of the words taught themselves (Ctrl-f for "words" with the quotation marks), but it looks like only ten per skill no matter how many lexemes are taught.


https://www.duolingo.com/profile/CarlosLM.

I don't remember what the problem was, but I wasn't able to figure out the total lexeme count from the known lexeme ids. Maybe it was simply too much work for me. You might keep investigating this if you feel like it.

Anyway, this year my intention is only to work on my "Hall of Fame based on total vocabularies", as it is relatively easy for me to get the data, but I would probably raise the bar to 20,000 or 25,000 lexemes of total vocabulary for courses based on English.


https://www.duolingo.com/profile/piguy3

"The ids' amount is the same as the number of listed lexemes taught."

On a skill-by-skill basis this is true. However, it may bear noting that the total lexeme count for a course is distinct from the sum of the "num_lexemes" figures. For example, in English from Dutch the course total is 1397, but the sum of the skill lexeme counts is 1952, which is even a couple hundred more than the total listed by the Words list for the course at the moment.


https://www.duolingo.com/profile/MyaRexa

I meant only the section count and the number of "xyz" ids listed in it. Thanks for pointing that out, though! If I have time, I'll take a closer look at this. It's probable that some lexemes appear in multiple skill sections. I was thinking of something along the lines of copying the ids, running a duplicate test on them, removing the duplicates and checking how the result compares to the course total.
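Roughly, the duplicate test could look like the Python sketch below. The field names "known_lexemes" and "num_lexemes" are the ones visible on the page, but the list-of-skills layout and the ids themselves are assumptions made up for the example, not the real format:

```python
def unique_lexeme_ids(skills):
    """Collect every id from every skill, keeping each id only once."""
    ids = set()
    for skill in skills:
        ids.update(skill["known_lexemes"])
    return ids

# Hypothetical data: two skills that share one id ("c3").
skills = [
    {"name": "Numbers", "num_lexemes": 3, "known_lexemes": ["a1", "b2", "c3"]},
    {"name": "Determiners", "num_lexemes": 2, "known_lexemes": ["c3", "d4"]},
]

summed = sum(skill["num_lexemes"] for skill in skills)  # per-skill counts added up
unique = len(unique_lexeme_ids(skills))                 # after removing duplicates

print(summed, unique)
```

If the deduplicated count for a real course came out close to the course total, that would suggest the gap is caused by lexemes repeated across skills; if not, the ids are probably counting something else.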


https://www.duolingo.com/profile/CarlosLM.

Yes, there must be a lot of duplicates. However I think that the real problem is that the ids are from distinct words, not from lexemes. I vaguely remember something like that, from the work I did here 4 months ago.


https://www.duolingo.com/profile/piguy3

It looks like there are even separate ids for homonyms, at least. For instance, the word "one" appears in both the Numbers skill and the Determiners skill. However, none of the id numbers from the Determiners skill occurs more than once on the page.
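For anyone who wants to repeat this check without Ctrl-f, a small Python sketch follows. The skill layout and the ids are invented for illustration; only the "known_lexemes" name comes from the page:

```python
from collections import Counter

def repeated_ids(skills):
    """Return the ids that occur more than once across all skills."""
    counts = Counter(i for skill in skills for i in skill["known_lexemes"])
    return {i for i, n in counts.items() if n > 1}

# Hypothetical data mirroring the "one" case: the same word in two skills,
# but under two different ids, so nothing is reported as repeated.
skills = [
    {"name": "Numbers", "known_lexemes": ["id_one_number", "id_two"]},
    {"name": "Determiners", "known_lexemes": ["id_one_determiner", "id_this"]},
]

print(repeated_ids(skills))
```

An empty result on real data would match what I saw by hand: no id is shared between skills, even for identical-looking words.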


https://www.duolingo.com/profile/Elizabeth528513

So, I think using the string above I get 2962 for German from English?

https://www.duolingo.com/vocabularies/size?user_id=107405875

And yet you are saying the total words for German are only 2137? Duo Words page gives me 3350, which is all word forms, I get that. But what is the 2962? Then Duome has lexemes 2864 and words 2962.

How are these all for the same person? So confused!!!


https://www.duolingo.com/profile/ofou_

What about an update?
