https://www.duolingo.com/CarlosLM.

Current vocabulary sizes of most courses in Duolingo

I was curious to know how many words each tree has, as a way to get an idea of how long a course is. So I wrote this post:

https://www.duolingo.com/comment/24435667

There I learnt that you should make a distinction between unique words (what you get from the Words tab) and base words or lexemes, which are the different entries in a dictionary. This second count is the most important, because it is independent of the different declensions or conjugations a language might have. Fortunately, Duolingo has this number available for you, and you can get it with this trick I got thanks to FieryCat:

https://www.duolingo.com/vocabularies/size?user_id=24953571 (using my current id in this example)

However, this trick had its limits. First of all, you needed a user_id (you can get one using: www.duolingo.com/users/"your Duolingo user name" and looking for the last "id" or "avatar" words), and you only got a partial count if your tree was not complete. You could change your language to get the other counts, but you couldn't change the base language of another user (that's good, because otherwise it would be hacking).

So I worked around this problem by trying different parameters, and I got very lucky to find out that by adding &language=language_code you could access all the different accessible counts without changing anything else. Yoohoo!!! :D

So now, if I am studying French from Spanish, you can get my Portuguese from Spanish count this way:

https://www.duolingo.com/vocabularies/size?user_id=24953571&language=pt (please, copy and paste the whole link, this is the closest working link I can get with an ampersand :-/)
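If typing the ampersand by hand is a nuisance, the link can also be assembled in a few lines of Python. This is only a sketch: the function name `vocabulary_size_url` is made up for illustration, it only builds the string (no request is sent), and it assumes the `user_id` and `language` parameters keep working as described in this post.

```python
from urllib.parse import urlencode

# Endpoint taken from the links in this post.
BASE_URL = "https://www.duolingo.com/vocabularies/size"


def vocabulary_size_url(user_id, language=None):
    """Build the vocabulary-size link for a user (hypothetical helper).

    `language` is an optional Duolingo language code such as "pt".
    """
    params = {"user_id": user_id}
    if language is not None:
        params["language"] = language
    # urlencode joins the parameters with "&" so nothing has to be escaped by hand
    return BASE_URL + "?" + urlencode(params)


print(vocabulary_size_url(24953571))
print(vocabulary_size_url(24953571, "pt"))
```

The second call prints the Portuguese link for the user_id used in this example; swap in your own id and language code.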

However, to get my English course counts I think I would have to change my base language to English. The same goes for any other user_id, so you can tell more or less whether a user with millions of XP is a genuine language lover or simply a gamer (Tip: look for difficult languages; almost every heavy user completes the Spanish or French trees, but difficult or obscure languages like Hungarian, Hebrew, Korean, Swahili, Vietnamese or Guarani are another story).

Without further ado, these are the top vocabulary sizes I have found:

  • Norwegian(nb): 3265
  • Hebrew(he): 2676
  • Dutch(dn): 2630
  • Romanian(ro): 2342
  • Swedish(sv): 2228
  • English(en): 1388 (ja), 1397(dn, ru, uk, pl, tr, cs, ro), 1422(fr, de, it), 1440(pt), 2180(es)
  • German(de): 2137
  • Danish(da): 2120
  • Russian(ru): 2109
  • Hungarian(hu): 2060
  • Welsh(cy): 2051
  • Portuguese(pt): 1950
  • Greek(el): 1949
  • French(fr): 1935
  • Czech(cs): 1911
  • Esperanto(eo): 1878
  • Italian(it): 1808
  • Catalan(ca): 1795(es)
  • Polish(pl): 1769
  • Chinese(zs): 1739
  • Korean(ko): 1708
  • Irish(ga): 1644
  • Vietnamese(vi): 1643
  • Spanish(es): 1603
  • Guarani(gn): 1579(es)
  • Turkish(tr): 1393, 1423*
  • Swahili(sw): 1205
  • Ukrainian(uk): 1127
  • Japanese(ja): 1090
  • High Valyrian(hv): 589

(*) I've only found 1 user with such a high vocabulary size.

Please take into account that some trees have A/B tests, so you may not be able to get the full count at the moment. OTOH, you may have to buy the bonus skills to get the full vocabulary of the course. If somebody knows, please tell.

If I have made a mistake or typo, or a language or course is missing, please tell me so I can add or correct the info.

Cheers and happy learning!!


P.S. Update: Great news, now you can get all the languages you are doing from your current base language, with the corresponding vocabulary sizes and even a total tally, using this link (copy and paste the whole link, and then replace my user_id with yours):

https://www.duolingo.com/vocabularies/size?user_id=24953571&language=all

Thanks again to FieryCat for this new trick.

Update2: This is a mini "Hall of Fame" based on total vocabularies of people I follow (+10,000 lexemes):

Hall of Fame of Duolingo

Update3 English course vocabulary sizes. For more information about courses, see the latest post of FieryCat

September 20, 2017

85 Comments


https://www.duolingo.com/FrankKool

Thanks for sharing.

Also, isn't it odd that the most widely used course (Spanish, now counting over 100 million registered users) has such a small vocabulary?

September 21, 2017

https://www.duolingo.com/CarlosLM.

Yes, the Spanish vocabulary was set in stone a long time ago, and I don't see any interest from Duolingo in improving the course. The only difference between users is that some have done bonus lessons and some have not, and this only accounts for about 15 lexemes.

September 21, 2017

https://www.duolingo.com/Ontalor

There have actually been two dramatic updates to the course since Spanish was originally released years ago. The updates come every so often.

September 22, 2017

https://www.duolingo.com/CarlosLM.

Thanks for your interesting info! Keep in mind that I cannot know about very old vocabulary changes, because I've only tracked very active users (plus a few users that have finished a tree recently) and they tend to keep the popular languages updated.

September 22, 2017

https://www.duolingo.com/FieryCat

I wrote and ran a small script to analyze the maximum vocabulary size (base language English) for the users from this list (994 users): Streak Hall of Fame Sign Ups. As soon as the data is available, I will publish it so we can compare.

September 20, 2017

https://www.duolingo.com/CarlosLM.

Wow, this is amazing!!! Could you select users with other base languages? I'm especially interested in getting the vocabulary sizes of the different courses on Duolingo to teach English, but it's sooo hard to get this data!!

September 20, 2017

https://www.duolingo.com/FieryCat

I improved my script a bit and got this:

From English (en)

  • Norwegian (Bokmål) (no-BO): 3265, finished: 100.0%
  • Hebrew (he): 2676, finished: 100.0%
  • Dutch (nl-NL): 2630, finished: 100.0%
  • Romanian (ro): 2342, finished: 100.0%
  • Swedish (sv): 2228, finished: 100.0%
  • German (de): 2137, finished: 100.0%
  • Danish (da): 2120, finished: 100.0%
  • Russian (ru): 2109, finished: 100.0%
  • Hungarian (hu): 2060, finished: 100.0%
  • Welsh (cy): 2051, finished: 100.0%
  • Portuguese (pt): 1950, finished: 100.0%
  • Greek (el): 1949, finished: 100.0%
  • French (fr): 1935, finished: 100.0%
  • Czech (cs): 1911, finished: 100.0%
  • Esperanto (eo): 1878, finished: 100.0%
  • Italian (it): 1808, finished: 100.0%
  • Polish (pl): 1769, finished: 100.0%
  • Irish (ga): 1644, finished: 100.0%
  • Vietnamese (vi): 1643, finished: 100.0%
  • Spanish (es): 1603, finished: 100.0%
  • Korean (ko): 1532, finished: unknown
  • Turkish (tr): 1423, finished: unknown
  • Swahili (sw): 1205, finished: unknown
  • Ukrainian (uk): 1127, finished: unknown
  • Japanese (ja): 1090, finished: 100.0%
  • High Valyrian (hv): 589, finished: 100.0%

From Spanish (es)

  • English (en): 2180, finished: 100.0%
  • Portuguese (pt): 1942, finished: 100.0%
  • Esperanto (eo): 1859, finished: unknown
  • French (fr): 1848, finished: 100.0%
  • Catalan (ca): 1795, finished: 100.0%
  • Italian (it): 1791, finished: 100.0%
  • German (de): 1710, finished: 100.0%
  • Guarani (Jopará) (gn): 526, finished: unknown

From French (fr)

  • German (de): 2138, finished: 100.0%
  • Portuguese (pt): 1957, finished: 100.0%
  • Italian (it): 1791, finished: 100.0%
  • Spanish (es): 1571, finished: 100.0%
  • English (en): 1422, finished: 100.0%

From Portuguese (pt)

  • French (fr): 1851, finished: 100.0%
  • Italian (it): 1800, finished: 100.0%
  • German (de): 1711, finished: 100.0%
  • Spanish (es): 1571, finished: 100.0%
  • English (en): 1440, finished: 100.0%

From Russian (ru)

  • French (fr): 1913, finished: 100.0%
  • German (de): 1710, finished: 100.0%
  • Spanish (es): 1571, finished: 100.0%
  • English (en): 1397, finished: 100.0%

From German (de)

  • French (fr): 1848, finished: 100.0%
  • Spanish (es): 1574, finished: 100.0%
  • English (en): 1422, finished: 100.0%

From Italian (it)

  • French (fr): 1913, finished: 100.0%
  • German (de): 1841, finished: 100.0%
  • English (en): 1422, finished: 100.0%

From Turkish (tr)

  • German (de): 1570, finished: 100.0%
  • English (en): 1397, finished: 100.0%

From Indonesian (id)

  • English (en): 1242, finished: 83.6%

From Greek (el)

  • English (en): 684, finished: 45.5%

From Japanese (ja)

  • English (en): 1388, finished: 100.0%

From Czech (cs)

  • English (en): 1397, finished: 100.0%

From Romanian (ro)

  • English (en): 1397, finished: 100.0%

From Polish (pl)

  • English (en): 986, finished: 65.5%

From Dutch (nl-NL)

  • English (en): 1397, finished: 100.0%

I hope this information will be useful to you.

September 28, 2017

https://www.duolingo.com/Olja.

Incredible. Thank you for the information.

September 28, 2017

https://www.duolingo.com/CarlosLM.

Wow, astonishing data!! Thanks a lot! :D :D However, you added another mysterious number, the finished part. How do you know this number??

September 28, 2017

https://www.duolingo.com/FieryCat

There is another trick, which lets you get a lot of useful data about a user: https://www.duolingo.com/users/user_name_here. This data contains information about the course that the user is learning now.

September 28, 2017

https://www.duolingo.com/CarlosLM.

Aha, I knew that, that link has very cool info. One piece of hidden information I very much like to see is the individual streaks; you have one for every language... That way you can see which languages a user is currently working on, or you can find out how many streak days you have with a certain tree or trees.

Another trick maybe you didn't know. Look at the link of the image of your avatar:

https://duolingo-images.s3.amazonaws.com/avatars/70993956/u-YDValmNM/xlarge

The long number is your user_id, and by hand it's a lot faster to get it from there using the right mouse button (over the avatar, in Chrome click "Open image in new tab") than from the usual link https://www.duolingo.com/users/user_name_here.
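If you have several avatar links to go through, pulling the number out can be automated. A minimal Python sketch follows, assuming avatar links always follow the `/avatars/<number>/` pattern seen in the example above; the function name `user_id_from_avatar_url` is just an illustrative choice.

```python
import re

# Assumption: the user_id is the run of digits right after "/avatars/".
AVATAR_PATTERN = re.compile(r"/avatars/(\d+)/")


def user_id_from_avatar_url(url):
    """Return the numeric user_id embedded in an avatar link, or None if absent."""
    match = AVATAR_PATTERN.search(url)
    return int(match.group(1)) if match else None


url = "https://duolingo-images.s3.amazonaws.com/avatars/70993956/u-YDValmNM/xlarge"
print(user_id_from_avatar_url(url))  # 70993956
```

Links that do not match the pattern (for example a default avatar) simply give `None`.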

September 28, 2017

https://www.duolingo.com/CarlosLM.

OTOH I think you should publish your data in a post yourself (if you haven't done so yet), as they are much more complete than mine. With my manual method I can get my info quite easily from my best users, but only for the English courses. Again, thank you very much for your info. Now maybe I will choose one course over another based on tree size, and while I'm advancing in my current trees, I'm looking forward to confirming the full vocabulary sizes.

September 28, 2017

https://www.duolingo.com/FieryCat
  • Norwegian Bokmål(no-BO): 3265
  • Hebrew(he): 2676
  • Dutch(nl-NL): 2630
  • Romanian(ro): 2342
  • Swedish(sv): 2228
  • German(de): 2137
  • Danish(da): 2120
  • Russian(ru): 2109
  • Hungarian(hu): 2060
  • Welsh(cy): 2051
  • Portuguese(pt): 1950
  • Greek(el): 1949
  • French(fr): 1935
  • Esperanto(eo): 1878
  • Italian(it): 1808
  • Polish(pl): 1769
  • Irish(ga): 1644
  • Vietnamese(vi): 1643
  • Spanish(es): 1603
  • Czech(cs): 1540
  • Turkish(tr): 1393
  • Ukrainian(uk): 1107
  • Japanese(ja): 1090
  • Swahili(sw): 590
  • Korean(ko): 582
  • High Valyrian(hv): 286

The list size: 26
Number of users: 793

The script source code: https://pastebin.com/1RD5NN3g

September 20, 2017

https://www.duolingo.com/CarlosLM.

Great work FieryCat, your script is confirming my data that I got by hand. ;-)

For Korean I had 4 users confirming the high result Korean(ko): 1708****
for Swahili I had 6 users Swahili(sw): 1205******
and for High Valyrian only 2 users: 589**

For these difficult languages, sometimes what I did was to look for messages in the forum of the type: "XXX tree finished". And the rarest language of all Duolingo for me was Guarani, because I could only find one user with a high result: piguy3 with his vocabulary size of 1579:

https://www.duolingo.com/vocabularies/size?user_id=183767404&language=gn

September 20, 2017

https://www.duolingo.com/FieryCat

Korean(ko): 582

Most likely, "my" user has not finished his tree. I did not check that in my script. It is necessary to add such a check in the script or display the percentage of completion of the tree.
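To illustrate the kind of check I mean, here is a rough Python sketch. It is not the script from the pastebin link; the record layout (language code, lexeme count, finished flag) and the function name are assumptions made for the example, and the finished flags in the sample are illustrative only.

```python
def max_finished_counts(records):
    """Split maximum lexeme counts into finished and partial results.

    `records` is an iterable of (language, lexeme_count, finished) tuples.
    Returns two dicts: the maximum per language among finished trees, and
    the maximum for languages that nobody in the sample has finished.
    """
    finished, unfinished = {}, {}
    for language, count, is_finished in records:
        target = finished if is_finished else unfinished
        target[language] = max(count, target.get(language, 0))
    # Keep partial maxima only where no finished tree exists, so a partial
    # count is never mistaken for the full course size.
    partial = {lang: n for lang, n in unfinished.items() if lang not in finished}
    return finished, partial


# Illustrative sample; the finished flags are made up for the example.
sample = [
    ("ko", 582, False),
    ("ko", 1708, True),
    ("ja", 1090, True),
    ("sw", 590, False),
]
full, partial = max_finished_counts(sample)
print(full)     # {'ko': 1708, 'ja': 1090}
print(partial)  # {'sw': 590}
```

With a check like this, a low number such as 582 would be reported as partial unless some user with a finished tree confirms it.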

In fact, this information should be collected in all forums and not only in one thread.

September 20, 2017

https://www.duolingo.com/CarlosLM.

Yeah, this is a very difficult course to have finished because it was released very recently. The kind of users you checked are juggling 20+ trees, so unless they have previous knowledge of Korean, it's really difficult to complete that tree in no time.

September 20, 2017

https://www.duolingo.com/piguy3

I found another user with a 1579 count for Guaraní. For some reason the method didn't work for a few other people whom I know to have finished the tree, however (ah, b/c I assume their accounts weren't set to Spanish base language at the relevant moment; mine wasn't either just now, but I guess it can be easier to check oneself).

September 22, 2017

https://www.duolingo.com/CarlosLM.

This is strange, I have gotten Catalan results from users that have their accounts set to English, but maybe for Guarani it is different, and if the account is not set to Spanish, you cannot get the data. Anyway, thanks for your commitment to the Guarani language, I got the coveted Guarani vocabulary size! :D

September 22, 2017

https://www.duolingo.com/CarlosLM.

I have just given you 20 lingots for your hard work and much appreciated help, and because I don't like your downvotes at all. :-/

September 21, 2017

https://www.duolingo.com/FieryCat

Thanks. Could you tell me what kind of data you are interested in? In a few days, if I have some spare time, I will update my script and share the data that you need.

September 21, 2017

https://www.duolingo.com/CarlosLM.

The vocabulary sizes of the most important English courses (e.g. en<-de, en<-ru, en<-fr, en<-pt, en<-Arabic, en<-Chinese, en<-Japanese, en<-Indonesian...) as my data is really scarce. If this is too much trouble, don't worry, maybe you could run your current script again 3 months later to see the vocabulary changes!! ;-)

September 21, 2017

https://www.duolingo.com/FieryCat

The script is already working, but I can edit it later. The script will tell me how many users from that list have English as their base language. I think they will be the majority.

September 20, 2017

https://www.duolingo.com/CarlosLM.

I don't understand why 2 people have downvoted you! Maybe it's the word "script" that has triggered an irrational response IMO. If we didn't have scripts, programs, or computer science in general, we wouldn't have Duolingo in the first place, this wonderful tool to help you learn foreign languages.

September 21, 2017

https://www.duolingo.com/FieryCat

No, it was my personal "fan". It seems he has finally discovered for himself the English branch of the forum :D Never mind.

September 21, 2017

https://www.duolingo.com/CarlosLM.

Cool! Looking forward to your results... ;)

September 20, 2017

https://www.duolingo.com/CarlosLM.

Update2: This is a mini Hall of Fame based on total vocabularies of people I follow (+10,000 lexemes):

Hall of Fame of Duolingo

September 22, 2017

https://www.duolingo.com/FieryCat

Do you know that the parameter "language" can have the value "all"?

September 22, 2017

https://www.duolingo.com/CarlosLM.

Wow, I had no idea!! Now you can do your personal total tally way easier than before!

September 22, 2017

https://www.duolingo.com/CarlosLM.

I appended an update to the post, including your info.

Btw, duonks is probably the number one polyglot in Duolingo, he has topped almost all the English courses and has a total tally of 46343. What an impressive feat!! :O

September 22, 2017

https://www.duolingo.com/FieryCat

Take a look at this user: "Olja.", she has a total of 444701 XP.

September 22, 2017

https://www.duolingo.com/CarlosLM.

Olja. is a really great duolinger, but he or she doesn't make my top 3, based on total lexeme count (from English):

  • 1: duonks with 46343
  • 2: NohTaebin with 39631
  • 3: garpike with 38232

whereas Olja. has an outstanding total of 34376.

Note that my ranking is not exhaustive, it's only from users I follow.
Another observation is that after doing my vocabulary research, XP has become rather meaningless for me. As far as I know, the best duolingers have around half a million points. Those who are in the millions usually go for very few languages, and I don't know if they are truly learning them. For me it's absurd to constantly review the same 2,000-2,500 lexemes... If you want to be fluent, maybe you need 10,000+ lexemes, and a deeper immersion in the language that Duolingo doesn't currently provide.

September 22, 2017

https://www.duolingo.com/piguy3

Those high single-language totals are generally from Immersion; I think that's usually the case for any number substantially above around 40,000.

September 22, 2017

https://www.duolingo.com/CarlosLM.

Yes, you are right, but if a user has, for example, Spanish, German, French and English, all of them with 500,000+ XP and no exotic language, I find them boring. And if they have a difficult language, most of them haven't finished the tree!! Whereas I found that users with less than 1,000,000 XP usually study their languages. I'm not saying anything about keeping the trees gold, I'm only talking about finishing trees, because I cannot know if they keep them golden, or if they complement Duolingo with other resources.

September 22, 2017

https://www.duolingo.com/piguy3

How do you use the trick when you have more than one course into a target language? For instance, I have two for French but if my account is set to a third base language, I get a "0" for "fr".

EDIT: I assume you mean courses from English ;) OK

September 22, 2017

https://www.duolingo.com/CarlosLM.

Aha, if you have for instance these 3 hypothetical courses "fr from en", "fr from es", "fr from it", you should change your base language to English, then to Spanish, and then to Italian, to get the respective tallies. Or easier, I think if you change manually to those courses, you should get the tallies. Play around, and you should get all your tallies from all your base languages. Good luck, and feel free to ask for help if you get into a dead-end.

September 22, 2017

https://www.duolingo.com/rabbitrah

Yeeeeeees. Bro I love spreadsheets, you're an angel.

December 1, 2017

https://www.duolingo.com/CarlosLM.

you are welcome

December 2, 2017

https://www.duolingo.com/OliverBens7

I think it'd be useful to add the size of the Chinese course to your list as well. According to https://www.duolingo.com/vocabularies/size?user_id=24316963&language=zs it's 1739. I found this using RobinCard and frawaradR as users because they both seem to have completed the course according to these: https://www.duolingo.com/comment/25253210 https://www.reddit.com/.../thoughts_no_completing.../. I hope that helps :)

February 21, 2018

https://www.duolingo.com/CarlosLM.

Thanks, I've added the Chinese count to the list.

February 21, 2018

https://www.duolingo.com/AmareloTiago

If you have a way to check, I currently have my Golden owls in the English for Portuguese Speakers, English for Dutch Speakers, and English for Italian Speakers.

I used to have English for Spanish Speakers but they have extended that tree and I haven't got around to finishing the new lessons. I have a few trees like that; I have finished 20 different trees (some more than once) but some have been expanded and I haven't finished the new material, like Dutch and Norwegian for English Speakers. I also have golden owls for [Portuguese, Catalan and French] for Spanish Speakers and Spanish for Portuguese Speakers.

September 20, 2017

https://www.duolingo.com/CarlosLM.

This is all the data I can get from your courses right now (with English as your base language). I've also added up all your vocabularies. You're really an example to follow!! :-)

September 20, 2017

https://www.duolingo.com/CarlosLM.

Mmm, I can only access one course of English of yours (provided you do NOT have English as your base language). For example, if your primary language is Portuguese, I could get only the vocabulary size of the English from Portuguese course. If your base language is Italian, the number from the English from Italian course and so on. I think I am going to look for your id, and then you yourself can look for all the numbers you want.

September 20, 2017

https://www.duolingo.com/CarlosLM.

Ok, your link would be

https://www.duolingo.com/vocabularies/size?user_id=32462972

You can add &language=en to the link to change the language. But the easiest way in your case is to switch manually to each course whose current vocabulary size you want to know.

September 20, 2017

https://www.duolingo.com/AmareloTiago

Ok, so playing around with it, here is what I get:

English for Portuguese Speakers - 1415

Spanish for Portuguese Speakers - 1571

English for Dutch Speakers - 1397

English for Italian Speakers - 1397

French for Spanish Speakers - 1848

Portuguese for Spanish Speakers - 1942

One of these days I will re-finish my English for Spanish Speakers tree. I will do several other ladders too, someday.

September 21, 2017

https://www.duolingo.com/CarlosLM.

Many thanks, I've updated the vocabulary table with your info.

September 21, 2017

https://www.duolingo.com/CarlosLM.

Ok, your current vocabulary size for Norwegian is 3140, and for your Italian is 1808 (and I could go on and on looking for all your courses from English). For your English I get a 0, because right now you have English as your base language!

PS Your current course when I was checking was Norwegian from English, and I had to use the parameter &language=it to get your Italian vocabulary size. :)

September 20, 2017

https://www.duolingo.com/jairapetyan

Very interesting, Carlos! I got my user ID: 3332448. Can you tell me where I would rank in your Hall of Fame? I was pleased to see six of my friends there.

March 26, 2018

https://www.duolingo.com/Thomas.Heiss

Hi Julia,

of course I cannot tell you how you would rank in Carlos' special top (vocabulary) list, but you can take a look at your ranking on the (S)HOF new vocabulary top user list: www.duolingo.eu/words

You were included with 10920 words in the Bronze 10k+ group.

March 26, 2018

https://www.duolingo.com/CarlosLM.

I think this is good news. Now, for absolutely everything you can get in Duolingo, there's a special ranking... :-) I was especially motivated by owls and lingots, but now I would also like to increase my global vocabulary (well, actually these are only side motivations, my main one is to improve my main languages, and to explore the rest of them).

March 26, 2018

https://www.duolingo.com/CarlosLM.

Sorry, but you wouldn't rank at all in my HoF, because it's made by hand using a Google Docs spreadsheet and special links, and tracking over 50 users is no longer fun for me. Right now you would need over 30,000 lexemes from the English courses alone (I can't track the lexemes obtained from the other base languages because it would be too time-consuming).

Anyway, my ranking possibly no longer makes sense, given the more comprehensive http://www.duolingo.eu/words. So my current plan is to publish an update this summer (in June), if there isn't a special ranking for the English courses, and see if people like it ;-)

March 26, 2018

https://www.duolingo.com/garpike

The HoF words ranking appears to include courses from all base languages, so probably contains a lot of repeated vocabulary, especially from English, Spanish and French-from-XYZ courses. I think your ranking only from English courses is rather more scientific.

March 26, 2018

https://www.duolingo.com/CarlosLM.

I hadn't thought of that, but I think you're absolutely right. In a sense the two rankings would be complementary... At the moment I'll wait, maybe the author of the page will make separate vocabulary rankings for every major language.

March 26, 2018

https://www.duolingo.com/garpike

maybe the author of the page will make separate vocabulary rankings for every major language.

That was quick...!

March 27, 2018

https://www.duolingo.com/CarlosLM.

Yay, I think every conceivable and dreamed ranking is now there!!!!! Many thanks garpike for pointing this out! ;-)

March 27, 2018

https://www.duolingo.com/Thomas.Heiss

Carlos' "Current vocabulary sizes of most courses in Duolingo (Update after 8 months)" thread from one month ago: https://forum.duolingo.com/comment/27444256

July 22, 2018

https://www.duolingo.com/piguy3

Turkish and Ukrainian only quite recently introduced bonus skills. This probably explains the dearth of highest results. For instance, my Ukrainian count is 1095. I have done the Holidays bonus skill but not Idioms (which has two lessons to Holidays's one)

September 22, 2017

https://www.duolingo.com/CarlosLM.

This is terrific info! So if people want the complete vocabulary for Ukrainian and Turkish, they must buy all the bonus skills!! :-) That explains the tiny difference between some users and others. Some usually buy the bonus lessons, and others do not.

September 22, 2017

https://www.duolingo.com/piguy3

Having completed all Ukrainian bonus skills, my word count is, indeed, 1127.

January 27, 2018

https://www.duolingo.com/CarlosLM.

Update3 English course vocabulary sizes. For more information about courses, see the latest post of FieryCat

September 28, 2017

https://www.duolingo.com/FieryCat

You may add my data directly in your post, if you want.

September 28, 2017

https://www.duolingo.com/CarlosLM.

IMO it's easy to find your post, and it's very well formatted, so being a little lazy I prefer not to touch my original post any more. Thanks again for your valuable contribution! :-)

September 28, 2017

https://www.duolingo.com/MyaRexa

This is great! @CarlosLM. Thank you very much for this :)

I've been wondering for some time how many actual word roots are in the courses, as I noticed that declined forms are counted as separate words. A good example is Esperanto: there were originally only 900 word roots (the modern vocabulary has been greatly expanded, it has already more than doubled), but the word list gives a much higher number.

Just checked my own lexeme count for all the languages (from en, pl and jp) and was greatly surprised... a total of 11095! That's about 138 new words a day. How and when did I learn that?!

There's en-from-pl count missing on your list, so as I completed the course, here's my count of lexemes: 1397 - same as for Russian :)

PS. Do you, by any chance, know some trick for accessing the "word list" in the courses in which it's not available? I know the "switching languages" method, but maybe there's some easier way... it's a bit difficult to do that on mobile, as websites sometimes refresh themselves when I'm short on RAM. ;)

January 27, 2018

https://www.duolingo.com/CarlosLM.

Thanks for the lexeme count info, I've updated the post. Regarding the "Words tab", I don't know any special trick, apart from trying to make the link by hand: www.duolingo.com/words, but when there wasn't a tab, I think I've always got the 404 error. Anyway, I'm not much interested in learning isolated words, as I'm much more interested in learning the grammatical structures by practicing them. If I forget a word I don't worry; if it's important, I'll eventually learn it. And about mobile Duolingo, sorry, I don't have a smartphone (only an old mobile without internet), so I only use the web version.

January 27, 2018

https://www.duolingo.com/piguy3

English from Ukrainian also has 1397.

The to-English trees I've done all share this count it seems. I think they're the same words and in the same skills, although the sentences do seem to differ.

January 27, 2018

https://www.duolingo.com/CarlosLM.

Updated Ukrainian and 2 English trees info.

Yes, I think that the English from Slavic languages trees all share the same vocabulary, which must be very similar to the rest of the English trees, except the English for Spanish speakers tree, which is clearly an outlier.

January 27, 2018

https://www.duolingo.com/MyaRexa

Thank you for the reply, nonetheless :) I found an easier way to find out a user's id: search the page for "distinct_id", there should be only one. At least that is the case on my /users/nicknamehere page.

January 29, 2018

https://www.duolingo.com/CarlosLM.

For me the fastest way to get a user_id is (*):

  • Over the user's avatar image, right click, and then click on "Inspect"

  • Look for the avatar link and copy the long number, that's the user_id

Alternatively you can (*):

  • Over the user's avatar image, right click, and then click on "Open link in new tab" (1st option)

  • Go to the new tab, and again over your avatar image, right click and then click on "Open image in new tab" (1st option)

  • Go to the new tab, and copy the long number from the address bar, that's the user_id

(*) You need Chrome, and the user must have a custom avatar. Use the method that suits you best.

January 29, 2018

https://www.duolingo.com/MyaRexa

Thanks for the tips! I'm mainly on mobile now, unless I post longer discussions, so I mostly operate on modifying links.

Checked: The avatar method works on mobile too :)

https://www.duolingo.com/vocabularies/size?language=all : this also works for seeing your own lexeme counts for the languages learned from your current display language, without knowing your own id.

January 29, 2018

https://www.duolingo.com/CarlosLM.

Yes, the default user_id is your own user_id!! ;-)

January 29, 2018

https://www.duolingo.com/piguy3

Those are cool little tricks! I've normally gone to the users page and then just done Ctrl-f for "user_id". The new tabs one does seem marginally speedier.

January 29, 2018

https://www.duolingo.com/FieryCat

@CarlosLM.

I was thinking if it's possible to get the user_name from the user_id.

Yes, it is possible. This link will always point to your account by your user_id.

January 29, 2018

https://www.duolingo.com/CarlosLM.

This has been my preferred method to use in forums. ;-)) Anyway, I was thinking if it's possible to get the user_name from the user_id. Some people change their names, and you could lose track of them and get a 404 error. However I think that you can't change your user_id (unless you open a new Duolingo account)

January 29, 2018

https://www.duolingo.com/CarlosLM.

Thanks a lot for your "magic", FieryCat, it works!!! Let me give you 10 lingots!! :D

January 29, 2018

https://www.duolingo.com/FieryCat

You're welcome and thanks!

January 29, 2018

https://www.duolingo.com/bookrabbit

What I would most like to know is the vocabulary one can obtain from doing every tree involving a language, in both directions. I have learned lots more Russian and French vocabulary, for example, by getting French from Russian up to level 20. A measure of how many lexemes this has added to the base I got from the from English courses would be very interesting.

January 28, 2018

https://www.duolingo.com/piguy3

The method here is accessing something from the structure of the tree as laid out in the Incubator. I have a suspicion for myself that the lexemes here might not actually be fully unique, for instance if verb forms are introduced before the point in the tree at which all forms of e.g. present simple regular verbs are considered known based on the introduction of any single form. Maybe e.g. subjunctive also has to be entered as a separate "lexeme" so that just by learning indicative forms the system isn't automatically assuming you actually know subjunctive, too.

Otherwise stated, a lexeme in the target language is a certain kind of structure that contributors have to set up in the Incubator. The system has a way to count them, and that's what's being displayed using this method. The base language for the tree, however, is just whatever is needed to translate sentences put together with these lexemes. I don't think there would be any built-in way to measure the content of the base-language sentences.

January 28, 2018

https://www.duolingo.com/CarlosLM.

I checked the accuracy of the lexeme count myself for the first words of the French tree and I was very happy with it (the first 20-30 lexemes, more or less). However, as humans are not perfect, there must be a margin of error in every lexeme count, so you should take the Duolingo lexeme counts as approximations of the real lexeme count (on top of that, your actual tree may not be the largest version available on Duolingo, as there are many A/B tests).

Anyway, I don't mind an error of ±50 lexemes, as my original intent was to get an idea of how large the different trees in Duolingo are. Plus, IMO, no matter how well you learn the respective vocabularies, it's clear that there will always be thousands of remaining native words to learn if you want to achieve a high level in a language in the real world.

January 28, 2018

https://www.duolingo.com/CarlosLM.

From the interface I only obtain lexeme counts. If I could obtain the actual lists of lexemes of every tree, it would be easy for me to merge lists, eliminate duplicated lexemes and find out how many new lexemes you get with a reversed tree, or another combination of trees. Maybe this kind of info is only accessible by the staff. Incubators must have access to the lexeme list of their own tree, but I highly doubt they can have direct access to other trees, for security reasons.
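If the lexeme lists were available, the merge described above could be sketched roughly as follows. This is only an illustration in Python: the two lists are made-up placeholders, not real Duolingo data, and the function name `new_lexemes` is hypothetical.

```python
def new_lexemes(base_course, extra_course):
    """Return the lexemes in extra_course that are not already in base_course."""
    return set(extra_course) - set(base_course)


# Made-up placeholder lists, NOT real course data.
french_from_english = ["chat", "chien", "manger", "boire"]
french_from_russian = ["chat", "manger", "neige", "ours"]

# Lexemes the second tree would add on top of the first one.
added = new_lexemes(french_from_english, french_from_russian)

# Total distinct lexemes after merging both lists (duplicates collapse in a set).
combined = set(french_from_english) | set(french_from_russian)

print(sorted(added))
print(len(combined))
```

Sets are used because they drop duplicates automatically, so the size of the merged set is the combined lexeme count.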

So I'm sorry that I can't help you with your very interesting question. Maybe FieryCat could help you, as he has advanced programming skills...

January 28, 2018

https://www.duolingo.com/MyaRexa

I just noticed a curious thing:

(https://i.imgur.com/NvdurqD.jpg)

Look under "known_lexemes": each appears to have its own id assigned. You'd have to use the text from users who actually completed the tree, and it shows only for the language currently in use (I checked by switching between Esperanto and Japanese, because I completed those two), but maybe it would be possible to compare those ids and eliminate duplicate ones to get a more accurate lexeme count? The ids' amount is the same as the number of listed lexemes taught.

Also, they may have the same ids between different courses teaching the same language, which would make it possible to eliminate duplicates if someone learns, for example, English from multiple languages.

Edit: checked for Ja->En and Pl->En for "colors" vocab, they seem to have the same ids in different courses. I don't know if this will be of any use to you, but I'll just leave it posted here ;)

January 29, 2018

https://www.duolingo.com/piguy3

It looks like the ids are shared across courses teaching the same language, at least in certain instances. Some of the taught words themselves are listed too (Ctrl-F for "words" with the quotation marks), but it looks like only ten per skill, no matter how many lexemes are taught.

January 29, 2018

https://www.duolingo.com/CarlosLM.

I don't remember what the problem was, but I wasn't able to figure out the total lexeme count from the known lexeme ids. Maybe it was simply too much work for me. You might keep investigating this if you feel like it.

Anyway, this year my intention is only to work on my "Hall of Fame based on total vocabularies", as it is relatively easy for me to get the data, but I will probably raise the bar to 20,000 or 25,000 lexemes of total vocabulary for courses based on English.

January 29, 2018

https://www.duolingo.com/piguy3

The ids' amount is the same as the number of listed lexemes taught.

On a skill-by-skill basis this is true. However, it may bear noting that the total lexeme count for a course is distinct from the sum of "num_lexemes" figures. For example, in English from Dutch the course total is 1397, but the sum of the skill lexeme counts is 1952, which is even a couple hundred more than the total listed by the Words list for the course at the moment.

January 29, 2018

https://www.duolingo.com/MyaRexa

I meant only the section count and the number of "xyz" ids listed in it. Thanks for the notice though! If I have time, I'll take a closer look at this. It's probable that some lexemes appear in multiple skill sections. I was thinking of something along the lines of copying the ids and then running a duplicate test on them, removing duplicates and checking how the count compares to the course total.
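Something like this Python sketch is what I have in mind for the duplicate test. It assumes the page data has already been parsed into a list of skills, each carrying the "known_lexemes" and "num_lexemes" fields seen in the screenshot; the list layout, the "title" field, the function names and the sample ids are all made up for illustration.

```python
def unique_lexeme_ids(skills):
    """Collect every id under "known_lexemes" across all skills, dropping repeats."""
    unique = set()
    for skill in skills:
        unique.update(skill.get("known_lexemes", []))
    return unique


def summed_skill_counts(skills):
    """Add up the per-skill "num_lexemes" figures without removing duplicates."""
    return sum(skill.get("num_lexemes", 0) for skill in skills)


# Made-up sample, NOT real course data: the id "c3" appears in both skills.
skills = [
    {"title": "Numbers", "num_lexemes": 3, "known_lexemes": ["a1", "b2", "c3"]},
    {"title": "Determiners", "num_lexemes": 2, "known_lexemes": ["c3", "d4"]},
]

print(summed_skill_counts(skills))     # per-skill figures added together
print(len(unique_lexeme_ids(skills)))  # count after removing duplicate ids
```

If the de-duplicated count lands close to the course total, repeated ids would explain the gap; if it doesn't, the ids are probably not one per lexeme.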

January 29, 2018

https://www.duolingo.com/CarlosLM.

Yes, there must be a lot of duplicates. However I think that the real problem is that the ids are from distinct words, not from lexemes. I vaguely remember something like that, from the work I did here 4 months ago.

January 29, 2018

https://www.duolingo.com/piguy3

It looks like there are even separate ids for homonyms at least. For instance, the word "one" appears in both the Numbers skill and the Determiners skill. However, none of the id numbers from the Determiners skill occurs more than once on the page.
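A check like this can also be scripted instead of done with Ctrl-F. Here is a rough Python sketch under the same kind of assumption as before: the skills are already parsed into a list of dicts with a "known_lexemes" list each, and the function name `repeated_ids` and the sample ids are invented for the example.

```python
from collections import Counter


def repeated_ids(skills):
    """Map each id that occurs more than once on the page to its occurrence count."""
    counts = Counter(
        lexeme_id
        for skill in skills
        for lexeme_id in skill.get("known_lexemes", [])
    )
    return {lexeme_id: n for lexeme_id, n in counts.items() if n > 1}


# Made-up sample, NOT real course data: no id is shared between the two skills,
# which mirrors the case where a homonym gets a separate id in each skill.
sample = [
    {"title": "Numbers", "known_lexemes": ["n1", "n2", "n3"]},
    {"title": "Determiners", "known_lexemes": ["d1", "d2"]},
]

print(repeated_ids(sample))
```

An empty result means every id on the page is distinct, so the same spelling in two skills is being tracked under different ids.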

January 30, 2018