Trying to Understand the Algorithm Better, Late 2016
I am a big fan of Duolingo and have used it almost every day for nearly four years, trying to keep three completed trees gold while working on a fourth. Duolingo is not, however, very transparent about the specifics of how it functions or how it changes over time. One long-standing example of this has been the workings of its algorithm for spaced repetition learning.
This changed with the publication of its academic article on its new half-life regression (HLR) model in the late summer of 2016 ("A Trainable Spaced Repetition Model for Language Learning" by Burr Settles and Brendan Meeder) and the release of some related code on GitHub (both available at: https://github.com/duolingo/halflife-regression). These revealed not only that all users were switched over to the HLR model after testing, but also 1) that the switch initially caused problems when certain "difficult" words were overly "negatively weighted," leading to reports of unnecessarily rapid decay of some vocabulary, and 2) that prior to this new algorithm, Duolingo had been using a Leitner system of spaced repetition, which is very basic compared to the more complex algorithms deployed in advanced spaced repetition software like SuperMemo or Anki.
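For readers who have not dug into the paper, its core model is compact: predicted recall follows a forgetting curve p = 2^(-Δ/h), and the half-life h is estimated as 2 raised to a weighted sum of features. The sketch below is mine (function names and the toy feature vector are illustrative; the actual feature set and trained weights are Duolingo's and are exactly what the questions further down ask about):

```python
import math

def predicted_recall(delta_days, half_life_days):
    """HLR forgetting curve from the paper: p = 2^(-Δ/h), where Δ is
    days since last practice and h is the estimated half-life in days."""
    return 2.0 ** (-delta_days / half_life_days)

def estimated_half_life(weights, features):
    """Half-life estimate from the paper: h = 2^(Θ·x), where x is a
    feature vector (e.g. counts of correct and incorrect answers, plus
    lexeme tag indicators) and Θ the trained weights. Which features
    actually go into x is the open question discussed below."""
    dot = sum(w * x for w, x in zip(weights, features))
    return 2.0 ** dot

# With a half-life of 7 days, recall probability after 7 days is 0.5.
p = predicted_recall(7.0, 7.0)
```

The "negative weighting" problem mentioned above fits this picture: a strongly negative weight on a lexeme feature shrinks h, which makes p drop off very quickly.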
This is a wonderful development, but it still leaves a lot of points unclarified, much of which is understandable since Duolingo is a commercial enterprise that probably wants to guard its methods from competitors. Longer-term users trying to figure things out for their own study purposes, however, have been left to speculate about the details across a host of different locations (Duolingo forums, Reddit, blog entries, wikis, etc.). I think it would be useful to compile some key characteristics of the process in a single place, including answers to the following questions, on which I welcome input:
1) What do we know about the complete set of parameters in the "interaction features," and about the current status of the "lexeme features" used to modify the algorithm (see section 3.4 of the academic article for an explanation of the role of these features)? I have seen a host of different claims about this in different places, including claims that your time-to-answer is one parameter, or that timed vs. untimed practice and mobile vs. web use do (or do not) have an impact; none of this is mentioned in the article.
2) My own experience maintaining three completed trees gold for a full year (and, in the case of two trees, over two years) suggests there is still an unreasonable decay rate on simple lexemes that I have not gotten wrong in months or years, leading to a need to review 0-6 lessons per tree, or an average of about 2 lessons per day per tree. I am beginning to suspect there is a maximum (capped) interval on lexemes. Is that the case, or is there merely a highly inefficient, slow growth of the interval at higher ranges? A cap would not make sense from a pedagogical standpoint, since it produces a lot of "wasted" review of words that are nowhere near forgotten. It would, however, serve Duolingo from a general active-user "retention" perspective (forcing users to keep practicing on the site more than they otherwise would), and it especially punishes highly active users (including myself) who want to maintain several complete trees, reviewing only genuinely at-risk lexemes and grammatical topics, while spending most of their Duolingo time on new trees.
3) What do we know for certain (as of late 2016) about the relationship between word decay (the individual bars for a word in the "Words" tab) and lesson/topic decay (the bars on a lesson icon)? Connected to this, what can we establish about the relationship between reviewing vocabulary cards and reviewing decayed lessons (again, as of late 2016; I understand this has changed over time)?
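To make question 2) concrete: under the paper's forgetting curve p = 2^(-Δ/h), a word's strength falls below any fixed threshold after a number of days proportional to its half-life, so a cap on the half-life would force even perfectly known words to decay on a fixed schedule. The numbers below are purely hypothetical, chosen only to illustrate the question; nothing here is Duolingo's actual threshold or cap:

```python
import math

def days_until_strength_drops(half_life_days, threshold=0.5):
    """Days until predicted recall p = 2^(-Δ/h) falls below a given
    threshold (hypothetically, the point at which a strength bar decays).
    Solving 2^(-Δ/h) = t for Δ gives Δ = -h * log2(t)."""
    return -half_life_days * math.log2(threshold)

# Uncapped: a word answered correctly for years could reach a very long
# half-life (say 180 days) and would rarely come up for review.
uncapped = days_until_strength_drops(180.0)           # 180 days

# Hypothetical cap: if half-lives were clipped at, say, 30 days, even a
# well-known word would decay to the threshold every month, which would
# match the "unreasonable decay on simple lexemes" described above.
capped = days_until_strength_drops(min(180.0, 30.0))  # 30 days
```

If something like the capped case is what is happening, that would explain the steady trickle of review demanded on long-gold trees.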
This is a really interesting idea. I'm trying to work out where on the wiki the results of your investigation belong. I wondered whether they fit on either of these two pages:
...but I think it might need to be on a new page called Halflife Regression.
Thanks for this. There is some great material on that /Strength wiki page (I had forgotten the point about different types of mistakes having different impacts), especially the good footnotes to some very interesting posts by Burr Settles, who co-authored the academic article and surely knows the answers. I had not noticed those on previous visits to the wikia pages.
Among the most revealing things were:
1) A post (sometime in 2014) where he notes: "The skill strengths are simply the average of the individual words/concepts in that skill." This is something I suspected but didn't know for sure. If it is still true in late 2016, it is a useful tidbit to add to the skill strength info on the wikia page.
2) Several notes from two or three years ago suggesting there were a number of "bugs" in implementation, either general to the algorithm, specific to a user, or (in one case) specific to a platform (Android). This suggests that even three years ago they were doing something at least somewhat more complex than a straightforward Leitner method, which, given how easy it is to code, would be less likely to produce "bugs" of the nature described by users.
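To back up the "easy to code" point: a basic Leitner system really is only a few lines. This is a generic sketch of the method, not Duolingo's former implementation (the box count and doubling intervals are the common textbook choices, not anything Duolingo has published):

```python
def leitner_update(box, correct, num_boxes=5):
    """Basic Leitner update: a correct answer promotes the item one box
    (up to the last box); a mistake demotes it back to box 1. Gentler
    variants demote by only one box instead."""
    return min(box + 1, num_boxes) if correct else 1

def leitner_interval(box):
    """A typical Leitner schedule: each box doubles the review interval
    (1, 2, 4, 8, 16 days). These numbers are illustrative only."""
    return 2 ** (box - 1)
```

There is very little state here (one box number per item), which is why genuine Leitner scheduling rarely produces the kind of per-user or per-platform bugs those posts describe.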
As you point out, Hugh, the new info in the August paper and the code on GitHub may merit its own page, along with some updates to the /Strength page, perhaps linking to the new page for more detail. The formula is complex, but some of its key features would be worth outlining there.
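One concrete item for the /Strength page: if the Settles post quoted above still holds in late 2016, the skill-level bars should be reproducible from the word-level bars by simple averaging. A minimal sketch of that claim (the strength values here are made-up examples):

```python
def skill_strength(word_strengths):
    """Per Burr Settles' 2014 post: "The skill strengths are simply the
    average of the individual words/concepts in that skill."
    Strengths are assumed here to be values in [0, 1]."""
    return sum(word_strengths) / len(word_strengths)

# A skill whose words currently sit at strengths 1.0, 0.75, and 0.5
# would show an overall strength of 0.75 under this rule.
s = skill_strength([1.0, 0.75, 0.5])
```

This would also be easy for a motivated user to spot-check against the "Words" tab, which bears on question 3) above.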