Behind the Scenes of Duolingo's Engine Rewrite (January 2017)
I found an article by André Kenji Horie, a Senior Software Engineer at Duolingo, that I think many of you would find interesting: Rewriting Duolingo's engine in Scala.
Much of the article goes into detail about Duolingo's system design and how the programming language Scala works, so for those of you who aren't interested in that, here is a summary of the engine rewrite and its benefits:
• Redesigned architecture
• Refactored code from Python to Scala
• Latency dropped from 750ms to 14ms
• Engine uptime increased from 99.9% to 100%
Rewriting code is a complicated but necessary process. Even though it halts the development of new features and may take several months, the technical debt that has built up must eventually be addressed.
In case you're not familiar with the term, technical debt is like financial debt. You "borrow money" by making engineering decisions that let you develop something quickly. In the long term, though, development starts to stall because of these accumulated shortcuts, at which point it's time to pay it off.
Session Generator* has existed since day one of Duolingo. It has lived through all the pains that come with the rapid growth of a startup and, unsurprisingly, has built up a lot of technical debt.
*Session Generator is Duolingo's "backend module which gets data from one of our 88 language courses (and counting!) in the Duolingo Incubator, sprinkles some machine learning magic, and proceeds to serve a sequence of exercises tailored to the needs of each of our millions of users".
Hah! Yeah I didn't understand a lot of the article even though I am a computer science major (but I am just starting out).
Unfortunately, they have said nothing about the old "Words" tab or the old API (still compatible with lekz's Android "Review flashcard" DuoLingo app), whether that module/service was rewritten in Scala, or why the user database of "learned words" contains so many bugs.
That would include some insight into how the /vocabulary/overview stream is published and why 3-4 grammar skills (incl. all their words) are missing from my Portuguese tree, which makes the available user script "DuoLingo skill strength viewer" less powerful than I would like it to be.
This is the real "behind the scenes" stuff that would interest me: how a user's word database is updated, how new words are added to it, and how the new Scala backend services select weak or unpracticed words from it (how does the selection algorithm work?).
On Memrise I can at least click on the wiki or check my own course / level backlogs to see when words will be repeated according to its somewhat more transparent spaced repetition algorithm.
I would like to find out why I have so many "dead words" (strength 0) in my tree, even though I have practiced Portuguese DAILY for one year.