

Suggestion: Import many (or all) public domain books from Google Books or other sources

I think one of the problems we have here is that there is relatively little content to read and translate. Google Books is a project that has scanned many books whose copyright has expired and which are now in the public domain.

Excellent works are available, for example books by Rudyard Kipling (The Jungle Book), One Thousand and One Nights, Shakespeare (Othello), Luís de Camões (Os Lusíadas), and Hans Christian Andersen (The Emperor's New Clothes), as well as other literature, and they would enrich us with considerable knowledge.

Perhaps the staff could negotiate with Google, Project Gutenberg, or other providers for a facility to import public domain books or snippets for learning and translating.

Edit: It may also serve as an excellent source of new sentences for Duolingo practice, since these works have been published, proofread by professionals, and edited many times.

September 23, 2013



I'm not sure that translating Shakespeare is a suitable task for a language learner -- even a lot of native speakers find him pretty hard going:

I prithee, Tom, beat Cut's saddle, put a few flocks in the point; the poor jade is wrung in the withers out of all cess.

And translating poetry doesn't tend to be easy since you have to do justice to the form as well as the content. Still, there are a lot of prose works on Project Gutenberg which would probably be suitable.


Is there a need for any of those books to be translated? I gotta think the works of Shakespeare have been translated into the languages Duo deals with. Duo makes money by selling translations, so if the works are already translated or there is no demand for them to be translated, it may be good practice for learners, but . . . . . ?


I only mentioned that as an example. Besides, Duolingo will eventually expand to include languages we've probably never heard of, and perhaps Shakespeare has never been translated into those languages. Ever heard of Shakespeare in cuneiform, or isiZulu?

Most of the stuff being translated in immersion is non-commercial material, overwhelmingly Wikipedia articles, so it's mostly just practice already. It would be really nice if we had a broader selection of content. I don't think Shakespeare would be that good a choice, but other public domain books? You bet.


You can already do this. You can add public domain materials for free to the immersion section.

I don't think adding materials en masse would be a good idea; that would just overwhelm the translation capacity of Duolingo.


According to some, there are more than 10 million Duolingo users, and perhaps more than 6000 articles in the French section, or an estimated 24000 or more articles across all sections (assuming each section has 6000 articles). I don't see how exactly it would overwhelm so many users.

In addition, good books have already been translated into many languages by experts. Introducing them here would give users good practice, an opportunity to translate into more obscure languages, and a good comparison of expert translation versus crowd-sourced user translation.

In my opinion, what needs to improve is the way Duolingo presents these articles for translation. Currently it seems like a hassle to go to the Immersion section. If it were seamlessly added into normal practice, such as randomly asking a user to translate during practice, it would perhaps be better.
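A quick back-of-the-envelope check of the estimate at the start of this post, taking the figures at face value. They are rough guesses from the thread, not measured data, and the count of four sections is only inferred from 24000 / 6000:

```python
# Figures quoted in the post; all of them are rough estimates.
users = 10_000_000            # "more than 10 million Duolingo users"
articles_per_section = 6_000  # guess for the French section
sections = 4                  # assumption: implied by 24000 / 6000

total_articles = articles_per_section * sections
users_per_article = users / total_articles

print(f"{total_articles} articles, about {users_per_article:.0f} users per article")
```

Note that this counts every registered user as a potential translator, which is the most optimistic reading of the numbers.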


And there are over 40000 ebooks on Project Gutenberg alone; that's books, not articles. Remember that not all 10 million users are active, and not all active users can use the Immersion section effectively (many are still too early in their learning). Moreover, producing a high-quality translation requires multiple users to converge on a particular translation.

If none of the texts currently in Immersion interest you, you can already add any public domain book to Immersion, including public domain books from Gutenberg. Importing all of Gutenberg's books en masse at the same time would just artificially inflate the number of texts in Immersion. The worst part of importing en masse is that the commercial documents would end up not getting translated, because you couldn't find them among such a large swath of books and everyone would be occupied with the books.


You make some valid points, but you have to realize that the aim of Duolingo is to translate the entire internet, so 40000 ebooks are just a drop of water in a lake, or perhaps in the ocean. Also, you raise questions of scalability: what would happen if just 10% of users decided to upload articles for translation?

In addition, commercial documents are prioritised over free stuff, so I don't see that as a problem.

I do however realize that the Immersion section still needs a lot of work. Currently we can't even search for articles using keywords, so perhaps it is too soon for this idea to be implemented.


You're missing my point. By having users choose when to upload the books, the number of documents in Immersion will grow organically, at a pace in line with the growth in the number of Immersion users. Importing too many too fast will reduce the quality of translations; importing all of Gutenberg's books en masse, especially at this point in time, would severely disrupt this balance.


Well, you could be right. In any event, we can't upload PDFs or other types of documents. So unless the book is in webpage format, this whole discussion is pointless until such facilities are available.


@Dessmator You can convert the book yourself.
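As a rough illustration of what such a conversion could look like, here is a minimal sketch that wraps a plain-text book in a bare HTML page. It assumes the source is plain text with paragraphs separated by blank lines; the function name `text_to_html` and the sample text are made up for this example, and nothing here is confirmed as what Immersion actually requires:

```python
import html
import re

def text_to_html(text: str, title: str) -> str:
    """Wrap blank-line-separated plain-text paragraphs in a minimal HTML page."""
    # Normalise line endings, then split wherever one or more blank lines occur.
    chunks = re.split(r"\n\s*\n", text.replace("\r\n", "\n"))
    paragraphs = [
        # Join hard-wrapped lines into one line and escape <, >, & and quotes.
        "<p>" + html.escape(" ".join(chunk.split())) + "</p>"
        for chunk in chunks
        if chunk.strip()
    ]
    safe_title = html.escape(title)
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{safe_title}</title></head>\n'
        f"<body>\n<h1>{safe_title}</h1>\n" + "\n".join(paragraphs) + "\n</body></html>\n"
    )

# Hypothetical sample input: two paragraphs, the first one hard-wrapped.
sample = 'This is the first paragraph,\nwrapped over two lines.\n\nShe said "hello" & left.'
page = text_to_html(sample, "Sample book")
print(page)
```

The resulting page would still have to be hosted somewhere reachable before it could be submitted as a webpage.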

Keep in mind that if thousands of books suddenly appeared in the Immersion section, people would really struggle to find something that interests them in such an overwhelmingly large pile.


When I try to import some public domain books via the Immersion section, the Duolingo bot says they aren't suitable for all ages. Even children's books like Alice in Wonderland gave me this result.

This is important because some books, like Alice in Wonderland, have no public domain translation into Portuguese, for instance. Since no big publishing corporation will freely release its translation into the public domain, Duolingo's crowdsourced translation could be a big step toward setting culture free.

It's sad, because Duolingo focuses its crowdsourced translation on commercial material, such as BuzzFeed's content, missing the whole point of crowdsourcing, which is to create a community good.


Duolingo has largely stopped any further development effort for Immersion. So at some point they are likely to stop translating even BuzzFeed/CNN content.

Anyway, there's a good chance that it has been translated by gutenburg.com. If not, you can submit a request to support@duolingo.com or community@duolingo.com asking them to allow the particular document in immersion. Sometimes the algorithm gets it wrong.
