https://www.duolingo.com/galliumarsenide

Bug: Correct Vietnamese input is marked "almost correct" because it is not Unicode-normalized

I think Duolingo does not apply Unicode normalization to my input strings.

I'm using KDE and the Vietnamese keyboard layout.

During a timed practice session, when I translated "Man" as "Đàn ông" (or answered any other prompt containing diacritics), I got an "almost correct" response.

Here's the text Duolingo showed as the correct answer: Đàn ông

Here's the text I entered: Đàn ông

The problem is that the "à" I typed is actually two code points: 'a' (U+0061) followed by a combining grave accent (U+0300). Duolingo's "à" is a single precomposed code point (U+00E0).

I ran the following in a Python shell on both strings above:

import unicodedata

# Duolingo's correct answer, written with escapes so the precomposed "à" (U+00E0) is visible
[unicodedata.name(c) for c in u"\u0110\u00e0n \u00f4ng"]

['LATIN CAPITAL LETTER D WITH STROKE', 'LATIN SMALL LETTER A WITH GRAVE', 'LATIN SMALL LETTER N', 'SPACE', 'LATIN SMALL LETTER O WITH CIRCUMFLEX', 'LATIN SMALL LETTER N', 'LATIN SMALL LETTER G']

# My input: "a" followed by U+0300 COMBINING GRAVE ACCENT
[unicodedata.name(c) for c in u"\u0110a\u0300n \u00f4ng"]

['LATIN CAPITAL LETTER D WITH STROKE', 'LATIN SMALL LETTER A', 'COMBINING GRAVE ACCENT', 'LATIN SMALL LETTER N', 'SPACE', 'LATIN SMALL LETTER O WITH CIRCUMFLEX', 'LATIN SMALL LETTER N', 'LATIN SMALL LETTER G']

Notice that my input contains "LATIN SMALL LETTER A" followed by "COMBINING GRAVE ACCENT", while Duolingo's string contains only "LATIN SMALL LETTER A WITH GRAVE".
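For reference, Unicode treats the two spellings as canonically equivalent: the Unicode Character Database gives U+00E0 the canonical decomposition U+0061 U+0300. A minimal sketch with Python's standard `unicodedata` module (the variable names are just for illustration) shows that mapping, and that a plain `==` comparison still treats the two strings as different:

```python
import unicodedata

precomposed = "\u0110\u00e0n \u00f4ng"  # Duolingo's answer: "à" is one code point
decomposed = "\u0110a\u0300n \u00f4ng"   # my input: "a" + COMBINING GRAVE ACCENT

# The canonical decomposition of U+00E0 is "a" followed by U+0300
print(unicodedata.decomposition("\u00e0"))  # 0061 0300

# Without normalization, the strings compare unequal and differ in length
print(precomposed == decomposed)            # False
print(len(precomposed), len(decomposed))    # 7 8
```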

I ran my input through a Unicode normalizer using NFC (canonical composition), and the output was identical to Duolingo's correct string.
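The same result can be reproduced with `unicodedata.normalize`. The sketch below also shows how an answer checker could normalize both strings to NFC before comparing them; `answers_match` is a hypothetical helper for illustration, not Duolingo's actual grading code.

```python
import unicodedata

def answers_match(user_input: str, expected: str) -> bool:
    """Hypothetical checker: compare two answers after NFC normalization.

    NFC composes canonically equivalent sequences, so "a" + U+0300
    becomes the single code point U+00E0 before the comparison.
    """
    return unicodedata.normalize("NFC", user_input) == unicodedata.normalize("NFC", expected)

expected = "\u0110\u00e0n \u00f4ng"  # Duolingo's accepted answer (precomposed)
typed = "\u0110a\u0300n \u00f4ng"    # decomposed, as produced by my KDE Vietnamese layout

print(expected == typed)                                # False
print(unicodedata.normalize("NFC", typed) == expected)  # True
print(answers_match(typed, expected))                   # True
```

Normalizing both sides to NFD would work just as well; what matters is applying the same normalization form to both strings before comparing.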

2 years ago

1 Comment


https://www.duolingo.com/Mr.rM
Mr.rM

With the custom fonts used by Duolingo, I can see the difference too:

“Đàn” (a + ˋ) and “Đàn” (à)

2 years ago