Google Translate adds gendered stereotypes when translating from Turkish
Google Translate is using stereotypes to decide which pronouns to generate in gender-ambiguous Turkish sentences. Whoops?
So, just a heads up for learners who are using Google Translate to help with your studies, I don't know if this is only happening for Turkish translations, but, at the very least, it is happening for Turkish translations:
Google Translate adds gendered stereotypes when translating from Turkish, which doesn’t mark gender in these sentences: “o” means both “she” and “he”.
Google Translate favours the version that perpetuates a whole bunch of stereotypes – stereotypes that were, no doubt, present in the training data. (source, additional commentary, article with more information about how AI learns sexism and racism).
After reading the article, I decided to run a test of my own. Here are the results:
Well, what would you expect? As long as there are stereotypes in the population (source data), the machine will reflect it.
The post was intended to be commentary on GT's source data. I didn't realize people would think I understood the situation as anything other than that.
So... what's wrong with GT's source data? As far as I know, there are more male engineers, so it's natural there's more data with "he" form. It's not stereotypes but probability. :)
Of the linguist-enthused persons here, I'm of the sociolinguistics branch. So, I am interested in the social impact on language, and language's impact on the social. Stereotypes delivered via language have social impact. Language is not neutral, so it matters when this sort of thing is happening.
The problem is, there's nothing "natural" about there being more male engineers, even if the current state of things is that there are more male engineers. I spent 9 years in university. I have a degree in Sociology (gender focused) and another degree in Human Communications (gender and language focused).
Stereotypes are a social barrier. Along with other socially based gender biases, they put pressure on people to feel "out of place" in fields generally associated with a different gender. On the flip side, they embolden people (generally men) whose gender matches expectations for that field to express their biases in attempts to drive others (usually women) out of their field. Language is a dominant vehicle for stereotypes. It helps normalize the idea of their "naturalness". So, I'm in favor of weeding this sort of thing out.
Edited for clarity
Finnish does not have separate pronouns for males and females. Instead of he or she, we use hän, which refers to both men and women. I gave Google Translate some sentences with hän and a noun referring to a profession or an occupation.
”Jobs for men”
- police officer
- prime minister
”Jobs for women”
- factory worker
- security guard
These seem pretty random, so whoever programmed the AI on how to use Finnish did not make the same mistakes as the Turkish programmer.
Perhaps less biased but still biased. We have a tendency to indicate the gender of a person having a job that does not suit the traditional gender stereotypes by using prefixes. A female police officer is often called naispoliisi and a male nurse miessairaanhoitaja. Admittedly, these prefixes are getting rarer and rarer, but there is still one word that always and without exception requires a prefix to tell us that we are talking about a man: miesprostituoitu, male prostitute. I was rather surprised that GT chose the pronoun he for a prostitute.
Turkish learners will already know the 'o' doesn't specify a gender. How exactly would you have Google translate it—randomly, or using probabilistic analysis of data? ('He is a cleaner', 'he is a secretary', 'she is a teacher' and 'he-she is a police [sic.]' seem (to me) antithetical to stereotypes anyway, and many others are borderline, so the whole premise is questionable on such a small data-set.)
If you'd prefer 'singular they' in such instances, incidentally, then this would debase the value of the translation, as Turkish does specify the number, but a 'singular-they' rendering would leave this ambiguous in English.
Translating a pronoun like 'o' will always be context-based; there are more male than female soldiers and engineers, for example (I'm not so sure about nurses), so such stereotypes already reflect the balance of probabilities (which is why they are stereotypes in the first place). A few stereotypes do go against probability, but the AI can only work with the data it is given, so you shouldn't be blaming Google, but rather everyone who uses it and contributes to its data-pool.
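To make the "balance of probabilities" argument concrete, here is a toy sketch of a picker that chooses whichever pronoun appears most often with a job. The counts, the `TOY_COUNTS` table and both function names are invented for illustration; this is an assumption about the general idea, not a description of how Google Translate actually works.

```python
from collections import Counter

# Hypothetical counts of "<pronoun> is a/an <job>" sentences in some corpus.
# These numbers are invented for illustration only.
TOY_COUNTS = {
    "engineer": Counter({"he": 80, "she": 20}),
    "nurse": Counter({"he": 15, "she": 85}),
}


def most_likely_pronoun(job, counts=TOY_COUNTS):
    """Return the pronoun seen most often with this job in the toy counts."""
    return counts[job].most_common(1)[0][0]


def pronoun_probabilities(job, counts=TOY_COUNTS):
    """Return each pronoun's share of the toy counts for this job."""
    total = sum(counts[job].values())
    return {pronoun: n / total for pronoun, n in counts[job].items()}


print(most_likely_pronoun("engineer"))
print(pronoun_probabilities("engineer"))
```

In this sketch, always taking the most frequent pronoun turns an 80/20 split in the toy data into "he" every single time in the output.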
Good luck learning Turkish, incidentally!†
†This sentence suddenly looks exceedingly sarcastic upon re-reading (perhaps my mind is overly coloured by the difficulty of Turkish); it is not intended to be.
Even though I'm not a native speaker, I'm an advocate for the singular "they". Just like Turkish learners know that 'o' is gender neutral, English speakers can and should be able to differentiate singular gender neutral 'they' and its traditional plural counterpart.
I do agree though, that Google Translate is not an evil sexist machine, just a reflection of our society's flaws.
It is completely impossible to distinguish singular and plural 'they' in sentences such as 'they are happy' or 'they are married', without context (which entering a single sentence into Google translate does not provide). This is not a problem in Turkish. As this complaint is predicated on typing isolated sentences into Google translate, I cannot see how Google could possibly differentiate number short of reading the mind of the person entering the sentence.
Lack of context is the single factor that stops us from being able to correctly differentiate sing. and plu. 'they', making this discussion an exception to its proposal as a gender neutral pronoun in general. Given the context, it is quite efficient. That being said, here's the solution:
When you type 3rd person singular pronouns in English, it translates to 3rd person singular pronouns in Turkish. That's because we can be sure you're talking about one individual.
As I see it, the gender neutral 'they' is a linguistic mechanism that allows us to express uncertainty about, or the irrelevance of, someone's gender (epicene) rather than its absence. It is not a neutral-characteristic pronoun like <оно>; that'd be closer to our 'it', imo. So when you type "they" in English, it translates to 3rd person plural. If one wants a translation without a specific gender, typing 'he/she/it' would get the same results, I guess.
When you type "o bir mühendis", it returns something like 'They are an engineer'. This is a grammatical gray area from what I've found; it could also be 'He/she/it is an engineer' or 'He/she/they is/are an engineer' (if specifying gender fluid and non-binary individuals is your problem; the article also assures singular).
I mean, if we're talking about changing something, why not think it through and make it work? Looking at it realistically, the problem is just that there is still no consensus.
If usage changes, no doubt GT will change, as GT is essentially a scientific instrument that reflects usage (albeit with many flaws, which I'm sure will improve, but reflecting actual usage is an essential prerequisite of being able to translate it accurately). You are welcome to argue in favour of changing usage, but you will be going against the grain in prescribing rules that do not reflect the reality of speech and text. This is like theologians arguing how many angels can dance on the head of a pin whilst the overwhelming majority will still carry on as they always have regardless.
> prescribing rules that do not reflect the reality of speech and text
The grammatical concept of epicene is surely not out of our reality. Why would my solution, specifically, lack an accurate depiction of written/spoken English? I am an English speaker and I use the singular 'they' sometimes, as do many others like me.
> This is like theologians arguing how many angels can dance on the head of a pin
No, it is not. Theologians have absolutely no proof of the concrete existence of angels, let alone whether they dance on the head of a pin. For the sake of grammar, sometimes we indeed don't know/don't care about the concept of gender, and for the sake of this century, we do have gender minorities that advocate for the use of 'they'.
This discussion is deeper than a simple prescriptivism/descriptivism argument. Really, everyone should just make an effort and try to see beyond this dichotomy.
The trouble with your solution is it causes ambiguities of number as I said above, and I won't argue in favour of a usage that reduces the amount of information conveyed. The problem could equally be overcome if people could be persuaded to regard 'he' as sex-indefinite in such situations (which has worked perfectly well for centuries), rather than use a form that has ambiguous number. The only reasons for which some people oppose this are political rather than linguistic.
I really don't think it is deeper than descriptivist/prescriptivist, as you admit to being a partisan in this debate and proffer reasons like 'for the sake of this century'; I'm sure Bishop Lowth would have similarly considered rigid adherence to Latin grammatical principles in English to be 'for the sake of the 18th century', as the appreciation of Latin poetry was very much the fashion of the time and some people will always push fashions too far. Now, Lowth was mildly successful, and perhaps you will be too, but Google should reflect reality rather than trying to coerce change, or it simply becomes another Lowth, and I feel that Google is much too potent and unaccountable a tool to be given such latitude.
I entirely agree that theologians cannot prove anything scientifically. This, at least, gives them lots of common ground with the practitioners of 'gender studies'. I don't want to see either of these groups imposing dogmata on a software tool that should operate with scientific impartiality according to the evidence of real-world data.
I'm actually glad to find someone that argues at such a deep, abstract and dialectic level. I do pick a side, but not because of prescriptivism. As I see it, the way we describe our language (pointing out incoherencies with the actual world) deeply impacts how we seek to change it, or not. The same goes the other way round: the way we change it (or try to) deeply impacts how we see it. So yes, from a descriptivist point of view, we do have people that want this pronoun to fall into common usage, and because of that, it is gaining popularity over time. And yes, from a prescriptivist point of view, we should change it so it gains more popularity, and thus integrates into our language. If there were no activist groups in the first place, it wouldn't really be that strong of a movement, I think.
What I wish to truly understand is why some people just feel they should go against it because it is 'not natural'. It is much like people who think that just because a drug is synthetic/semi-synthetic, it necessarily has a higher neurotoxicity profile, when that's just not true. The concept of naturality is much discussed in contemporary philosophy, and as one of its most prominent thinkers, Sloterdijk, argues, it's an outdated dichotomy (I really recommend reading about his thesis). That's because everything we see as natural was, at some point, "created" or "invented" and then became a general trend in nurturing, and this is not limited to tools and technology, but also includes core anthropological concepts such as families and avoiding incest.
So, in a short sentence, everything we now think may be 'just a trend' could be 'naturally' integrated in society. Kind of like "if you pretend to be something for enough time, you'll eventually become it". That's why naturality is a shallow notion in contemporary sociology/anthropology.
Centuries ago, there were far fewer people and the average person was not aware of modern concepts like genders and the politically correct. Now we have more than seven billion people in a relatively messed up world, so it's not surprising to see things changing and people willing to change them.
Just to clarify, each one of us has their own way of analysing linguistic phenomena. While Usagiboy may resort to sociological concepts, you may seek a strictly pragmatic point of view, and that's really no big deal. It only shows us why this is deeper than one single dichotomy: there is no right and wrong until grammar says so.
My solution does not create ambiguity because whenever the number of people is bigger than one, the noun is in the plural. If not, then it is singular. Simple. It's a pretty straightforward way of recognising numbers. Also, I'm not saying we should just go out and change GT right away. I believe this would only be plausible once the singular 'they' is in broad usage. Yes, it does reflect reality because the people that designed it are from this same reality. I merely explained how it could work with 'they' if it ever comes to be the case.
I don't care if your opinions on genders are conservative. You have every right to worry about something. But you don't have the right to say genders are not real (maybe I misunderstood you; if so, I apologise), because they are! I'm not part of any activist group, as I try to see things in the least biased way possible, and the meaning of "for the sake of this century" is, bluntly:
Please stop using your vital force to conservatively fight new sociological concepts. Try to understand them and you'll notice many causal relationships between today's general dialectic disagreement and a narrow-minded point of view (really not saying it's your case, I don't even know you). Another side effect of seeking to understand is that you'll feel less fear of supposedly dogmatic impositions. These minorities are not trying to take away things from us; they just want recognition and respect, as we have with all the so-called "normal" people.
The job of a translator like Google is surely to express the maximum information possible within the constraints of the two languages being translated. After all, it is a tool for people who do not know e.g. Turkish and cannot verify the correct sense themselves. In the case of 'you are happy', ambiguity of number is unfortunately unavoidable unless Google resorts to archaisms, but this is not really an advertisement for introducing extra ambiguity in the third person when Google is already translating these sentences more accurately at the moment, alleged stereotypes notwithstanding. Doing so would reduce the utility of the service for no good reason.
If linguistics is about descriptivism rather than prescriptivism (which distinction is very much the modern mindset), then GT is reflecting exactly that, and any stereotypes anyone might perceive in it are a result of the vast corpus of real usage that the AI uses to hone its craft; it is not making value judgements. If it were to start using 'singular they', then I'm sure people could find English sentences that get translated into Turkish as either singular or plural at which they could take offence for some reason or another.
Exactly - and sometimes the maximum amount of information doesn't include the gender of the person to whom "o" refers.
Translating "o" as "[he or she]" would be using the maximum amount of information instead of pretending to have even more information than that.
Think about it. Suppose someone doesn't know Turkish, has a Turkish text that includes something like "Lisa is my sister. She is an engineer," and uses Google Translate.
It's much, much easier for this person to put the Turkish text into Google Translate, receive "Lisa is my sister. [He or she] is an engineer," and change the "[He or she]" to "She" than for the programmers at Google to get Google Translate to keep track of which words (including names) in previous contexts gender which pronouns.
Now if this person who doesn't know Turkish hands in a final draft at school or work that still has "Lisa is my sister. [He or she] is an engineer," in it instead of "Lisa is my sister. She is an engineer," then this person has done the assignment badly and it's not anyone else's problem. ;)
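The workflow described above can be sketched in a few lines: the machine output keeps a visible placeholder, and the human who knows the context fills it in. The `PLACEHOLDER` constant and the two helper names are hypothetical, made up for this illustration; nothing here is an actual Google Translate feature.

```python
# Hypothetical marker a translator could emit when the source pronoun is gender-neutral.
PLACEHOLDER = "[He or she]"


def has_unresolved_pronoun(draft):
    """Return True if the draft still contains the placeholder."""
    return PLACEHOLDER in draft


def fill_pronoun(draft, pronoun):
    """Replace every placeholder in the draft with the pronoun the human chose."""
    return draft.replace(PLACEHOLDER, pronoun)


draft = "Lisa is my sister. [He or she] is an engineer."
final = fill_pronoun(draft, "She")
print(final)
print(has_unresolved_pronoun(draft), has_unresolved_pronoun(final))
```

A check like `has_unresolved_pronoun` is also how a leftover placeholder in a handed-in draft would be easy to spot.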
@lizsue I entirely agree, and I'd not object at all to 'he or she' (although what happens when Google is translating something in which a Turk says the equivalent of 'he or she' to emphasise ambiguity of gender? But this is likely to be a far more infrequent occurrence compared with ambiguity of number caused by use of 'singular they').
The point I was making is that Google's using either 'he' or 'she' for isolated sentences produces entirely accurate translations. Gender is either unimportant or obviously discernible by context (even if Google translates your engineer sister as 'he', you can easily appreciate this as an obvious error without any knowledge of Turkish grammar), whereas ambiguities of number caused by 'singular they' translation would cause far more mistranslations in the 'he/she is happy'/'they are happy' line.
psionpete, I tweeted Google and posted this in the forums here to draw attention to the matter. Some of the course contributors have ducked in and read this post. Now they are discussing it. Some users are relying on Google Translate as a secondary learning tool, and some of them use GT to back up their assertion that their answer is correct and the course's answer is wrong. So, my best answer is that information is useful; I don't know if it will change Google's approach. But it can shed light on a few things for others.
I ran this test and got simply "an engineer" with no modifier, and in the weirdo world of Google Translate I got "an asian" instead of "a cook."
I think the take-away for me - as always with Google Translate - is that it is unreliable. If you expect it to be even coherent, much less gender-role aware, well then I have a nice bridge to sell you...
Interesting! My results matched the results others have gotten. I doubt it was the Tweet I sent to Google. But, it would be cool if it was!
I'm sad the course mods haven't joined the public conversation on this. They had some comments I found interesting.
That's only because it could be either he or she, and those are the most 'common' translations. It translates to 'it', too, but we English speakers don't refer to people as 'it', and singular 'they' is not commonplace yet.
Same when translating from Hungarian:
Interesting. One of the course contributors suggested that GT uses different engines for different languages. They were talking about which language results were better than others due to this.
Not sure that being a cook is considered particularly feminine at this point. I honestly think it might even be the reverse, with so many male kitchen celebrities out there.
This week I was complaining about an issue I was having with an architect in my community (who is a woman) with the head of our museum (who is also a woman) when the museum head said "well HE needs to..." assuming this architect was male. I found it interesting the president of the neighborhood was complaining to the head of the museum about an architect--all three of us accomplished women in positions of leadership, and that same sexist mistake still came up. Evolving as a society is stupidly hard, and this silly AI reflects that. It's a good mirror, though, and can indicate where we are and how far we still have to go. Pretty sad though. :-/
> Evolving as a society is stupidly hard.
Indeed. As they say, it takes a village to raise a child. Unfortunately, when the village we keep raising kids in continues to be and reflect sexism (among other things) (which it is), evolving away from those things as a society will continue to be, as you've masterfully put it, stupidly hard.
In Hebrew, the word “you” is gendered (male is אתה and female is את).
Here’s how Google Translate deals with some “you” sentences:
You are a man. אתה גבר.
You are a woman. את אישה.
You are a doctor. אתה רופא.
You are a nurse. אתה אחות. (!) (Interesting, given that the word for “nurse” is a feminine noun).
You are a teacher. אתה מורה.
You are a secretary. את מזכירה.
You are a lawyer. אתה עורך דין.
You are emotional. אתה רגשי.
You are logical. אתה הגיוני.
You are kind. אתה נחמד.
You are crying. אתה בוכה.
You are married. אתה נשוי.
You are single. אתה רווק.
You cook dinner. אתה מבשל ארוחת ערב.
You are feeding the baby. אתה מאכיל את התינוק.
It seems that the masculine form is the default for most ambiguous situations, with “you are a secretary” being the exception.
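For anyone who wants to count rather than eyeball it, here is a small tally of the ambiguous sentences listed above, using only the first word of each Hebrew output as posted. The `RESULTS` list, the `FORMS` mapping and the `tally` helper are just names made up for this sketch, and the explicitly gendered "man" and "woman" sentences are left out.

```python
from collections import Counter

# (English input, first word of the Hebrew output) for the ambiguous sentences above.
RESULTS = [
    ("You are a doctor.", "אתה"),
    ("You are a nurse.", "אתה"),
    ("You are a teacher.", "אתה"),
    ("You are a secretary.", "את"),
    ("You are a lawyer.", "אתה"),
    ("You are emotional.", "אתה"),
    ("You are logical.", "אתה"),
    ("You are kind.", "אתה"),
    ("You are crying.", "אתה"),
    ("You are married.", "אתה"),
    ("You are single.", "אתה"),
    ("You cook dinner.", "אתה"),
    ("You are feeding the baby.", "אתה"),
]

# The two Hebrew forms of "you" mentioned at the top of this post.
FORMS = {"אתה": "masculine", "את": "feminine"}


def tally(results):
    """Count how many outputs use the masculine vs. the feminine 'you'."""
    return Counter(FORMS[form] for _, form in results)


print(tally(RESULTS))
```

On this small sample that comes to 12 masculine outputs and 1 feminine.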
Who really cares though? Why is this an issue? It becomes more of an issue when you purposely change an algorithm because you don't want to offend anybody. Is it really that bad that women truck drivers aren't included in a language translation program? We're being just a little too sensitive.
It's about accuracy. Translating "o" in Turkish as "[he or she]" in English (when the computer isn't given the rest of the context to know who the "o" refers to and what that person's gender is) is more accurate than just translating it as "he" or "she".
Then, once given a sentence such as "[He or she] is a cook," the person doing the Turkish-to-English translation can fine-tune it by changing the "[He or she]" to "he," "she," "they," "[Name]," or whichever is relevant since this person would know the context more than the computer would.
Now, if the person relies on Google Translate enough to leave "[He or she] is a cook" in a document exactly as is, that's the problem of the person relying on Google Translate too much.
In fact, this could also make it easier for teachers. Suppose "O bir mühendis" is in a paragraph that students have to translate from Turkish to English. Seeing "[He or she] is an engineer" in the homework someone hands in makes it even clearer that the student cheated on the homework by relying on Google Translate so much. ;)
I watched a full series of Leliko and I was introduced to only the 4 main characters of the show: Leli, Liko, Kulabuz, and Pembik. Some of these Turkish episodes were deleted by the way, but the first robot they meet, which appears in the first episode's name, is Kulabuz. Now I went all this time thinking that Kulabuz was male. Then when Leli meets Pembik, I automatically assumed that's a girl, and this is all from the Turkish version. However, whenever I watch Leliko in English, Kulabuz is referred to as a "she" instead. It silently drove me crazy. I didn't know what to expect. It made me think that Liko is the only male character in the show and that all the other characters of Leliko are female, since there are only four of them, when that is not the case at all.
So in the English Leliko Episode 34, they actually anglicized Kulabuz to Bellabuzz to make it sound feminine. In the beginning, just 8 seconds before the 2 minute mark, Liko says to Leli, "Hey, where do you think Bellabuzz is?" Leli responds, "I'm not sure. I haven't seen her, but she can't be too far." A few seconds later, they hear music being played. Leli says, "What beautiful music! Can you hear it too?" Liko says, "Yes Leli! What could it be? Are you guys out there hearing the music?" A YEAH is heard! Leli says, "Where could it be coming from?" Then Liko says, "Look Leli! It's Bellabuzz! What is she doing?" And that's it!
As a little side note, the first syllable of Bellabuzz sounds almost like "Belle", and we all know that character from Beauty and the Beast is female. Gaston wants her to fall in love with him, but she actually finds love and compassion with the beast, because Gaston is so naive and also believes that women shouldn't read. The beast had to find kindness and love in his heart by his 21st birthday, or the spell would leave him a humongous beast for eternity. Only if he found love by his 21st birthday would the spell be broken and all the characters he froze for over 10 years come back.