Jakkal's Field Guide to Undisclosed Machine Translation


Mirrored from this Tumblr post, reformatted for a website. Please note that I am not a professional translator; I speak a fair amount of Japanese but not fluently. I am, however, a passionate hobbyist and a pedant.

Hello, general-you.

So, for the sake of argument, let's imagine that you are a fan of something that is only available in Japanese. Perhaps it is a mobile game, or perhaps it is one of those ridiculous nebulous "multimedia projects". Or maybe some other format, I don't know, this is the realm that I inhabit and I have completely lost sight of everything outside of it.

And you, unfortunately, do not speak Japanese. This is fine! There are many resources and fan-translators out there to help smooth over the bumps and make these silly, fun little media properties more accessible. But there is a scourge in our midst. Disingenuous, often misleading, and almost always dishonest, but only growing more prevalent with the march of technology:

Some dingus putting text into an autotranslator, maybe doing a pass to correct the name readings and a little bit of phrasing, and passing it off as their own translation.

And you may think: how can I identify this? I don't speak Japanese! I can't fact-check! But oh, you can! While you can't reverse-engineer meaning, you can at least go into MTL with a critical eye and know to take anything you read with a grain of salt.

So may I introduce to you my handy dandy, totally scientific, exhaustively researched, totally not 150% based on anecdata and vibes

Jakkal's Field Guide to Spotting Undisclosed Machine (Japanese-to-English) Translation

First of all, let's just say up front: machine translation is, ultimately, a tool. It can be useful for getting the gist of things in a pinch. The problem I have is not when people use machine translation, or, hell, even when they post it, as long as they clearly state what they're doing.

What I mind is when people post machine translation, even cleaned-up machine translation, and do not disclose it as being MTL. It especially sands my nips when people then get very defensive and possessive of the "work" they've done-- and doubly so when they act like no further translations are necessary. It leads to misunderstandings, misinformation, and even more bullshit infighting than we usually get.

So here are some of my "favorite" tells. Note that a lot of these can also just be a sign of an extremely inexperienced or, frankly, bad translator, but they often make the same mistakes.

  • A general tendency toward non sequiturs, or sentences that don't seem to follow each other. The subject changes mid-paragraph and the sentences appear to have no logical connection to each other. This is the big one, and the most obvious tell. Machines struggle to identify context within a single sentence (more on that in this video), let alone on a sentence-by-sentence basis.
  • In general, struggling with descriptive or figurative language; MTL often suddenly turns into word salad whenever something is being described in anything but the most barebones way.
  • It also tends to completely word-salad-ify slang and idiomatic language. Basically, just look for anything that straight up does not make sense in English.
  • Things get awkwardly repeated, both on a phrase level (無無 turning into "no good, no good" is a common one), and on a sentence-by-sentence level (like repeating information instead of consolidating it).
  • It can't be helped. If you know, you know. (If you don't know: 仕方がない often gets very-literally translated to "it can't be helped". This is not an incorrect translation, per se, but it's not always the most appropriate. It can be parsed many ways, from "it was inevitable", "there's nothing to be done about it", "oh well", "we did the best we could", "if you insist", "it's not up to me", "what's done is done", "shit happens", "sometimes it do be like that"... This is almost always the tell of either MTL or a very inexperienced human translator.)
  • Thesaurus-abuse word choice, where you can tell what it means but it does not track as natural phrasing (e.g. "Aren't you being too radical?" instead of "Aren't you overdoing it?")
  • Frequent use of "that person", "at that time", "that day", or any other very vague phrasing that sounds natural in Japanese but very odd in English. "That [person/guy/whatever]..." is also common as an expression of exasperation.
  • Excessive use of a character's name (without the vocative comma) in direct address. You'll often see cases where Character A is talking to Character B with nobody else in the room, but it will be rendered as "B is really stubborn" instead of "You're really stubborn, B".
  • In really egregious cases, the POV changes (e.g. from third to first person and back again) without warning, or a character's pronouns will change for a sentence.
  • For prose (e.g. light novels) especially, no dialogue tags or any way to follow who is speaking in a conversation.
    • As you, as a discerning weeaboo who is in deep enough to be into JP-only franchises, are probably aware, Japanese has lots of words for "I". I've translated written work where it's immediately clear in Japanese who is talking without a single dialogue tag, entirely because one character uses "ore" written as 俺, one uses "boku" written as 僕, and one uses "boku" written as ボク. Reading it in Japanese, you know who's talking because you know that Character A is the only character who says 俺. But if you just slap all of those into DeepL, they'll all come out as "I", and adding dialogue tags back in requires both noticing this and being able to distinguish who was talking in the first place.
  • Sentence fragments, especially verbs without a subject (or verbs that default to "I", especially when the rest of the text is in third person).
  • Indistinct character voice, including but not limited to:
    • Every character having the exact same voice and word choice. You kind of have to get a feel for this, but once you notice it, you can't stop seeing it.
    • Awkward, character-voice-breaking slang. Especially with the advent of DeepL, slang no longer immediately breaks the machine translation, but if every character speaks the same way whenever informal language comes out (e.g. both a 15-year-old girl and a 35-year-old man calling food "delish"), that's a tell.
  • Consistent hyper-literal wording even for non-euphemistic language. For instance: 消える usually means "disappear", but it can also refer to lights going out. MTL will often miss this context and might translate it as "the light disappeared" instead of "the lights went out" or even "it went dark".
  • The inability to carry a sentence's momentum across line breaks-- if you're dealing with lyrics or poetry, or even two characters finishing each other's sentences, then every line ends up being completely independent from the last.
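A few of the phrase-level tells above can be counted mechanically. Here is a minimal Python sketch, assuming you only care about the stock phrases quoted in the list; `TELL_PHRASES` and `count_tells` are names made up for this example, and a high count is a reason to look closer, not proof of MTL (remember, a bad human translator trips the same wires).

```python
import re
from collections import Counter

# Stock phrases quoted in the list above. Which phrases are worth counting
# is an assumption of this sketch, not a validated detector.
TELL_PHRASES = [
    "it can't be helped",
    "that person",
    "at that time",
    "that day",
    "no good, no good",
]


def count_tells(text: str) -> Counter:
    """Count case-insensitive occurrences of each tell phrase in text."""
    # Lowercase and straighten curly apostrophes so "can’t" matches "can't".
    normalized = text.lower().replace("\u2019", "'")
    counts = Counter()
    for phrase in TELL_PHRASES:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        hits = len(re.findall(pattern, normalized))
        if hits:
            counts[phrase] = hits
    return counts


sample = (
    "It can't be helped. That person said so at that time. "
    "No good, no good! That day, that person left."
)
print(count_tells(sample))
```

This only catches the wording-level tells; the bigger ones (non sequiturs, flattened character voice, lost momentum across line breaks) still need a human reading with a critical eye.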

There are other tells -- if you've seen enough DeepL translations, you know it has a very particular cadence and voice. Hell, I've even seen cases where, comparing against the original text, I can tell a problem happened with the OCR they were using to capture the text and it misread a character; but I think that's probably getting a bit too into the nitty gritty for the average non-Japanese speaking reader.

Almost all of these come down to one thing: a machine translator cannot distinguish context, especially when working in a language like Japanese, where huge amounts of information are left up to implication on a sentence-by-sentence basis. The inability to correct for this facet of the language makes machine translation spin wildly out of control and become incomprehensible in record time. An inexperienced translator (or person who does not speak Japanese at all) trying to "clean up" a machine translation, A, will be unwilling or unable to read the original text to understand that context, and B, is often reluctant to change or reword anything to re-add that context. This often seems to be born out of the belief that a "literal" or "direct" translation is best or the least misleading.
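The first-person pronoun case from the list is a small, concrete instance of that lost context. Below is a toy Python sketch, not a real translator: the three-character scene, the `SPEAKER_BY_PRONOUN` mapping, and both function names are made up for illustration. It just shows that once 俺, 僕, and ボク all become "I", the cue that told you who was speaking is gone.

```python
# Hypothetical mapping for an imagined three-character scene, where each
# speaker is identifiable only by their first-person pronoun.
SPEAKER_BY_PRONOUN = {
    "俺": "Character A",
    "僕": "Character B",
    "ボク": "Character C",
}


def guess_speaker(line: str) -> str:
    """Return the speaker implied by the line's opening pronoun, if any."""
    for pronoun, speaker in SPEAKER_BY_PRONOUN.items():
        if line.startswith(pronoun):
            return speaker
    return "unknown"


def flatten_pronouns(line: str) -> str:
    """Toy stand-in for translation: every first-person pronoun becomes 'I'."""
    for pronoun in SPEAKER_BY_PRONOUN:
        line = line.replace(pronoun, "I")
    return line


# "I'm going" / "I'm going too" / "I'll stay", one line per character.
source_lines = ["俺は行く", "僕も行く", "ボクは残る"]

speakers_before = [guess_speaker(line) for line in source_lines]
speakers_after = [guess_speaker(flatten_pronouns(line)) for line in source_lines]

print(speakers_before)  # three distinct speakers
print(speakers_after)   # every line now reads as "unknown"
```

Getting the speakers back means going to the original text, which is exactly the step an MTL cleanup pass tends to skip.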

But I would argue that in most cases, a "literal translation" is one being done by a translator not confident enough in their translation ability to create an equivalent experience.

(Also, lest it go unsaid: translation takes a really, really strong command of not only Japanese but of the target language as well. I've said multiple times that often, figuring out the meaning is only about 30% of the job, and the other 70% is trying to phrase it organically in English.)

So what should you be taking away from this?

If you suspect something is MTL, you can still use it to get the gist of something, but take all of the details with a grain of salt, especially if it's the only thing you've got.

And for the love of god, don't post MTL and pretend it's your own work. If you absolutely must post it, then please just... disclaim that DeepL did the work for you. It's really obvious.
