Wednesday, November 30, 2011

10 Points Before You Start Your Localization

I am mentoring the localization tasks of Haiku into Tamil for Google Code-In 2011, and hence thought of providing a few suggestions for localizations. Some of these suggestions will be specific to Tamil, while sharing a few common characteristics with other languages.

1) Use the standard terminology
Make sure that you have the necessary reference and the language's latest accepted technical glossary with you. Don't invent your own words or phrases. If you don't know a word, leave it blank, rather than filling it with your guesses.

If a word is not in the glossary, try to find its meaning from other reliable sources. If you find a translation for the word, make sure it matches the standard. When you are the first to find an acceptable translation for a phrase, share it with the other team members, and use it in the translation only with their approval. Note down words that are missing from the glossary so that they can be added to it later.

Systems such as HTA expect localizations to be verified by the language maintainer or the mentor before the translations are marked as verified. That is, the language mentors can mark a translated word as faulty.

2) Be consistent
For example, I notice "ஜன்னல்" and "சாளரம்" used interchangeably in the same context. Please stick to one. In this case, my recommendation is to use "சாளரம்". Don't ignore the existing conventions.
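
A check like this can be partly automated. The following is a minimal sketch in Python, assuming the translations are available as a simple mapping from source string to Tamil text. The table `PREFERRED_TERMS` and the function `find_inconsistent_terms` are hypothetical names made up for illustration, not part of HTA or any other tool. It only catches the exact written form of a discouraged word, so inflected forms would need extra handling.

```python
# Hypothetical table: discouraged term -> term the team has agreed on.
PREFERRED_TERMS = {
    "ஜன்னல்": "சாளரம்",
}


def find_inconsistent_terms(translations, preferred=PREFERRED_TERMS):
    """Return (key, discouraged, preferred) for each translation
    that contains a discouraged term as an exact substring."""
    findings = []
    for key, text in translations.items():
        for discouraged, better in preferred.items():
            if discouraged in text:
                findings.append((key, discouraged, better))
    return findings


# Example data for illustration only.
sample = {
    "New window": "புதிய ஜன்னல்",
    "Window": "சாளரம்",
}

for key, discouraged, better in find_inconsistent_terms(sample):
    print(f'{key}: replace "{discouraged}" with "{better}"')
```

A report like this is only a hint for the reviewer. The final decision still rests with the language mentor.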

3) Don't use slang or spoken/broken language
Words like "இங்க" and "ஓடுது" are colloquial and grammatically wrong in written Tamil. Please use formal Tamil, not any spoken variant. We will reject spoken forms of phrases that are considered wrong in the written language.

If something is considered wrong in your Tamil lessons, it is wrong in localization too. We can't accept broken or grammatically wrong localizations with wrong spellings into the project. :)

4) Translate as phrases
Phrases should be translated as a whole, not word by word.

Let's take the phrase, "Update time interval:"
It should be translated as, "மேம்படுத்தல் நேர இடைவெளி" and not "மேம்படுத்தல் நேரம் இடைவெளி". This is something that differentiates the Indic languages from English.

Don't translate word by word. Instead, translate complete phrases. A phrase like "Add graph" should be translated as a whole in Tamil. Renderings like "சேர்க்கவும் (add) வரைபடம் (graph)" or "வரைபட சேர்க்கவும்" are not grammatically complete, and any native Tamil speaker can point that out. It should be "வரைபடத்தைச் சேர்க்கவும்".

"Do you want to stop" should be translated as "நிறுத்த வேண்டுமா?" (want to stop?), instead of "நீ நிறுத்த வேண்டுமா?". Here we omit "நீ", as it is implied.

5) Translate for the context
Some words may have different meanings according to the context. Be careful when localizing them. "Them" may not be "அவர்களை" when it refers to the plural of "it". It should be "அவற்றை".

"written by:" should be "எழுதியவர்:". "எழுதப்பட்டது" doesn't make sense in this context.

Think of,
"written by:Raja"
"எழுதியவர்:ராஜா" will be natural.
"எழுதப்பட்டது ராஜா" doesn't make sense.

So translate for the context. Do not translate literally.

6) Be respectful to the user
Please do not use "நீ". Use "நீங்கள்" instead. Similarly, don't use "நிறுத்து"; it should be "நிறுத்தவும்". The program should address the user in a respectful manner. We should not offend the user by addressing them in the singular, which Tamil convention treats as disrespectful.

7) Locales
Be specific to the correct locale. If you are translating for ta-LK, consider the conventions involved, and remember that they can differ from those of ta-IN. Some projects do not distinguish locales. They just have the language code (ta), ignoring the potential minor differences between the locales.

8) Don't translate the control strings
For example, leave strings such as
%lld ms
as they are.
Don't introduce a blank space inside the control string, between the % and the lld. Translations such as
% lld நொடி
% lld MS
are invalid.
Also, there is no need to transliterate units such as MB, since they are used as standard symbols. Translating MB as எம்பி doesn't make sense.
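
Broken control strings are easy to catch mechanically. Here is a minimal sketch in Python, assuming the strings use printf-style placeholders such as %lld. The names `PLACEHOLDER`, `placeholders` and `placeholders_preserved` are my own for this illustration and do not come from any translation tool. The pattern covers only the common printf forms, and it deliberately does not accept a space after the %, so a damaged "% lld" is not counted as a placeholder.

```python
import re

# Common printf-style placeholders (%lld, %d, %s, %5.2f, ...) and the literal %%.
# A space after % is intentionally not accepted.
PLACEHOLDER = re.compile(
    r"%(?:%|[-+0#]*\d*(?:\.\d+)?(?:hh|h|ll|l|z|j|t|L)?[diouxXeEfgGcsp])"
)


def placeholders(text):
    """Return every placeholder found in text, in order of appearance."""
    return PLACEHOLDER.findall(text)


def placeholders_preserved(source, translation):
    """True if the translation has exactly the same placeholders as the
    source. Order is ignored, since word order may change in translation."""
    return sorted(placeholders(source)) == sorted(placeholders(translation))


print(placeholders_preserved("%lld ms", "%lld ms"))      # True
print(placeholders_preserved("%lld ms", "% lld நொடி"))   # False
print(placeholders_preserved("%lld ms", "% lld MS"))     # False
```

Ignoring the order is a simplification. A string with several placeholders of different types may need its arguments in a fixed order, and this sketch does not check for that.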

9) Don't just "Google Translate"
For example,
"CPU Usage" should be translated as "CPU பயன்பாடு",
whereas Google Translate renders it as
"CPU பயன்பாட்டை".

Google Translate uses a learning algorithm and is not always correct. Moreover, its coverage of Indic languages such as Tamil is incomplete. Please translate by yourself, since we mark Google-translated phrases as "Faulty"; most of them can be translated with better vocabulary.

10) Easy translations first
There may be a few phrases that you are not able to translate. Focus first on the phrases that you can translate easily, rather than struggling with long phrases that will take you more time.

P.S.: This post is an updated version of a post that was written a long time back.


  1. Making up new words is sometimes fun. I remember when I was translating my feed reader to Esperanto, I was wondering what I should use. Should I esperantize feed (I came up with fedo)? Should I use the logical eraro (set of items, but which also could be error)? I settled for kanalo (channel, Netscape's original term). Only later did I realize that most people were using fluo (flow). I read that IE's Spanish translation uses fuente (English: spring/well; Esperanto: fonto). That's obviously a very idiomatic thing.

  2. Nice to know that we have had similar experiences. Sometimes I feel that a word other than the community-accepted or widely used term would work better, but I have always settled for the one that is commonly used instead of opening a discussion.

