Showing posts with label Indic Text. Show all posts
Saturday, December 3, 2011
Tuesday, November 29, 2011
10 Points Before You Start Your Localization
I am mentoring the localization of Haiku into Tamil for Google Code-In 2011, and hence thought of providing a few suggestions for localization. Some of these suggestions are specific to Tamil, while others apply to other languages as well.
1) Use the standard terminology
Make sure that you have the necessary references and the language's latest accepted technical glossary with you. Don't invent your own words or phrases. If you don't know a word, leave it blank rather than filling it with a guess.
If a word is not in the glossary, try to find its meaning from other reliable sources. If you have found a translation for a word, make sure it matches the standard. When you find an acceptable translation for a phrase, share it with the other team members, and use it only with their approval. Note down words that are missing from the glossary so that they can be added to it later.
Systems such as HTA expect the localizations to be verified by the language maintainer or the mentor before the translations are marked as verified. That is, the language mentors can mark a translated word as faulty.
2) Be consistent.
For example, I notice "ஜன்னல்" and "சாளரம்" being used interchangeably in the same context. Please stick to one. In this case, my recommendation is to use "சாளரம்". Don't ignore the existing conventions.
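A consistency rule like this can be checked mechanically. The following is only a minimal sketch: the `PREFERRED_TERMS` table and the function name are hypothetical, and the table holds just the one pair mentioned above.

```python
# Hypothetical table: discouraged variant -> recommended term.
# Only the pair discussed in this post is listed.
PREFERRED_TERMS = {"ஜன்னல்": "சாளரம்"}


def find_inconsistent_terms(translations, preferred=PREFERRED_TERMS):
    """Return (index, discouraged, recommended) for every flagged string."""
    findings = []
    for index, text in enumerate(translations):
        for discouraged, recommended in preferred.items():
            # Plain substring match on the exact written form.
            if discouraged in text:
                findings.append((index, discouraged, recommended))
    return findings


if __name__ == "__main__":
    for finding in find_inconsistent_terms(["புதிய சாளரம்", "புதிய ஜன்னல்"]):
        print(finding)
```

Because it is a plain substring match, this catches only the exact written form and will miss inflected forms of the discouraged word, so it complements a human review rather than replacing it.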
3) Don't use slang or spoken/broken language
Words like "இங்க" and "ஓடுது" are slang forms, and are grammatically wrong in writing. Please use formal Tamil, not any spoken variant of Tamil. We will reject the spoken forms of phrases, which are considered wrong in the written language.
If something is considered wrong in your Tamil lessons, it is wrong in localization too. We can't accept broken or grammatically wrong localizations with wrong spellings into the project. :)
4) Translate as phrases
Phrases should be translated as a whole, not word by word.
Let's take the phrase, "Update time interval:"
It should be translated as, "மேம்படுத்தல் நேர இடைவெளி" and not "மேம்படுத்தல் நேரம் இடைவெளி". This is something that differentiates the Indic languages from English.
Don't translate word by word. Instead, translate complete phrases. A phrase like "Add graph" should be translated as a whole in Tamil. Renderings like "சேர்க்கவும் (add) வரைபடம் (graph)" or "வரைபட சேர்க்கவும்" are not grammatically complete, and any native Tamil speaker can point that out. It should be "வரைபடத்தைச் சேர்க்கவும்".
"Do you want to stop" should be translated as "நிறுத்த வேண்டுமா?" (want to stop?), instead of "நீ நிறுத்த வேண்டுமா?". Here we omit, "நீ", as that is obvious.
5) Translate for the context.
Some words may have different meanings according to the context. Be careful when localizing them. "Them" may not be "அவர்களை" when it refers to the plural of "it". It should be "அவற்றை".
"written by:" should be "எழுதியவர்:". "எழுதப்பட்டது" doesn't make sense in this context.
Think of,
"written by:Raja"
"எழுதியவர்:ராஜா" will be natural.
"எழுதப்பட்டது ராஜா" doesn't make sense.
So translate for the context. Do not translate as it is.
6) Be respectful to the user
Please do not use "நீ". Use "நீங்கள்" instead. Similarly, don't use "நிறுத்து"; it should be "நிறுத்தவும்". The program should address the user in a respectful manner. In Tamil, addressing someone in the singular is disrespectful, so we should not offend the user by doing so.
7) Locales
Be specific to the correct locale. If you are translating for ta-LK, consider the conventions involved, and remember that they can differ from those of ta-IN. Some projects do not distinguish locales. They have only the language code, ignoring the potential minor differences between the locales.
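To illustrate how a locale such as ta-LK relates to the bare language code, here is a small sketch of a lookup order that goes from the most specific tag to the least specific one. The function name is hypothetical and this is not how any particular project resolves its catalogs; it only shows the idea.

```python
def fallback_chain(locale_tag):
    """List the tags to try, from most specific to least specific.

    Accepts both "ta-LK" and "ta_LK" spellings.
    """
    parts = locale_tag.replace("_", "-").split("-")
    return ["-".join(parts[:n]) for n in range(len(parts), 0, -1)]


if __name__ == "__main__":
    print(fallback_chain("ta-LK"))  # ['ta-LK', 'ta']
    print(fallback_chain("ta"))     # ['ta']
```

A project that ships only a "ta" catalog effectively serves the same strings to both ta-LK and ta-IN users.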
8) Don't translate the control strings
For example, leave the strings such as,
%lld ms
as it is.
Don't introduce a blank space inside the %lld placeholder. Translations such as
% lld நொடி
and
% lld MS
are invalid.
Also, there is no need to transliterate units such as MB, since they are used as standard abbreviations. Rendering MB as எம்பி doesn't make sense.
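A check for damaged control strings is easy to automate. The sketch below assumes printf-style placeholders like the %lld shown above. The regular expression and function names are my own and cover only common conversions, not the full printf grammar.

```python
import re

# Matches common printf-style conversions such as %d, %s, %lld, %5.2f and
# the literal %%. The space flag is deliberately not accepted, so a broken
# "% lld" is not recognized as a placeholder at all.
PRINTF_SPEC = re.compile(
    r"%(?:%|[-+#0]*\d*(?:\.\d+)?(?:hh|h|ll|l|z|j|t|L)?[diouxXeEfgGcsp])"
)


def specifiers(text):
    """Return every placeholder found in text, in order of appearance."""
    return PRINTF_SPEC.findall(text)


def placeholders_preserved(source, translation):
    """True if the translation carries the same placeholders as the source."""
    return sorted(specifiers(source)) == sorted(specifiers(translation))


if __name__ == "__main__":
    print(placeholders_preserved("%lld ms", "%lld ms"))     # True
    print(placeholders_preserved("%lld ms", "% lld நொடி"))  # False
```

The comparison ignores order on purpose, since a translation may legitimately move a placeholder to a different position in the sentence.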
9) Don't just "Google Translate"
For example,
"CPU Usage" should be translated as "CPU பயன்பாடு",
whereas Google Translate renders it as
"CPU பயன்பாட்டை".
Google Translate uses a learning algorithm, and is not always correct. Moreover, its coverage of Indic languages such as Tamil is incomplete. Please translate by yourself: we mark Google-translated phrases as "Faulty", since most of them can be translated using better vocabulary.
10) Easy translations first
There may be a few phrases that you are not able to translate. Focus first on the phrases that you can translate easily, rather than struggling with long phrases that will take you more time.
P.S: This post is an updated version of a post that was written a long time back.
Saturday, August 8, 2009
Summer of Code - Wrapping up
Google Summer of Code 2009 is drawing to a close. I should thank Google for organizing the event and Abiword for mentoring me. My project is 'Porting Abiword for Windows to Unicode'. In simpler terms, my focus was to provide an Arabic [Unicode] Abiword on Windows. That is, my testing language was Arabic, because it is well suited to testing BIDI text.
My mentor dom, the administrator hub, cross-building robsta, uwog with MSVC, abicollab and Abiword releases, Windows maintainer sum1, ryanp with documentation and plugins, and msevior... I have to thank these developers for the support they provided to me and the other participants. They are doing a wonderful job at Abiword. By providing a friendly yet challenging learning environment, Abiword encourages us to extend our commitment to the project.
Abiword development will not end with this GSoC. All the successful Abiword students will continue as Abiword committers. The next milestone release, 2.8, is the major focus now. Later the SoC code will be merged into trunk and further modifications will be done. Before that merge into trunk, there will be one more merge from trunk into our SoC branches.
It should however be noted that we all prefer not Windows, but FOSS operating systems (or more specifically, Linux). As a 'cute' word processor, Abiword runs everywhere, literally. So we still feel free to develop for Windows and Mac when it comes to Abiword. FOSS will rule the world.
"Abiword -- It's cute."
Monday, April 13, 2009
Transliteration ~ Google and more ..
Google Transliterator
You may have already used Google's Indic Transliterator to type in Hindi, Kannada, Malayalam, Tamil or Telugu using English characters. Apart from providing the ability to type in Unicode, transliteration tools give one more advantage: someone who can speak a language but cannot write it can easily type in it by using the equivalent English characters. While transliterating, suggestions are also provided, so that we can choose one of them in case of confusion.
Google Transliterator can even convert numbers typed in the standard Hindu-Arabic numeral system to the local numeral systems specific to those language communities. This shows that Google Indic Transliterator is not just a transliterating utility.
Hindu Arabic Numbers : 0 1 2 3 4 5 6 7 8 9
Hindi : ० १ २ ३ ४ ५ ६ ७ ८ ९
Kannada : ೦ ೧ ೨ ೩ ೪ ೫ ೬ ೭ ೮ ೯
Malayalam : ൦ ൧ ൨ ൩ ൪ ൫ ൬ ൭ ൮ ൯
Tamil : ௦ ௧ ௨ ௩ ௪ ௫ ௬ ௭ ௮ ௯
Telugu : ౦ ౧ ౨ ౩ ౪ ౫ ౬ ౭ ౮ ౯
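The table above maps neatly onto Unicode, where each script's digits 0 to 9 occupy ten consecutive code points. The sketch below uses that property to convert digits. It only illustrates the mapping and is not how Google's tool is implemented; the names are my own.

```python
# Code point of the digit zero in each script, from the Unicode charts.
ZERO_CODE_POINTS = {
    "Hindi": 0x0966,      # Devanagari
    "Kannada": 0x0CE6,
    "Malayalam": 0x0D66,
    "Tamil": 0x0BE6,
    "Telugu": 0x0C66,
}


def to_native_digits(text, language):
    """Replace ASCII digits in text with the digits of the given language."""
    zero = ZERO_CODE_POINTS[language]
    # Digits 0-9 are contiguous in each block, so an offset is enough.
    table = {ord("0") + d: chr(zero + d) for d in range(10)}
    return text.translate(table)


if __name__ == "__main__":
    for language in ZERO_CODE_POINTS:
        print(language, ":", to_native_digits("0 1 2 3 4 5 6 7 8 9", language))
```

Characters other than the ASCII digits pass through unchanged, so the function can be applied to whole sentences.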
Update as of March 2010:
A recent visit to the Google transliterator showed me that transliteration is now possible for languages outside the Indic family as well, including Arabic, Russian and Amharic (Ethiopian). Hence it should be noted that Google transliterator is no longer a mere Indic transliterator. With more features, Google's transliterator stands as a standard online rich text editor at the moment.
UCSC Unicode Real Time Font Conversion Utility
Similar research is being done at the University of Colombo School of Computing, Sri Lanka, where a Unicode Real Time Font Conversion Utility is being built. It provides ways of typing in the Sri Lankan languages Sinhala and Tamil in Unicode. Apart from transliteration, it can also convert text in the non-Unicode Sinhala/Tamil fonts that are mostly used in word processing into Unicode, thus providing an easy way to convert material that was earlier typed in non-Unicode fonts into its Unicode representation.
