Saturday, December 31, 2011

Google Code-In


The reach of Google Code-In (GCI) is relatively lower in Sri Lanka (and probably in other countries too) than that of its sister program, Google Summer of Code. One major reason may be that, in third world countries like Sri Lanka, the 13 - 17 age group (pre-university students) is not as much into programming, or into computers in general, as university students of 18+ are. More importantly, I feel, the word about GCI is yet to be spread.

This presentation resembles, in its style, the Google Summer of Code presentation I prepared for AbiWord/GSoC. It is based on my experience as a mentor for Haiku in Google Code-In 2011. Hence the examples used in this presentation are mostly from Haiku. I hope this presentation will be useful to any student as an introduction to Google Code-In.

Special thanks to Krasimir Petkov, GCI mentor (Haiku), for his valuable input on several occasions in shaping this presentation.

An updated, GCI-2012 presentation is available here.

Friday, December 30, 2011

Why you give ma address to othersz?

When ticking those "Subscribe" buttons, I didn't think much. But I am suffering now, when it is impossible even to count the number of mail filters.

Some newsletters have obviously given my address to third parties too, maybe with or without asking me during the sign-up. However, thanks to the CAN-SPAM Act, all is well, and I was able to effectively get rid of these newsletters by unsubscribing from them. Goodbye stupid newsletters - I am removing *many* of you today.. Thanks for messing up my Gmail inbox with all your marketing spam.. :)


P.S: The cat photo was taken from a bulk email, which holds no indication of the photographer or owner. Caption (y u censor me?) is added by me. Photo credits should go to the photographer.

Sunday, December 25, 2011

Open Source Evangelization and the evolution of the GSoC introductory presentation

Open Source Evangelization
We decided to use Google Summer of Code as a means to evangelize open source among the university students. I have been contributing to this evangelization effort this year. For this, I have prepared a presentation and presented it at the [1] Institute of Engineers, Sri Lanka (IESL), [2] the Science Faculty of the University of Peradeniya, and [4] the Engineering Faculty of the University of Peradeniya. We have also scheduled a session at [5] the Science Faculty of the University of Jaffna, on the 7th of January, 2012.

The evolution of the presentation can be found at the respective blog posts.
[1] GSoC-2011 [35 slides / IESL. First version - 40 minutes]
[2] GSoC 2011 and FOSS [38 slides / SET - UoP - 75 minutes]
[3] AbiWord and Google Summer of Code - 2011 [35 slides / AbiWord Specific - 75 minutes (estimated)]
[4] Google Summer of Code awareness session [42 slides / E-Fac - UoP - 75 minutes]
[5] Google Summer of Code 2012 [45 Slides / Further improved. Final / Global version - 75 minutes (estimated)]

It is interesting to see the growth of the GSoC presentation over the year. I hope my GCI presentation [28 slides / First Version - 50 minutes (estimated)] too will eventually become a presentation of similar quality.

Saturday, December 24, 2011

The world is a paraboloid..

Finally, the year 2011 comes to an end. 2011 was a year with its own downs and ups. I still feel 2010 and 2004 were the best of all the years. However, I was able to learn many new things, and I strongly believe the impact 2011 made on me is really huge and irreversible. :) In summary, 2011 was a positive year. I am trying to analyze the year through this blog post, with a summary of the remarkable events of the year.

WSO2
For a software engineer (of course, for others too.. :)), work place plays a major role in his life. The open culture of WSO2 fits me pretty well. I always try my best to contribute to whatever I can. There is an array of webinars we did, and I have presented a few too. You can view them from "Webinars" in this blog. As a member of the WSO2 Stratos team, I have blogged more on Stratos and WSO2 Load Balancer. 

FOSS
I have already done an analysis of the year 2011, when I blogged about completing my first year at WSO2, on the 13th of September 2011. Read more on this - one year since...✍. Similarly, "Open Source Evangelization and the evolution of the GSoC introductory presentation" summarizes the blog posts on the open source presentations I did on GSoC and GCI. This year, I have mentored for both GSoC (AbiWord) and GCI (Haiku).


Sweetness of 2011
2011 had its own set of interesting days and events. Google Summer of Code Mentor Summit 2011 and  WWW2011India are remarkable.


Llovizna
2011 was a really good year for my blog Llovizna. 2010 was not a good year for my blog, I would say. 2011 was more of an extrapolation of the year 2009. I have blogged about most of my interesting life events through Llovizna, this year. Llovizna became a mixed blog this year - with blog posts touching random topics - not just technology.

"how to ignore someone you love" is one viral blog post of the year from me. An unreasonably long delay in publishing one of my articles on a site made me take a strong decision late this year - that I will publish all the posts through Llovizna only.

The world is complicated
Hyperbolic Paraboloid [1]
The world is not flat. Rather, it is a paraboloid. I am good at mathematics - but alas, the world is much too complicated to be represented by mathematical equations.

Culture
Culture is NOT something our ancestors had or did in 500 BC. It is NOT something the books say. It is the principles around us today that make us. Culture is NOT a static entity. It evolves. No individual can harm or destroy the culture. It indeed improves eventually, regardless of the popular belief. Even those who openly acknowledge that they do not care about their culture are bound to it too. They may just avoid some aspects of it. What they avoid may just be the tip of the culture, just like the tip of the iceberg. During my visit to the US, I was able to realize how the country and the culture have shaped us into who we are now.

Distributed Computing
I was looking into Data mining and distributed computing this year. Hope to research further on them in the upcoming years. 

The world is a paraboloid
This is one of the concluding blog posts of the year 2011, which happens to be the 100th blog post of the year. Like this blog post, the year too was a complicated mixture with lots of unrelated events packed together. Hope to have 2012 as a faster and more effective year. :) Merry Christmas and a very happy new year, everyone!

☠ ☠ ☠ ~ Whatever happens, keep running. The story ends, when you halt. Do not stop at any time. No time to pause.. ☠ ☠ ☠
[1] http://upload.wikimedia.org/wikipedia/commons/4/4a/HyperbolicParaboloid.png

Saturday, December 17, 2011

Google Summer of Code 2012

We are having a series of GSoC awareness sessions, including yesterday's session at the University of Peradeniya, and the upcoming session at the University of Jaffna on the 7th of January, 2012. These events focus on discussing GSoC and FOSS. Attached herewith is the latest version of the presentation I prepared to introduce GSoC 2012 to the students. Feel free to download and distribute it, if the slow network prevents you from viewing the presentation here.

As a mentor from the AbiWord community, I have come up with the slides based on our experience with the Google Summer of Code. This presentation is also influenced by my experience as a three-time Google Summer of Code participant, with AbiWord (2011 as a mentor and 2009 as a student) and OMII-UK (2010 as a student). Special thanks to Martin Sevior and the AbiWord community for their valuable input on several occasions in shaping this presentation.

Make sure to have a look at the Google Summer of Code 2012 project ideas from AbiWord.

The presentations in this blog require the Shockwave Flash plugin to display correctly. If you can't see them correctly, make sure you have the required plugin enabled. Feel free to drop a comment should you require further information.

Google Summer of Code awareness session

Yesterday we had an awareness session for Google Summer of Code (GSoC) at the Engineering Faculty of the University of Peradeniya. This event focussed on discussing GSoC and FOSS. Interestingly, we visited the University of Peradeniya again after exactly 11 months, for the very same event - a Google Summer of Code awareness session. Our previous session was held at the Science Faculty, on the 17th of Jan, 2011.

Attached herewith is my presentation, introducing GSoC 2012 to the students. These slides are based on my experience as a three-time Google Summer of Code participant, with AbiWord (2011 as a mentor and 2009 as a student) and OMII-UK (2010 as a student).

In slow network connections, the presentation might take a bit longer to load. In that case, please feel free to download the presentation for your future reference.


Update: Pls find the latest revised version of this presentation at 

Thursday, December 15, 2011

DZone Kolamba Meetup

We had the first DZone Kolamba Meetup from 4.30 p.m. to 6.30 p.m. today, at WSO2. The theme was Big Data. Given below are the introductory slides to DZone. Photos of the event can be found here.

Wednesday, December 14, 2011

Configuring WSO2 Load Balancer for Auto Scaling

This post assumes that the reader is familiar with configuring the WSO2 Load Balancer without autoscaling, and has already configured the system with the load balancer. Hence this post focuses on setting up the load balancer with autoscaling. If you are new to setting up WSO2 Servers proxied by WSO2 Load Balancer, please read the blog post, How to setup WSO2 Elastic Load Balancer, to configure WSO2 Load Balancer without autoscaling.

autoscaler.xml

The autoscaling configurations are defined in CARBON_HOME/repository/deployment/server/synapse-configs/tasks/autoscaler.xml

1) Task Definition
In WSO2 Load Balancer, the autoscaling algorithm to be used is defined as a Task. ServiceRequestsInFlightEC2Autoscaler is the default class that is used for the autoscaler task.
<task xmlns="http://ws.apache.org/ns/synapse"
      class="org.wso2.carbon.mediator.autoscale.ec2autoscale.ServiceRequestsInFlightEC2Autoscaler"
      name="autoscaler">

2) loadbalancer.xml pointed from autoscaler.xml

This property points to the file loadbalancer.xml for further autoscaler configuration.
    <property name="configuration" value="$system:loadbalancer.xml"/>

3) Trigger Interval

The autoscaling task is triggered based on the trigger interval that is defined in the autoscaler.xml. This is given in seconds.
    <trigger interval="5"/>

Autoscale Mediators

autoscaleIn and autoscaleOut are the mediators involved in autoscaling. As with the other synapse mediators, the autoscaling mediators should be defined in the main sequence of the synapse configuration, if you are going to use autoscaling. Load Balancer-1.0.x comes with these mediators defined in the main sequence, which can be found at $CARBON_HOME/repository/deployment/server/synapse-configs/sequences/main.xml. Hence you will need to modify main.xml only if you are configuring the load balancer without autoscaling.


autoscaleIn mediator is defined as an in mediator. It gets the configurations from loadbalancer.xml, which is the single file that should be configured for autoscaling, once you have already got a system that is set up for load balancing.
        <autoscaleIn configuration="$system:loadbalancer.xml"/>

Similarly autoscaleOut mediator is defined as an out mediator.
        <autoscaleOut/>

 

loadbalancer.xml

loadbalancer.xml contains the service cluster configurations for the respective services to be load balanced and for the load balancer itself. Here the service-awareness of the load balancer makes it possible to manage the load across multiple service clusters. The properties given in loadbalancer.xml are used to provide the required configurations and customizations for autoscaling and load balancing. These configurations can also be taken from the system properties as shown below.

1) Properties common for all the instances

1.1) ec2AccessKey

The property 'ec2AccessKey' is used to provide the EC2 Access Key of the instance.
    <property name="ec2AccessKey" value="${AWS_ACCESS_KEY}"/>

1.2) ec2PrivateKey
The certificate is defined by the property 'ec2PrivateKey'.
    <property name="ec2PrivateKey" value="${AWS_PRIVATE_KEY}"/>

1.3) sshKey
 The ssh key pair is defined by 'sshKey'.
    <property name="sshKey" value="stratos-1.0.0-keypair"/>

1.4) instanceMgtEPR
'instanceMgtEPR' is the end point reference of the web service that is called for the management of the instances.
    <property name="instanceMgtEPR" value="https://ec2.amazonaws.com/"/>

1.5) disableApiTermination
The 'disableApiTermination' property is set to true by default, and it is recommended to leave it as it is. This prevents terminating the instances via the AWS API calls.
    <property name="disableApiTermination" value="true"/>

1.6) enableMonitoring
The 'enableMonitoring' property can be turned on, if it is preferred to monitor the instances.
    <property name="enableMonitoring" value="false"/> 

2) Configurations for the load balancer service group

These are defined under
<loadBalancer> .. </loadBalancer>

2.1) securityGroups
The service group that the load balancer belongs to is defined by the property 'securityGroups'. The security group differs for each of the services that are load balanced, as well as for the load balancers. Autoscaler uses this property to identify the members of the same cluster.
        <property name="securityGroups" value="stratos-appserver-lb"/>

2.2) instanceType
'instanceType' defines the EC2 instance type of the instance - whether they are m1.small, m1.large, or m1.xlarge (extra large).
        <property name="instanceType" value="m1.large"/>

2.3) instances
The property, 'instances' defines the number of the load balancer instances. Multiple load balancers are used to prevent the single point of failure -  by providing a primary and a secondary load balancer.
        <property name="instances" value="1"/>

2.4) elasticIP
Elastic IP address for the load balancer is defined by the property, 'elasticIP'. We will be able to access the service, by accessing the elastic IP of the load balancer. The load balancer picks the value of the elastic IP from the system property ELASTIC_IP.
        <property name="elasticIP" value="${ELASTIC_IP}"/>

In a public cloud, elastic IPs are public (IPv4) internet addresses, which are a scarce resource. Hence it is recommended to assign elastic IPs only to the load balancer instances that are to be exposed to the public, while all the services that communicate privately should be associated with private IP addresses, for an efficient use of this resource. Amazon EC2 provides 5 elastic IP addresses by default for each customer, which of course can be increased by sending a request to increase the elastic IP address limit.

2.5) availabilityZone
This defines in which availability zone the spawned instances should be.
       <property name="availabilityZone" value="us-east-1c"/>

2.6) payload

The file that is defined by 'payload' is uploaded to the spawned instances. This is often a zip archive, that extracts itself into the spawned instances.
        <property name="payload" value="/mnt/payload.zip"/>

payload.zip contains the necessary files such as the public and private keys, certificates, and the launch-params (file with the launch parameters) to download and start a load balancer instance in the spawned instances.

The launch-params file includes the details needed for the newly spawned instances to function like the other instances. More information on this can be found in the EC2 documentation.

Sample Launch Parameters
Given below is a sample launch-params, that is used in StratosLive by the load balancer of the Application Server service.
AWS_ACCESS_KEY_ID=XXXXXXXXXXXX,AWS_SECRET_ACCESS_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx,
AMI_ID=ami-xxxxxxxxx,ELASTIC_IP=xxx.xx.xxx.xxx,
PRODUCT_MODIFICATIONS_PATH_S3=s3://wso2-stratos-conf-1.5.2/appserver/,
COMMON_MODIFICATIONS_PATH_S3=s3://wso2-stratos-conf-1.5.2/stratos/,
PRODUCT_PATH_S3=s3://wso2-stratos-products-1.5.2,PRODUCT_NAME=wso2stratos-as-1.5.2,
SERVER_NAME=appserver.stratoslive.wso2.com,
HTTP_PORT=9763,HTTPS_PORT=9443,STARTUP_DELAY=0;60

We will look more into these launch-params now.

Credentials
The credentials - the access key ID and the secret access key - are given to access the AWS account.

S3 Locations
The service zip and the common modifications or patches are stored in an S3 bucket. The locations are given by a few properties in the launch-params shown above.
  • PRODUCT_MODIFICATIONS_PATH_S3 - Points to the location where the product specific changes, files, or patches are uploaded.
  • COMMON_MODIFICATIONS_PATH_S3 - Points to the patches and changes common to all the servers.
  • PRODUCT_PATH_S3 - Points to the location where the relevant Stratos service zips are available.
  • STARTUP_DELAY - Given in seconds. Provides some time to start the service that is downloaded on the newly spawned instance, such that it will join the service cluster and be available as a new service instance.
Apart from these, PRODUCT_NAME, SERVER_NAME, HTTP_PORT, and HTTPS_PORT for the application are also given.
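
To make the format of the launch-params concrete, here is a minimal Python sketch that splits such a string into key/value pairs. It is only an illustration, assuming the comma-separated KEY=VALUE layout of the sample above; the function name parse_launch_params is hypothetical, and this is not how the Stratos scripts themselves read the file.

```python
def parse_launch_params(text):
    """Parse comma-separated KEY=VALUE pairs into a dict (values stay strings)."""
    params = {}
    # The sample wraps across lines, so drop newlines before splitting on commas.
    for pair in text.replace("\n", "").split(","):
        pair = pair.strip()
        if not pair:
            continue
        # Split on the first '=' only, so values may contain '=' themselves.
        key, _, value = pair.partition("=")
        params[key] = value
    return params


# A shortened, placeholder version of the sample launch-params shown above.
sample = """AWS_ACCESS_KEY_ID=XXXXXXXXXXXX,AWS_SECRET_ACCESS_KEY=xxxx,
AMI_ID=ami-xxxxxxxxx,ELASTIC_IP=xxx.xx.xxx.xxx,
PRODUCT_NAME=wso2stratos-as-1.5.2,
HTTP_PORT=9763,HTTPS_PORT=9443,STARTUP_DELAY=0;60"""

params = parse_launch_params(sample)
print(params["PRODUCT_NAME"], params["HTTP_PORT"])
```

The values are deliberately kept as plain strings, since the sketch makes no assumption about how a value such as STARTUP_DELAY is interpreted.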

 

3) Configurations for the application groups

These are defined under
<services> .. </services>

These too should be configured as we configured the properties for the load balancers above.

We define the default values of the properties for all the services under
<defaults> .. </defaults>

Some of these properties - such as the payload, host, and domain - will be specific to a particular service group, and should be defined separately for each of the services, under
<service> .. </service>

Properties applicable to all the instances
payload, availabilityZone, securityGroups, and instanceType are a few properties that are not specific to the application instances. We have already discussed these properties when setting the load balancer properties above.

Properties specific to the application instances
These properties are specific to the application clusters, and are not applicable to the load balancer instances. We will discuss these properties now.

3.1) minAppInstances
The property 'minAppInstances' defines the minimum number of application instances that should always be running in the system. By default, the minimum for all the application instances is set to 1. We may go for a higher value for the services that are in high demand all the time, such that we will have multiple instances serving the higher load at all times.
            <property name="minAppInstances" value="1"/>

3.2) maxAppInstances
'maxAppInstances' defines the upper limit of the application instances. The respective service can scale up till it reaches the number of instances defined here.
            <property name="maxAppInstances" value="5"/>

3.3) queueLengthPerNode
The property 'queueLengthPerNode' provides the maximum length of the message queue per node.
            <property name="queueLengthPerNode" value="400"/>

3.4) roundsToAverage
The property 'roundsToAverage' indicates the number of attempts to be made before scaling the system up or down. When it comes to scaling down, the algorithm makes sure that it doesn't terminate an instance that has just been spawned. This is because the spawned instances are billed for an hour. Hence, even if we don't have much load, it makes sense to wait for a considerable amount of time (say 58 minutes) before terminating the instances.
            <property name="roundsToAverage" value="10"/>
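
One way to picture 'roundsToAverage' is as a window of load samples that must fill up before any scaling decision is taken. The Python sketch below illustrates that reading only; the class RoundAverager is hypothetical, and the actual averaging inside ServiceRequestsInFlightEC2Autoscaler may differ.

```python
from collections import deque


class RoundAverager:
    """Keeps the last N load samples and averages them once N are collected."""

    def __init__(self, rounds_to_average):
        # deque with maxlen silently drops the oldest sample when full.
        self._samples = deque(maxlen=rounds_to_average)

    def add(self, requests_in_flight):
        self._samples.append(requests_in_flight)

    def average(self):
        # Return None until enough rounds have been observed,
        # so that a single spike does not trigger scaling.
        if len(self._samples) < self._samples.maxlen:
            return None
        return sum(self._samples) / len(self._samples)


averager = RoundAverager(rounds_to_average=3)
averager.add(10)
averager.add(20)
early = averager.average()   # not enough rounds yet
averager.add(30)
full = averager.average()    # average over the last three rounds
```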

3.5) instancesPerScaleUp
This defines how many instances should be spawned each time the system scales up. By default, this is set to '1', such that a single instance is spawned whenever the system scales. This can be changed such that multiple instances are spawned each time the system scales up, though it may not be cost-effective to set this to a higher value.
            <property name="instancesPerScaleUp" value="1"/>

3.6) messageExpiryTime
messageExpiryTime defines how long the message can stay without getting expired.
            <property name="messageExpiryTime" value="60000"/>

Properties specific to a particular service group
Properties such as hosts and domain are unique to a particular service group, among all the service groups that are load balanced by the given load balancer. We should note that we can use a single load balancer set up with multiple service groups, such as Application Server, Enterprise Service Bus, Business Process Server, etc.

Here we also define the properties such as payload and availabilityZone, if they differ from the default values provided under
<defaults> .. </defaults>

Hence these properties should be defined under
<service>.. </service>
for each of the services.

3.7) hosts
'hosts' defines the hosts of the service that is to be load balanced. These will be used as the access point or url to access the respective service.
Multiple hosts can be defined under
<hosts> .. </hosts>
Given below is a sample hosts configurations for the application server service
            <hosts>
                <host>appserver.cloud-test.wso2.com</host>
                <host>as.cloud-test.wso2.com</host>
            </hosts>

3.8) domain
Just as the EC2 autoscaler uses the security groups to identify the service groups, 'domain' is used by the load balancer (ServiceDynamicLoadBalanceEndpoint) to correctly identify the clusters of the load balanced services.
            <domain>wso2.manager.domain</domain>
Once you have configured the load balancer as above, with the product/service instances, you will have a system that scales dynamically.
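
The relationship between the defaults and the service-specific properties can be sketched in Python as below. This is only an illustration: the exact nesting of the fragment is my assumption pieced together from the snippets above (the real loadbalancer.xml may use namespaces or a different layout), the overriding value of 10 for maxAppInstances is made up, and effective_services is a hypothetical helper, not part of the load balancer.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment assembled from the snippets in this post.
FRAGMENT = """
<services>
    <defaults>
        <property name="minAppInstances" value="1"/>
        <property name="maxAppInstances" value="5"/>
        <property name="availabilityZone" value="us-east-1c"/>
    </defaults>
    <service>
        <hosts>
            <host>appserver.cloud-test.wso2.com</host>
            <host>as.cloud-test.wso2.com</host>
        </hosts>
        <domain>wso2.manager.domain</domain>
        <property name="maxAppInstances" value="10"/>
    </service>
</services>
"""


def read_properties(element):
    """Collect the direct <property name=... value=.../> children as a dict."""
    return {p.get("name"): p.get("value") for p in element.findall("property")}


def effective_services(xml_text):
    """Merge the defaults into each service; service values win."""
    root = ET.fromstring(xml_text)
    defaults = read_properties(root.find("defaults"))
    services = []
    for service in root.findall("service"):
        merged = dict(defaults)
        merged.update(read_properties(service))
        services.append({
            "domain": service.findtext("domain"),
            "hosts": [h.text for h in service.findall("hosts/host")],
            "properties": merged,
        })
    return services


services = effective_services(FRAGMENT)
print(services[0]["domain"], services[0]["properties"])
```

Here the service keeps the default minAppInstances and availabilityZone, while its own maxAppInstances replaces the default.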

Auto Scaling with WSO2 Load Balancer

How Auto Scaling works with WSO2 Load Balancer

The autoscaling component comprises the synapse mediators AutoscaleInMediator and AutoscaleOutMediator and a Synapse Task, ServiceRequestsInFlightEC2Autoscaler, that functions as the load analyzer task. A system can scale up based on several factors, and hence autoscaling algorithms can easily be written considering the nature of the system. For example, Amazon's Auto Scaler API provides options to scale the system with the system properties such as Load (the timed average of the system load), CPUUtilization (utilization of the cpu at the given instance), or Latency (delay or latency in serving the service requests).

Autoscaler Components

  • AutoscaleIn mediator - Creates a unique token and puts that into a list for each message that is received.
  • AutoscaleOut mediator - Removes the relevant stored token from the list, for each response message that is sent.
  • Load Analyzer Task - ServiceRequestsInFlightEC2Autoscaler is the load analyzer task used by default for the service level autoscaling. It periodically checks the length of the list of messages based on the configuration parameters. Here the messages that are in flight for each of the back end services are tracked by the AutoscaleIn and AutoscaleOut mediators, as we are using the messages in flight algorithm for autoscaling.
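
The interplay of the three components above can be sketched in a few lines of Python. This is a simplified illustration and not the Synapse mediator code: the class InFlightTracker and its method names are hypothetical, and tokens are kept per domain in a set rather than a list.

```python
import threading
import uuid


class InFlightTracker:
    """Tracks the requests that have been received but not yet answered."""

    def __init__(self):
        self._tokens = {}            # domain -> set of outstanding tokens
        self._lock = threading.Lock()

    def on_request(self, domain):
        """What an 'autoscaleIn' step would do: store a unique token."""
        token = uuid.uuid4().hex
        with self._lock:
            self._tokens.setdefault(domain, set()).add(token)
        return token

    def on_response(self, domain, token):
        """What an 'autoscaleOut' step would do: remove the stored token."""
        with self._lock:
            self._tokens.get(domain, set()).discard(token)

    def in_flight(self, domain):
        """What the load analyzer task would read periodically."""
        with self._lock:
            return len(self._tokens.get(domain, ()))


tracker = InFlightTracker()
first = tracker.on_request("wso2.manager.domain")
second = tracker.on_request("wso2.manager.domain")
tracker.on_response("wso2.manager.domain", first)
print(tracker.in_flight("wso2.manager.domain"))
```

The lock is there because requests and responses arrive on different threads, while the task reads the count from yet another one.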


ServiceRequestsInFlightEC2Autoscaler implements the execute() of the Synapse Task interface. Here it calls sanityCheck() that does the sanity check and autoscale() that handles the autoscaling.

Sanity Check

sanityCheck() checks the sanity of the load balancers and the services that are load balanced, whether the running application nodes and the load balancer instances meet the minimum number specified in the configurations, and the load balancers are assigned elastic IPs.

nonPrimaryLBSanityCheck() runs once on the primary load balancers and runs from time to time on the secondary/non-primary load balancers, as the task is executed periodically. nonPrimaryLBSanityCheck() assigns the elastic IP to the instance, if it is not assigned already. Secondary load balancers periodically check that a primary load balancer is running. This avoids the load balancer being a single point of failure in a load balanced services architecture.

computeRunningAndPendingInstances() computes the number of instances that are running and pending. ServiceRequestsInFlightEC2Autoscaler task computes the running and pending instances for the entire system using a single EC2 API call. This reduces the number of EC2 API calls, as AWS throttles the number of requests you can make in a given time. This method will be used to find whether the running instances meet the minimum number of instances specified for the application nodes and the load balancer instances through the configuration as given in loadbalancer.xml. Instances are launched, if the specified minimum number of instances is not found.
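
The minimum-instances part of the sanity check can be pictured with the small Python sketch below. It is a simplification under my own assumptions: the function name is hypothetical, the counts are passed in as a plain dict rather than fetched from EC2, and pending instances are counted towards the minimum so that instances already starting up are not launched twice.

```python
def instances_to_launch(counts, minimums):
    """Return how many instances each group is short of its minimum.

    counts   -- {group: (running, pending)}, as gathered in one pass
    minimums -- {group: minimum number of instances from the configuration}
    """
    shortfalls = {}
    for group, minimum in minimums.items():
        running, pending = counts.get(group, (0, 0))
        shortfall = minimum - (running + pending)
        if shortfall > 0:
            shortfalls[group] = shortfall
    return shortfalls


counts = {"stratos-appserver-lb": (1, 0), "appserver": (0, 1), "esb": (0, 0)}
minimums = {"stratos-appserver-lb": 1, "appserver": 2, "esb": 1}
shortfalls = instances_to_launch(counts, minimums)
print(shortfalls)
```

The group names "appserver" and "esb" are made up for the example; only "stratos-appserver-lb" appears in the configuration above.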

Autoscale

autoscale() handles the autoscaling of the entire system by analyzing the load of each domain. This contains the algorithm - RequestsInFlight based autoscaling. If the current average of requests is higher than what can be handled by the current nodes, the system will scale up. If the current average is less than what can be handled by (current nodes - 1), the system will scale down.
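
The scale up and scale down rule described above can be written down as a small Python function. This is a sketch of the rule as stated in this post, not the code of ServiceRequestsInFlightEC2Autoscaler: the function name is hypothetical, I assume that the capacity of a node is given by queueLengthPerNode, and I assume that the minimum and maximum instance counts bound the result.

```python
def scaling_decision(avg_in_flight, running, queue_length_per_node,
                     min_instances, max_instances, instances_per_scale_up=1):
    """Return the change in instance count: positive, negative (-1), or 0."""
    capacity = running * queue_length_per_node
    capacity_with_one_less = (running - 1) * queue_length_per_node

    # More requests in flight than the current nodes can handle: scale up,
    # but never beyond the configured maximum.
    if avg_in_flight > capacity and running < max_instances:
        return min(instances_per_scale_up, max_instances - running)

    # Fewer requests than (current nodes - 1) could handle: scale down by one,
    # but never below the configured minimum.
    if avg_in_flight < capacity_with_one_less and running > min_instances:
        return -1

    return 0


print(scaling_decision(900, 2, 400, 1, 5))   # load above 2 * 400
print(scaling_decision(300, 2, 400, 1, 5))   # load below 1 * 400
print(scaling_decision(500, 2, 400, 1, 5))   # in between
```

The gap between the two thresholds keeps the system from flapping when the load hovers around the capacity of the current nodes.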

The autoscaling component spawns new instances, and once the relevant services successfully start running in the spawned instances, they will join the respective service cluster. The Load Balancer starts forwarding the service calls or the requests to the newly spawned instances, once they have joined the service clusters. Similarly, when the load goes down, the autoscaling component terminates the under-utilized service instances, after serving the requests that are already routed to those instances.

StratosLive - A case study for WSO2 Load Balancer

In a cloud environment such as WSO2 StratosLive, auto-scaling becomes a crucial functionality. The system is expected to scale up and down with the dynamically changing load. Auto-scaling capabilities are sometimes provided by the Infrastructure as a Service provider itself, such as the Autoscaling from Amazon. However, autoscaling is not necessarily a requirement that is to be fulfilled by an IaaS. Say you are providing a Platform as a Service (PaaS) that is hosted over the pure native hardware, instead of an IaaS. In that case, your PaaS should be able to provide the required autoscaling and load balancing capabilities to the applications that are hosted on top of your platform. WSO2 Load Balancer is such a software load balancer, which handles the load balancing, failover, and autoscaling functionalities.

WSO2 Load Balancer is used in production as a dynamic load balancer and autoscaler, as a complete software load balancer product. It is a stripped down version of WSO2 Enterprise Service Bus, containing only the components that are required for load balancing. WSO2 StratosLive can be considered a user scenario with WSO2 Load Balancer in production.


Multiple service groups are proxied by WSO2 Load Balancers. Some of the services have more than one instance to start with, to withstand the higher load. The system automatically scales as the load goes up and down. WSO2 Load Balancer is configured such that the permanent or initial nodes are not terminated when the load goes low; only the nodes that were spawned by the load balancer to handle the higher load are terminated then. Hence, it becomes possible to have different services running on a single instance, for the instances that are 'permanent', while the spawned instances will have a single carbon server instance.

scp - Copying files between two remote locations

Say, now you are going to copy a few files from a remote server to another. As usual, your remote_server_1 should be given the credentials to copy files to remote_server_2.

root@node2:~# scp -P 1984 -r /mnt/patches root@116.12.92.114:/mnt/patches

Usually, the key of your local computer will already have been given the required permissions to access those remote locations. But since that access is not given to remote_server_1, it will prompt for the password of remote_server_2.
As a quick fix, you can copy the private key from your local computer to remote_server_1. However, doing this has security concerns, which are discussed further on the web.

scp -P 1984 ~/.ssh/id_rsa root@116.12.92.113:~/

Now if you encounter the below when trying,
root@node2:~# scp -P 1984 -r -i id_rsa /mnt/patches root@116.12.92.114:/mnt/patches
ssh_exchange_identification: Connection closed by remote host
lost connection

Have a look at the denied and allowed hosts on remote_server_2, and make sure that the IP of remote_server_1 is allowed and not denied.

vim /etc/hosts.deny
vim /etc/hosts.allow
#sshd sshd1 sshd2 : ALL : ALLOW
sshd: 116.12.92.113

Now, the scp command given above, should work as expected to copy the files from the remote_server_1 to remote_server_2.

ssh: connect to host xxx.xxx.xxx.xx port 22: Connection refused lost connection

This is one of the commonest errors that are thrown when trying to copy files over scp. The major reason for this is, the port being different from the default 22.
 
pradeeban@pradeeban:~$ scp -r /home/pradeeban/patches root@116.12.92.113:/mnt/patches
ssh: connect to host 116.12.92.113 port 22: Connection refused
lost connection
To fix this, use the -P flag followed by the port number. Notice the upper case: scp keeps the lower case -p for preserving modification times and modes, consistent with the -p usage of the cp command.
pradeeban@pradeeban:~$ scp -P 1984 -r /home/pradeeban/patches root@116.12.92.113:/mnt/patches
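
If you run such copies from a script, the flag order is easy to get wrong. The Python sketch below only builds the argument list for scp (it does not run it); the helper scp_command is hypothetical, while -P, -r, and -i are the standard scp flags used in the commands above.

```python
import shlex


def scp_command(source, user, host, dest, port=22, recursive=False,
                identity_file=None):
    """Build the argument list for an scp upload to user@host:dest."""
    cmd = ["scp"]
    if port != 22:
        # Upper-case -P sets the port; lower-case -p preserves times and modes.
        cmd += ["-P", str(port)]
    if recursive:
        cmd.append("-r")
    if identity_file:
        cmd += ["-i", identity_file]
    cmd += [source, f"{user}@{host}:{dest}"]
    return cmd


cmd = scp_command("/home/pradeeban/patches", "root", "116.12.92.113",
                  "/mnt/patches", port=1984, recursive=True)
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run(cmd, check=True) once you are happy with the printed command.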

Sunday, December 4, 2011

Before you start your localization..

Thursday, December 1, 2011

[Google Code-In 2011] Localizing Haiku

This year, I joined Haiku as a mentor for Google Code-in (GCI) 2011. This post is specific to the GCI-2011 task that I have been mentoring, the localization of the Haiku operating system. I will post about GCI in a more generic post for the wider audience soon.

Get used to the system
Make sure that you follow the localization guidelines specific to the project. For the Haiku localizations with the Haiku Translation Assistant (HTA), make sure to pick the correct language from the drop-down on the right-hand side, under the label "Start Translating in..." If you are going to translate Haiku into Tamil, make sure to pick "Tamil". Also make sure that you have logged into the HTA before starting the localization.

For example, if you are translating,

But if you are trying to translate

Join the relevant localization lists to get more information on the localization efforts for the particular project.
[Haiku i18n mail address -  haiku-i18n@freelists.org].

Translate only the strings. Not the notes below.
For example,
in
Pager
Note: A small radio device to receive short text messages
Translate only "Pager". Not the "Note:" below.

When refreshing the page, HTA sometimes tends to reset itself to en_US. Hence make sure that you are not translating into en_US by mistake, instead of your own locale (for example, Tamil - ta).

Wednesday, November 30, 2011

10 Points Before you start your localization..

I am mentoring the localization tasks of Haiku into Tamil for Google Code-In 2011, and hence thought of providing a few suggestions for localizations. Some of these suggestions will be specific to Tamil, while sharing a few common characteristics with other languages.

1) Use the standard terminology
Make sure that you have the necessary reference and the language's latest accepted technical glossary with you. Don't invent your own words or phrases. If you don't know a word, leave it blank, rather than filling it with your guesses.

If you find a word that is not in the glossary, try to find its meaning from other reliable sources. If you have found a translation for a word, make sure the translation matches the standard. If you are the first to find an acceptable translation for a phrase, share it with the other team members, and use it in the translation only with their approval. Words that are not found in the glossary should be noted down, so that they can be included in the glossary later.

Systems such as HTA expect the localizations to be checked by the language maintainer or the mentor before the translations are marked as verified. That is, the language mentors can also mark a translated word as faulty.

2) Be consistent. 
For example, I notice "ஜன்னல்" and "சாளரம்" being used interchangeably in the same context. Please stick to one. In this case, my recommendation is to use "சாளரம்". Don't ignore the existing conventions.

3) Don't use slang or spoken/broken language
Words like "இங்க" and "ஓடுது" are slang, and are grammatically wrong in written Tamil. Please use formal Tamil, not any spoken variant of Tamil. We will reject the spoken forms of phrases that are considered wrong in the written format.

If something is considered wrong in your Tamil lessons, it is wrong in localization too. We can't accept broken or grammatically wrong localizations with wrong spellings into the project. :)

4) Translate as phrases
The phrases should be translated as a whole, and not word by word.

Let's take the phrase, "Update time interval:"
It should be translated as, "மேம்படுத்தல் நேர இடைவெளி" and not "மேம்படுத்தல் நேரம் இடைவெளி". This is something that differentiates the Indic languages from English.

Don't translate word by word. Instead, translate complete phrases. A phrase like "Add graph" should be translated as a whole in Tamil. Phrases like "சேர்க்கவும் (add) வரைபடம் (graph)" or "வரைபட சேர்க்கவும்" are not grammatically complete, and any native Tamil speaker can point that out. It should be "வரைபடத்தைச் சேர்க்கவும்".

"Do you want to stop" should be translated as "நிறுத்த வேண்டுமா?" (want to stop?), instead of "நீ நிறுத்த வேண்டுமா?". Here we omit, "நீ", as that is obvious.


5) Translate for the context.
Some words may have different meanings according to the context. Be careful when localizing them. "Them" may not be "அவர்களை" when it refers to the plural of "it". It should be "அவற்றை".

"written by:" should be "எழுதியவர்:". "எழுதப்பட்டது" doesn't make sense in this context.

Think of,
"written by:Raja"
"எழுதியவர்:ராஜா" will be natural.
"எழுதப்பட்டது ராஜா" doesn't make sense.

So translate for the context. Do not translate as it is.

6) Be respectful to the user
Please do not use "நீ". Use "நீங்கள்" instead. Similarly, don't use "நிறுத்து". It should be "நிறுத்தவும்". The program should refer to the user in a respectful manner. We should not offend the user by addressing them in the singular form, which is considered disrespectful in Tamil.

7) Locales
Be specific to the correct locale. If you are translating for ta-LK, consider the conventions involved, and remember that these can differ from ta-IN. Some projects do not distinguish the locales. They just have the language code, ignoring the potential minor differences between the locales.

8) Don't translate the control strings
For example, leave strings such as
%lld ms
as they are.
Don't try to introduce a blank space inside the control string. Translations such as
% lld நொடி
and
% lld MS
are invalid.
Also, there is no need to transliterate units such as MB, as we use them as standards. Writing it as எம்பி doesn't make sense.
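
The rule above can be checked mechanically before submitting a translation. The following is a minimal sketch in Python, and not part of HTA or Haiku: the pattern and the function names are my own assumptions, and the pattern covers only a small subset of the printf-style control strings.

```python
import re

# Simplified pattern for printf-style control strings such as %lld, %d or %s.
# It is an illustrative subset, not the full printf grammar.
PLACEHOLDER = re.compile(r"%(?:hh|h|ll|l|z)?[diouxXfeEgGsc]")


def placeholders(text):
    """Return the control strings found in the text, in their order of appearance."""
    return PLACEHOLDER.findall(text)


def keeps_control_strings(source, translation):
    """True if the translation carries exactly the control strings of the source."""
    return placeholders(source) == placeholders(translation)


if __name__ == "__main__":
    for candidate in ["%lld ms", "% lld நொடி", "% lld MS"]:
        print(repr(candidate), keeps_control_strings("%lld ms", candidate))
```

A string that fails this check has most probably broken a control string, for example by inserting a blank space after the "%".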

9) Don't just "Google Translate"
For example,
"CPU Usage" should be translated as "CPU பயன்பாடு",
whereas Google Translate has translated it as
CPU Usage = CPU பயன்பாட்டை.

Google Translate uses a learning algorithm, and is not always correct. Moreover, it is not complete for Indic languages such as Tamil. Please translate by yourself, since we mark those Google Translated phrases as "Faulty", as most of them can be translated using better vocabulary.


10) Easy translations first
There may be a few phrases that you are not able to translate. Focus first on the phrases that you can translate easily, rather than struggling with long phrases that may take you more time to translate.
P.S: This post is an updated version of a post that was written a long time back.


Who is that one person? :)

"One person likes this. Be the first of your friends."
"Who is that one person who *liked* that post - and not in my Facebook list?" :D

Saturday, November 26, 2011

The birth of viral contents over the Internet

Popular Content
For a scientist to become popular, it takes considerable effort and lots of dedication. But someone who creates some creative content and uploads it to the Internet might become equally famous among a wider audience.

Getting Viral
A piece of content grabs the attention of millions and becomes an Internet meme by becoming viral, shared and spread over multiple online media. The content can be a video, a blog post, an image, or even an audio clip. Some content becomes popular due to its controversial nature, and some becomes popular just because of the curiosity of the people. The social media interaction makes the popular content more popular. Once a piece of content sparks some interest in a viewer, he will probably visit it again (say, if it is a video or an audio clip), and also share it over the social media for the people in his network to view. This leads to an exponential growth in the popularity of the content. If an influential person shares your content with his circles of friends, most probably your content will be viewed and further shared by his circles of friends too.

Creating controversy or inducing curiosity
If we take YouTube, the most viewed videos are not necessarily good ones. Most of these video clips have more 'dislikes' than 'likes', as people who clicked out of curiosity get disappointed with what they just saw. When the thumbnail image of the video shows some "cute stuff", it is very hard to resist the desire to click and view the clip. A sexy title and an attractive caption will be an added advantage. However, when we realize that there is nothing interesting in that clip other than a mere ad, we 'dislike' it. Still the 'view' count increases, and the video remains popular. Some companies work for their clients or customers to make their content viral by creating controversy around it, by posing as multiple users, or simply by sharing that content over multiple media, using multiple accounts.

Sparking the interest
There are a few genuine attempts that become viral by the fans viewing and sharing them multiple times. The most commonly stated example is the YouTube clip, "Yosemitebear Mountain Giant Double Rainbow 1-8-10", where someone shouts and expresses his extreme level of joy, looking at a double rainbow.

The Double Rainbow
It has got 31,595,276 views, 206,997 likes, 4,549 dislikes, and 91,157 comments. This was also made into a song, which has become equally viral, with almost the same number of views and likes. Comics have been written around the "Double Rainbow" and many parodies have been created around it. According to an article in knowyourmeme.com, a tweet from Jimmy Kimmel was the major reason behind this video clip becoming popular. However, I am personally not supporting any such claims without strong evidence. Who knows - many others too may have shared the content and enjoyed it in parallel.

Why this Kolaveri
Why This Kolaveri Di Full Song Promo Video in HD has got 6,263,365 views, 73,595 likes, 3,058 dislikes, and 30,361 comments within two weeks since it was posted. Like all other addictions, "Kolaveri" has proven to be yet another rising addiction. Once watched, everyone keeps watching it multiple times, and then starts sharing. This leads to an exponential popularity growth. If this continues, it will very soon overtake the best known viral video - the "Double Rainbow" shout. It is a song sung by the Tamil actor Dhanush, a son-in-law of the Tamil super star Rajinikanth. This song is sung in Tanglish, a Chennai slang of Tamil + Broken English, with simple words.

The girl in the green top in this clip is Shruti Hassan, the heroine of the movie "3", to which this song belongs. She is a daughter of Kamalhaasan (an award winning Tamil actor and long time competitor of Rajinikanth). The other girl in this song is Aishwarya Dhanush - Dhanush's wife, who directs this movie. The debuting music director, Anirudh, a nephew of Rajinikanth, can also be seen in this video. Everyone expected this song to become popular among the Tamil cinema fans due to this stardom. Nevertheless, no one, including the producers of this song/movie, expected it to become viral globally. The fact that the song is indeed sung in English, but with a south Indian accent and a touch of Tamil, must have helped the song become popular among the non-Tamil speakers.

For content to become viral, it should reach the common man, and should not target a narrow niche. Among my blog posts, how to ignore someone you love can be stated as somewhat viral. It is the third most viewed post in my blog, and has the highest number (46) of facebook likes. I myself didn't expect that post to become popular, since I wrote it without much effort, unlike the technology blog posts, which I wrote with much effort. The attractive blog title, with the interesting common area of discussion - "ignoring facebook invitations", must have attracted more readers, unlike the posts that are focused on a niche.

Creating viral content is not that easy though. No one has properly found a formula to estimate how the human brain functions. We can create some interesting content, but the audience decides its success.

Friday, November 25, 2011

A tribute to DZone..

Everyone in information technology knows that DZone is a good way to find and read quality articles and blog posts. The recent MVB (Most Valuable Blogger) program is yet another addition to the services provided by DZone, with interesting zones such as Cloud Zone, Architect Zone, and many more. With the success of the concept of the zones, DZone started to introduce many microzones such as HTML5 Zone, DevOps Zone, and a few others.

MVB does not merely re-post the content as it is. It formats the content and makes it better prior to posting it, if necessary.

Below is an example:

You can see DZone has actually improved the readability of the content by proper styling and syntax highlighting.

I encourage and recommend everyone who takes pride in their technology blogs to become an MVB. Nothing is more encouraging than having our thoughts reach a wider audience. Long live DZone!

Wednesday, November 23, 2011

Auto Scaling with WSO2 Load Balancer


The load balancer is a crucial component in scalable architectures. WSO2 Load Balancer not only balances the load across the application instances, but also scales the system automatically to cater to the dynamically changing load. WSO2 Load Balancer is a WSO2 Carbon based product. In this post, we will look at how autoscaling works with the Load Balancer.

WSO2 Load Balancer ensures high availability and scalability in the enterprise systems. WSO2 Load Balancer is used in cloud environments to balance the load across the server instances. An ideal use case of the Load Balancer is WSO2 StratosLive, where the service instances are fronted with the load balancers and the system scales automatically as the service gets more web service calls. Having the Apache Tribes Group management framework, Apache Axis2 Clustering module, Apache Synapse mediation framework, and autoscaling component as the major building blocks, WSO2 Load Balancer becomes a complete software load balancer that functions as an autoscaler and a dynamic load balancer.

Architecture

WSO2 Load Balancer can be configured to function as a load balancer with autoscaling on the supported infrastructure. Currently the autoscaler supports EC2 API. Thus the Load Balancer can be configured as a dynamic load balancer with autoscaling, on Amazon EC2 and the other infrastructures compatible with the EC2 API. The autoscaling component uses ec2-client, a Carbon component that functions as a client for the EC2 API and carries out the infrastructure level functionalities. Spawning/starting a new instance, terminating a running instance, managing the service groups, and mapping the elastic IPs are a few of the infrastructure related functionalities that are handled by the autoscaling component.


The autoscaling component comprises the Synapse mediators AutoscaleInMediator and AutoscaleOutMediator, and a Synapse Task, ServiceRequestsInFlightEC2Autoscaler, that functions as the load analyzer task. A system can scale up based on several factors, and hence autoscaling algorithms can easily be written considering the nature of the system. For example, Amazon's Auto Scaler API provides options to scale the system with the system properties such as Load (the timed average of the system load), CPUUtilization (utilization of the CPU at the given instance), or Latency (delay or latency in serving the service requests).
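
To make the idea of a load analyzer concrete, here is a minimal sketch in Python of a scaling decision based on the number of requests in flight. It is not the code of ServiceRequestsInFlightEC2Autoscaler. The function names, the capacity of 50 requests per instance, and the instance limits are all hypothetical values chosen for illustration.

```python
import math


def desired_instance_count(requests_in_flight, max_per_instance=50,
                           min_instances=1, max_instances=10):
    """Instances needed to serve the current load, clamped to the allowed range.

    All the default values here are assumptions, not WSO2 defaults.
    """
    if max_per_instance <= 0:
        raise ValueError("max_per_instance must be positive")
    needed = math.ceil(requests_in_flight / max_per_instance)
    return max(min_instances, min(max_instances, needed))


def scaling_action(requests_in_flight, running_instances, **limits):
    """Return ("scale_up" | "scale_down" | "hold", number of instances to change)."""
    target = desired_instance_count(requests_in_flight, **limits)
    if target > running_instances:
        return ("scale_up", target - running_instances)
    if target < running_instances:
        return ("scale_down", running_instances - target)
    return ("hold", 0)


if __name__ == "__main__":
    print(scaling_action(120, 2))  # three instances needed, two are running
    print(scaling_action(0, 3))    # idle system shrinks to the minimum
```

A real autoscaler would probably average the load over a time window before acting, so that a short spike does not spawn and terminate instances repeatedly.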

NEXT >>
Now you know the basics of the WSO2 Load Balancer. You might now want to learn,
1) How Auto Scaling works with WSO2 Load Balancer?

Resources
Blog posts
WSO2 StratosLive - An Enterprise Ready Java PaaS
Summer School 2011 - Platform-as-a-Service: The WSO2 Way

Saturday, November 19, 2011

Time/Money Duality

This post can be considered the part-II of one of my previous posts - LATE. The movie "In Time" was the major motivation behind this post.


Time spent and perceived.
I have heard the phrases "Don't waste money!" and "Don't waste time" more often than any other suggestions asking not to waste something. "Don't waste electricity", "Don't waste water", and other similar suggestions are indeed the derivatives of "Don't waste money", or are driven by some sentiments such as "Don't waste food - give it to the poor instead!" We can simply conclude that "Money" and "Time" are considered two equivalent and most valuable assets. We spend money to save time, and also spend time to save or earn some money. I see this duality as the reason behind the routine of the humans. Everyone makes the world a better ('better' is a relative term, so someone's better may be another's worse though) place to live, at least by a tiny bit, through their job and otherwise, by investing their time.


Being a complex quantity, Time has its own real and imaginary counterparts. Each of us has 24 hours. But the effective time differs from person to person. I feel, in terms of physics, we can't define Time as a vector or a scalar rigidly. Maybe we should research further on the nature of the multi-dimensional time!

If we consider time as a complex number, what we measure will be the time's projection on the x-axis, the real part of the complex quantity. When we are waiting for something or someone, even a few minutes feel like an hour - we can explain this using the above "Complex-time" concept.

"Busy" is a relative term. I can be busy for task-1 or person-1, but may be available for task-2 or person-2.

The time and the money spent and the duality
"Can you spend five minutes with me regarding this project?"
"Sorry, I am afraid not. I have to catch the train home in 5 minutes."
"Oh, it is fine. I am also on the same route. Let's discuss on the train"
Now, I am not busy for the discussion, since the talk is not going to consume my time.

Currently, it is impossible for us to travel through time, or purchase it. So whether we spend much time or less of it - we can earn time relatively, but not absolutely. In natural terms, we can't earn time, but just spend it effectively. Time does have a monetary value. The movie In Time attempts to treat Time as money, focusing on the time-money duality. It discusses the sharing of time, and the transfer of time between different individuals. The rich have more time, and the poor have less.

Since Time is used as the money, the rich have more time, letting them live almost forever, whereas the poor keep running in search of time, awaiting their end, every day. For them, "tomorrow is a luxury they (you) can't afford" and even idling becomes costly (of course, idling costs in the real world too, in the time scale, as in the above image).

I wish we could buy some time, utilizing this time/money duality, in the future.

Friday, November 18, 2011

3 Most Annoying Status Updates in Facebook!

Expressing love for their dad!
1) Tribute over the Facebook.
"May I ask a personal favor, only some of you will do it (and I know who you are). If you know someone who has fought cancer and passed away, or someone who is still fighting, please add this to your status for 1 hour as a mark of respect and remembrance, I hope I'm right about the people who will.. Let's save the world from cancer by posting.. ♥"

Come on! You are NOT contributing anything by posting/re-posting that stupid status for 1 hour. You are just annoying the users over the Facebook. To make it worse, somehow these posts tend to become famous and spread like viruses. You can even notice the 50+ likes. And believe me, these guys have never shown love to the cancer patients outside Facebook!


2) Stupid claims 97%
"All of us have a thousand wishes. To be thinner, have more money, a new phone. A cancer patient only has one wish, to kick the cancer . I know that 97% of you won't post this as your status, but my friends will be the 3% that do. In honor of someone who died, or is FIGHTING cancer, post this for at least one hour." 

"NO.. I have heard about many cancer patients who had greater wishes.. for their family or even for the country.. The above status is just an insult to cancer patients.. ewwww... :("

3) I love my brothers (over the facebook)!
<- When some one starts to love his/her parents, siblings, spouses, friends, etc *using* facebook.

People are such sweethearts (by caring for everyone and loving them through facebook (errr...) and posting statuses that make no sense at all!) :P

Sunday, November 13, 2011

Are you allowed to choose your *name*?

Facebook, Google+, and probably many other sites do not allow accounts to be registered under a name that looks "artificial". That means, you can register an account only under the name of a human. The reason given is, they do not want to have people registering *fake* accounts, and welcome only the *real* persons to be in!

Names such as the ones below are not allowed, or are at least challenged.

1) Kathiravelu பிரதீபன் - You can't mix two language scripts, though this one indeed is my name, where the first name is written in English, and the last name in Tamil script.

2) Prad33ban, Pradeeban_, or Pradeeb@n - Numbers or special characters are not allowed in the names. Only the alphabets of a single script are allowed.

3) Pradz or PradROX - Suspicious name, most probably a fake!

4) Rock Buddy, Superman, Monkey Gurl, or Fen0023 - Doesn't look real.

5) Pearl Kitty or Brownie - Kitties and puppies aren't allowed!

6) K.K, PDN, pra, or pdn - Initials, pen names, or pseudonyms aren't allowed.

7) Dr.Vijay - Salutations aren't allowed.

8) Sucker - Sounds offensive.

9) kaThiRaVeLu PrADeeBaN - Improper capitalization.

10) Fxx or Bot - Bots are not allowed.

11) Double Rainbow, Firefly, or Stone - Natural objects and insects! No!

12) Colombo Library, Llovizna&Sons, or OpenGroupForum - Libraries, Companies, or organizations aren't allowed to create a profile. Use a 'group' or a 'page' instead.
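
A few of the rules above (mixed scripts, numbers, and special characters) are simple enough to sketch as code. The following Python sketch is purely hypothetical and is not how Facebook or Google+ validates names. It guesses the script of a character from the first word of its Unicode character name, which is a rough heuristic of my own.

```python
import unicodedata


def is_mark(ch):
    """Combining marks, such as the Tamil vowel signs, belong to the letters."""
    return unicodedata.category(ch).startswith("M")


def scripts_used(name):
    """Scripts of the letters in the name, e.g. {"LATIN", "TAMIL"}."""
    scripts = set()
    for ch in name:
        if ch.isalpha() or is_mark(ch):
            char_name = unicodedata.name(ch, "UNKNOWN")
            scripts.add(char_name.split()[0])
    return scripts


def name_problems(name):
    """List the (hypothetical) rules that the given name would violate."""
    problems = []
    if any(ch.isdigit() for ch in name):
        problems.append("digits")
    if any(not (ch.isalpha() or is_mark(ch) or ch.isdigit() or ch.isspace())
           for ch in name):
        problems.append("special characters")
    if len(scripts_used(name)) > 1:
        problems.append("mixed scripts")
    return problems


if __name__ == "__main__":
    for candidate in ["Kathiravelu பிரதீபன்", "Prad33ban", "Pradeeb@n"]:
        print(candidate, name_problems(candidate))
```

The other rules, such as detecting pen names or "suspicious" names, need human judgment, and cannot be captured by such simple checks.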

This regulation has faced severe opposition from those who prefer to keep their online identity hidden or are interested in having a second life. Some of them choose to have a second life that is independent from their real, offline, or *first* life. They have a valid reason to have a different online identity, I feel. Whatever the name, even if it sounds like a bot, a library, or a kitten, there is still a human behind the name. What matters the most is that no one's privacy should be violated.

However, these rules do not prevent the fake profile creators anyway. They just create fake profiles under real names. People are getting deeper into fake stuff - SCIgen is an example, which allows you to generate fake papers. It is mostly used for a good motive of course - to identify the fake conferences by getting the auto-generated fake papers accepted.

Copyrights - "Safely ignored"s in the Internet

I recently found an exact copy of one of my blog posts in another blog, without any credit or pointer to my blog. I tried to comment on his blog post with the link to the original post. He never approved my comment. Hence I decided to report his blog to Google, for the copyright violation.

In the topmost banner of the blogger blogs, you can see "Report Abuse". I just clicked, and reported the post.

Google replied with,
Thanks for reaching out to us!

We have received your legal request. We receive many such complaints each
day; your message is in our queue, and we'll get to it as quickly as our
workload permits.

Due to the large volume of requests that we experience, please note that
we will only be able to provide you with a response if we determine your
request may be a valid and actionable legal complaint, and we may respond
with questions or requests for clarification.  For more information on
Google's Terms of Service, please visit http://www.google.com/accounts/TOS

We appreciate your patience as we investigate your request.


Regards,

The Google Team

Within a few hours, Google took down the page that violated the copyrights, in accordance with the DMCA (The Digital Millennium Copyright Act of 1998), and sent me this message.

Hello,

Thanks for reaching out to us.

In accordance with the DMCA, we have completed processing your
infringement complaint and the content in question no longer appears on
the following URL(s):
http://{blog-name}.blogspot.com/2011/07/{post}*.html
Please let us know if we can assist you further.

Regards,
The Google Team

* I have removed the blog url to avoid harassing the blogger who copied the blog post.

After all, "Llovizna" is not a commercial blog. To my knowledge, I am not earning anything directly or indirectly out of this. Just a link or a pointer to the original post would have been sufficient.


One of my friends asked why I would report that person's blog post. My friend mentioned that the person who copied my post is indeed helping me by spreading my thoughts to the followers of his blog.

No, it doesn't work that way. If someone finds his post in a web search instead of mine, further interaction between the reader and me will not be possible. If my view point is challenged in the blog post of that person, he will not be in a position to advocate for my thoughts. Most probably he himself would have forgotten the original post that he copied from, leaving the discussion in a nowhere-zone.

In print, everyone takes utmost care about the copyrights. But when it comes to online content, it is taken for granted that anyone can violate others' copyrights. It is common to see posts that are copied from the web, even in newspapers, credited as "Thanks: The Internet" or "Thank you: The web". No one bothers to give the exact url or the author of the post, which is in fact a bad practice, and as offensive as any other pirated material.

I, however, support open knowledge. Some restrictive licenses prevent others from using the content at all, beyond merely reading and understanding it. Licenses should be open, just like the open source licenses. In "Llovizna" (and wherever online/offline), I made sure to use only the content or images that are in the public domain, or made sure to give credit to the original license holder, whenever I reused others' content. Many images with supportive licenses, or those that are in the public domain, can be found in wikipedia or wikimedia. We need more of them.

Sharing is not just copy-pasting. It should add value to the original content, while giving it the appreciation that it deserves, and engaging with the content.