Sunday, August 30, 2015

EMDC Summer Event 2015


I represented the EMDC 2012 batch at the EMDC Summer Event 2015, on Fejan island, Sweden. The event was aimed at my juniors, the EMDC 2014 batch. I presented thoughts from our batch, as well as my research work, at the event.



Saturday, August 22, 2015

Differentiated Pricing and the Evil in it

The bill from the repair guy, who charged 200%.
You may have heard the popular advice: clear your browser cache before attempting to book flight tickets online. The reason is differentiated pricing. The seller of a product or service raises the price based on how likely the buyer is to actually buy at a given price. It is in fact not just online. It has been around for a long time, and those who are not into bargaining (i.e. innocent people like me) are often the victims in this bargain-centric world. Why do we fail at this, when the seller is in many cases just a vendor who does not even have the education and experience we think we have? While we think we are being polite and nice by agreeing to the first price in such a negotiation-based scheme, the seller just assumes we are plain stupid, while he/she is being a smart ass.

Yesterday, we were met with a similar situation, where a Portuguese business, "RL- ELECTRICIDADE E CANALIZAÇÃO", charged 43.80 Euro where it should have charged just 22 Euro! That is double the price. He is Rui Pedro B. Lopes, with the phone number 915 282 877. Do not call him, especially if you are a foreigner. The job was just fixing a bolt in a bed and changing two bulbs. He charged 37.50 Euro as the service fee, while the other shop nearby said they charge a maximum of 25 Euro per hour. He did not even spend an hour, and he claimed the higher price was because we called his "24 hours hotline". :P Come on, that was noon, and on his site the service is listed as "preços reduzidos" (reduced prices), not some VIP 24-hour service. Moreover, he charged 6.30 Euro for a single bulb, which we later found in the nearby shop for just 1 Euro! Worse, the bulb stopped working last night (that is, within just a few hours).

Above is the bill he produced when we asked for one. See how he has included some random "technical jargon" instead of being concise and breaking down the price as "Service Fee - 37.50 Euro; A regular electric bulb - 6.30 Euro". He also kept saying "You must fix the entire bed". He really said that, attempting to make us spend more. I should note that he resorted to differentiated pricing, judging from our looks that we were innocent foreigners without much clue about the local pricing.

Coming back to the core of this blog post, from the specific case that induced it: I remember a video I watched recently (watch from 0:22 to get to the point). In it you see how the vendor increases the price upon seeing a foreigner. This one is a scripted video for entertainment. Nevertheless, I have observed this in China and India, and now in Portugal as well. I have also observed it in my own country (Sri Lanka), against foreigners. Sometimes it is very obvious there: entry tickets for parks and museums tend to be 100 times more expensive for foreigners than for locals. This is approved by the government and legal. When I think about it, I hate this differentiated pricing, whether it is used against me as a foreigner in another country or against a traveller in my own country.


Watch the video from 0:22

This system works under the assumption that foreigners are vulnerable and are often willing to pay more, due to their lack of knowledge of the system and the language. Even inside my own country, I have noticed the difference when I speak in English vs Sinhala, the local language. If you speak the local language, chances are high that you will be in a better position even though you are a foreigner. In India, I learned the hard way that you must do some bargaining to avoid paying the three-wheeler drivers 10 times the price.

Belgrade in the early morning
Sometimes the exchange rates add to the confusion as well. When I went to Belgrade, Serbia, immediately after getting off the bus, I went to a cafe and ordered a cup of coffee and a chocolate croissant. The cashier said 300 Dinar. The cafe did not look like a decent one, and surely the cashier lady was an immigrant. Not that I am judging the immigrants - but Belgrade in the early morning was full of homeless people sleeping in the public parks and at the train station entrance, the majority of them immigrants. I am not sure how they will be integrated into the society to provide a productive input to the system. I was wondering how much 300 Dinar would be in Euro! I knew for a fact that 1 Euro equals 7.55 Croatian Kuna, as I had been living in Croatia for more than a week by then, but I had no idea about the Dinar. So I was worried that the cashier lady was attempting to cheat me. I asked for a receipt.

Obviously the lady pretended not to understand me, despite my body language and my pointing to the computer/calculator that prints the bills. Eventually she gave up and printed me the receipt. It was 285 Dinar. So I paid the 285 Dinar. I could not understand a thing on it though, as it was printed in Cyrillic script. If it had been in Latin script, I would have been able to understand it, as many words such as coffee/cafe are the same across many European languages. Serbian is written in both Cyrillic and Latin scripts, while Croatian is mostly written in Latin script, based on my observation. Interestingly, the two are technically the same language, while politically they are two different languages.

Still, I was a bit worried about how much that would be in Euro. The coffee tasted awful - probably the worst coffee of my life. I left the cafe shortly afterwards and started walking towards the meeting point, where my friend would meet me. On the way, I saw the flashing exchange rates: 1 Euro = 119 Serbian Dinar. Then I did a quick conversion to find out that I had spent just 2.40 Euro. Not bad. It seems I was not cheated this time (she obviously just tried to charge me 15 Dinar extra, which would be around 0.13 Euro).

I learn new things with each trip. I meet many kind people along the way. I also meet many opportunists who attempt to earn more from foreigners and unsuspecting victims. It can be online, it can be in the streets. I dislike bargaining and differentiated pricing. However, that is how "the system" works.

Tuesday, August 18, 2015

Error messages do lie..

Probably this is a trivial error. But when I was trying to deploy a file from the local file system to Hive, it always gave the error below.

It was clear to me that the files did indeed exist in /home/ec2-user/datacafe/. However, I finally realized that the Hive client was unable to look beyond the hadoop directory. When I moved the files to /home/hadoop/datacafe and pointed to them, Hive easily found them! Here /home/hadoop was the installation directory. So this was a simple permission issue rather than the more complicated issue I expected it to be.


21:19:09.380 [main] INFO  edu.emory.bmi.datacafe.hdfs.HiveConnector - Successfully written the output to the file, patients.csv
21:19:09.874 [main] ERROR edu.emory.bmi.datacafe.hdfs.HiveConnector - SQL Exception in writing to Hive Table: patients
org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException Line 1:23 Invalid path ''../ec2-user/datacafe/patients.csv'': No files matching path file:/home/ec2-user/datacafe/patients.csv
    at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:231) ~[hive-jdbc-1.0.0.jar:1.0.0]
    at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:217) ~[hive-jdbc-1.0.0.jar:1.0.0]
    at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:254) ~[hive-jdbc-1.0.0.jar:1.0.0]
    at edu.emory.bmi.datacafe.hdfs.HiveConnector.writeToHive(HiveConnector.java:119) ~[datacafe-server-1.0-SNAPSHOT.jar:?]
    at edu.emory.bmi.datacafe.hdfs.HiveConnector.writeToWarehouse(HiveConnector.java:60) [datacafe-server-1.0-SNAPSHOT.jar:?]
    at edu.emory.bmi.datacafe.hdfs.HiveConnector.writeDataSourcesToWarehouse(HiveConnector.java:50) [datacafe-server-1.0-SNAPSHOT.jar:?]
    at edu.emory.bmi.datacafe.impl.main.Initiator.initiate(Initiator.java:71) [datacafe-server-1.0-SNAPSHOT.jar:?]
    at edu.emory.bmi.datacafe.impl.main.Initiator.main(Initiator.java:39) [datacafe-server-1.0-SNAPSHOT.jar:?]
21:19:09.881 [main] INFO  edu.emory.bmi.datacafe.hdfs.HiveConnector - Successfully written the output to the file, slices.csv
21:19:10.097 [main] ERROR edu.emory.bmi.datacafe.hdfs.HiveConnector - SQL Exception in writing to Hive Table: slices
org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException Line 1:23 Invalid path ''../ec2-user/datacafe/slices.csv'': No files matching path file:/home/ec2-user/datacafe/slices.csv
    at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:231) ~[hive-jdbc-1.0.0.jar:1.0.0]
    at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:217) ~[hive-jdbc-1.0.0.jar:1.0.0]
    at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:254) ~[hive-jdbc-1.0.0.jar:1.0.0]
    at edu.emory.bmi.datacafe.hdfs.HiveConnector.writeToHive(HiveConnector.java:119) ~[datacafe-server-1.0-SNAPSHOT.jar:?]
    at edu.emory.bmi.datacafe.hdfs.HiveConnector.writeToWarehouse(HiveConnector.java:60) [datacafe-server-1.0-SNAPSHOT.jar:?]
    at edu.emory.bmi.datacafe.hdfs.HiveConnector.writeDataSourcesToWarehouse(HiveConnector.java:50) [datacafe-server-1.0-SNAPSHOT.jar:?]
    at edu.emory.bmi.datacafe.impl.main.Initiator.initiate(Initiator.java:71) [datacafe-server-1.0-SNAPSHOT.jar:?]
    at edu.emory.bmi.datacafe.impl.main.Initiator.main(Initiator.java:39) [datacafe-server-1.0-SNAPSHOT.jar:?]
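
The fix described in this post can be sketched in Java as follows. This is only a minimal sketch, not the actual HiveConnector code: the class name HiveLoadCheck and both helper methods are hypothetical, and I am assuming the statement behind the error was a LOAD DATA LOCAL INPATH statement (the "Line 1:23" position in the error matches where the path starts in such a statement). Note that the readability check only tells you something when it runs on the same machine, and as the same user, as the Hive process that reads the file.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class HiveLoadCheck {

    /** Builds a HiveQL statement that loads a file from the local file system into a table. */
    static String buildLoadStatement(String localPath, String table) {
        return "LOAD DATA LOCAL INPATH '" + localPath + "' INTO TABLE " + table;
    }

    /** Returns true if the path is a regular file that the user running this JVM can read. */
    static boolean isReadable(String localPath) {
        Path path = Paths.get(localPath);
        return Files.isRegularFile(path) && Files.isReadable(path);
    }

    public static void main(String[] args) {
        // The location that worked in this post: a directory under /home/hadoop.
        String csv = "/home/hadoop/datacafe/patients.csv";

        if (!isReadable(csv)) {
            // Hive reports this situation as "No files matching path",
            // even when the file exists but sits in a directory it cannot read.
            System.err.println("Missing or not readable by the current user: " + csv);
        }
        System.out.println(buildLoadStatement(csv, "patients"));
    }
}
```

Running a check like this as the hadoop user before executing the statement would have pointed at the permission problem directly, instead of the misleading "No files matching path" message.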

Sunday, August 16, 2015

Hive on Amazon EMR

Hive on Amazon EMR-4.0.0 is pretty handy, as EMR comes with Hadoop 2.6.0, Hive 1.0.0, Pig 0.14.0, and optionally other tools pre-configured.

A connection to Hive on EMR can be established with:

con = DriverManager.getConnection("jdbc:hive2://ec2XXXX.compute-1.amazonaws.com:10000/default","hadoop", "");


The error below is common, and is due to a mismatch between the versions of the Hive client and the Hive server. Note that the Hive server in EMR 4.0.0 is 1.0.0. Make sure your client is the same version, not a later one, to avoid the error below.

17:59:22.298 [main] ERROR edu.emory.bmi.datacafe.hdfs.HiveConnector - SQL Exception in writing to Hive Table: patients
java.sql.SQLException: Could not establish connection to jdbc:hive2://ec2-54-82-17-142.compute-1.amazonaws.com:10000/default: Required field 'client_protocol' is unset! Struct:TOpenSessionReq(client_protocol:null, configuration:{use:database=default})
    at org.apache.hive.jdbc.HiveConnection.openSession(HiveConnection.java:594) ~[hive-jdbc-1.2.1.jar:1.2.1]
    at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:192) ~[hive-jdbc-1.2.1.jar:1.2.1]
    at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105) ~[hive-jdbc-1.2.1.jar:1.2.1]
    at java.sql.DriverManager.getConnection(DriverManager.java:571) ~[?:1.7.0_40]
    at java.sql.DriverManager.getConnection(DriverManager.java:215) ~[?:1.7.0_40]
    at edu.emory.bmi.datacafe.hdfs.HiveConnector.writeToHive(HiveConnector.java:68) ~[datacafe-server-1.0-SNAPSHOT.jar:?]
    at edu.emory.bmi.datacafe.hdfs.HiveConnector.writeToWarehouse(HiveConnector.java:88) [datacafe-server-1.0-SNAPSHOT.jar:?]
    at edu.emory.bmi.datacafe.hdfs.HiveConnector.writeDataSourcesToWarehouse(HiveConnector.java:50) [datacafe-server-1.0-SNAPSHOT.jar:?]
    at edu.emory.bmi.datacafe.impl.main.Initiator.initiate(Initiator.java:71) [datacafe-server-1.0-SNAPSHOT.jar:?]
    at edu.emory.bmi.datacafe.impl.main.Initiator.main(Initiator.java:39) [datacafe-server-1.0-SNAPSHOT.jar:?]
Caused by: org.apache.thrift.TApplicationException: Required field 'client_protocol' is unset! Struct:TOpenSessionReq(client_protocol:null, configuration:{use:database=default})
    at org.apache.thrift.TApplicationException.read(TApplicationException.java:111) ~[libthrift-0.9.2.jar:0.9.2]
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71) ~[libthrift-0.9.2.jar:0.9.2]
    at org.apache.hive.service.cli.thrift.TCLIService$Client.recv_OpenSession(TCLIService.java:156) ~[hive-service-1.2.1.jar:1.2.1]
    at org.apache.hive.service.cli.thrift.TCLIService$Client.OpenSession(TCLIService.java:143) ~[hive-service-1.2.1.jar:1.2.1]
    at org.apache.hive.jdbc.HiveConnection.openSession(HiveConnection.java:583) ~[hive-jdbc-1.2.1.jar:1.2.1]
    ... 9 more
17:59:22.318 [main] INFO edu.emory.bmi.datacafe.hdfs.HiveConnector - Successfully written the output to the file, slices
17:59:22.331 [main] ERROR edu.emory.bmi.datacafe.hdfs.HiveConnector - SQL Exception in writing to Hive Table: slices
java.sql.SQLException: Could not establish connection to jdbc:hive2://ec2-54-82-17-142.compute-1.amazonaws.com:10000/default: Required field 'client_protocol' is unset! Struct:TOpenSessionReq(client_protocol:null, configuration:{use:database=default})
    at org.apache.hive.jdbc.HiveConnection.openSession(HiveConnection.java:594) ~[hive-jdbc-1.2.1.jar:1.2.1]
    at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:192) ~[hive-jdbc-1.2.1.jar:1.2.1]
    at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105) ~[hive-jdbc-1.2.1.jar:1.2.1]
    at java.sql.DriverManager.getConnection(DriverManager.java:571) ~[?:1.7.0_40]
    at java.sql.DriverManager.getConnection(DriverManager.java:215) ~[?:1.7.0_40]
    at edu.emory.bmi.datacafe.hdfs.HiveConnector.writeToHive(HiveConnector.java:68) ~[datacafe-server-1.0-SNAPSHOT.jar:?]
    at edu.emory.bmi.datacafe.hdfs.HiveConnector.writeToWarehouse(HiveConnector.java:88) [datacafe-server-1.0-SNAPSHOT.jar:?]
    at edu.emory.bmi.datacafe.hdfs.HiveConnector.writeDataSourcesToWarehouse(HiveConnector.java:50) [datacafe-server-1.0-SNAPSHOT.jar:?]
    at edu.emory.bmi.datacafe.impl.main.Initiator.initiate(Initiator.java:71) [datacafe-server-1.0-SNAPSHOT.jar:?]
    at edu.emory.bmi.datacafe.impl.main.Initiator.main(Initiator.java:39) [datacafe-server-1.0-SNAPSHOT.jar:?]
Caused by: org.apache.thrift.TApplicationException: Required field 'client_protocol' is unset! Struct:TOpenSessionReq(client_protocol:null, configuration:{use:database=default})
    at org.apache.thrift.TApplicationException.read(TApplicationException.java:111) ~[libthrift-0.9.2.jar:0.9.2]
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71) ~[libthrift-0.9.2.jar:0.9.2]
    at org.apache.hive.service.cli.thrift.TCLIService$Client.recv_OpenSession(TCLIService.java:156) ~[hive-service-1.2.1.jar:1.2.1]
    at org.apache.hive.service.cli.thrift.TCLIService$Client.OpenSession(TCLIService.java:143) ~[hive-service-1.2.1.jar:1.2.1]
    at org.apache.hive.jdbc.HiveConnection.openSession(HiveConnection.java:583) ~[hive-jdbc-1.2.1.jar:1.2.1]
    ... 9 more
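
To make the version rule concrete, here is a small sketch that refuses to connect when the client library is newer than the server. It is only an illustration under a few assumptions: the class HiveVersionCheck and its methods are hypothetical, the two version strings are typed in by hand (1.2.1 is the hive-jdbc version in the stack trace above, 1.0.0 is the Hive version in EMR 4.0.0), and the version strings are assumed to be purely numeric and dot-separated. The host name is the same placeholder as in the connection line above, and the hive-jdbc jar must be on the classpath for the connection to work.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class HiveVersionCheck {

    /** Compares dotted numeric versions such as "1.2.1" and "1.0.0"; returns a negative, zero or positive value. */
    static int compareVersions(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        int n = Math.max(x.length, y.length);
        for (int i = 0; i < n; i++) {
            // Missing components count as 0, so "1.0" equals "1.0.0".
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    /** True when the client is a later version than the server, the case that failed in this post. */
    static boolean clientTooNew(String clientVersion, String serverVersion) {
        return compareVersions(clientVersion, serverVersion) > 0;
    }

    public static void main(String[] args) {
        String clientVersion = "1.2.1"; // hive-jdbc version on the classpath (typed in by hand)
        String serverVersion = "1.0.0"; // Hive version shipped with EMR 4.0.0

        if (clientTooNew(clientVersion, serverVersion)) {
            System.err.println("hive-jdbc " + clientVersion + " is newer than Hive server "
                    + serverVersion + "; expect \"Required field 'client_protocol' is unset!\"");
            return;
        }

        String url = "jdbc:hive2://ec2XXXX.compute-1.amazonaws.com:10000/default";
        // try-with-resources closes the connection even if a later step fails.
        try (Connection con = DriverManager.getConnection(url, "hadoop", "")) {
            System.out.println("Connected to " + url);
        } catch (SQLException e) {
            System.err.println("Could not connect: " + e.getMessage());
        }
    }
}
```

In practice, the simpler fix is to pin the hive-jdbc dependency in the build to the server's version, 1.0.0 here.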

Thursday, August 13, 2015

[KDD 2015] MEDIator: A Data Sharing Synchronization Platform for Heterogeneous Medical Image Archives


One of my recent papers, "MEDIator: A Data Sharing Synchronization Platform for Heterogeneous Medical Image Archives" was presented at a KDD workshop this week in Sydney.

I could not present the paper myself at KDD, as my Australian visa was delayed. Luckily, my friend Denis, who lives in Sydney, helped me by presenting the paper. Many thanks to him!


Abstract
With the growing adaptation of pervasive computing into medical domain and increasingly open access to data, metadata stored in medical image archives and legacy data stores is shared and synchronized across multiple devices of data consumers. While many medical image sources provide APIs for public access, an architecture that orchestrates an effective sharing and synchronization of metadata across multiple users, from different storage media and data sources, is still lacking. This paper presents MEDIator, a data sharing and synchronization middleware platform for heterogeneous medical image archives. MEDIator allows sharing pointers to medical data efficiently, while letting the consumers manipulate the pointers without modifying the raw medical data. MEDIator has been implemented for multiple data sources, including Amazon S3, The Cancer Imaging Archive (TCIA), caMicroscope, and metadata from CSV files for cancer images.

Saturday, August 8, 2015

Information flow..

Poster on DPRK in Ljubljana.
So I happened to check my ClustrMaps recently, and it was nice to see some dots/visitors from countries including Korea, Democratic People's Republic of (KP) and Holy See (Vatican City State) (VA). While some of these may be misrepresentations due to the use of proxies such as hidemyass.com, I believe that is not the case for the majority.

It is amazing to notice how the information flows faster than our own selves. I have even learned the names of a few of the countries just from the blog visitors' list shown in the Blogger admin panel and ClustrMaps. For example, countries such as Bahamas (BS) and Belize (BZ). While I would like to visit all the countries eventually, I am genuinely happy that at least some of my thoughts have reached almost all the countries in the form of the blog posts.

North Korea is surely a country on my list to visit. When I was in Ljubljana, I noticed an advertisement poster for a photography exhibition with photos from North Korea. Again, the information flow is amazing, to have reached Europe from a far-away Asian country that is not much connected to the outside.

Recently the majority of the comments my blog posts received were just spam comments, often with supportive sentences (such as "Nice post; thanks for sharing") followed by an array of unrelated spam links. I hope one day the comments will be more informative and involving, to create a constructive communication over the topic. At least I prefer a meaningful communication, instead of spamming with unrelated links everywhere on the Internet.

Saturday, August 1, 2015

Messaging4Transport at OpenDaylight Summit 2015

I presented the Messaging4Transport (Message Oriented Middleware Bindings for MD-SAL) OpenDaylight incubation project at OpenDaylight Summit 2015 on the 31st of July in Santa Clara. The video recording of the talk is given below. The updated presentation can be found here.

OpenDaylight Summit 2015 and Santa Clara

Love this name tag from the OpenDaylight Summit
Ready to return home in a few hours, after a very productive conference and unconference. Now it is time to get back to my regular life for a while, and focus on my projects.

This year has been full of travel for me, with 11 (short and long) international trips so far (and 2 more upcoming). The best part of this trip was meeting many interesting people, listening to many awesome presentations, and presenting my project. Unlike on the previous trips, I did not do any random walks this time. But I will continue them on the upcoming trips. :3 I am unpredictable in my travelling habits anyway. I just don't have any pattern. :D

Next year seems more interesting as well, with an array of travels and "migrations" already planned - Shenzhen 2016, Atlanta 2016, and Louvain 2016. Sometimes I suspect that I am becoming a gypsy. lol.