Saturday, December 31, 2016

A few things that made my 2016 interesting

Magnificent Vaduz
2016 was a different year compared to the previous one. It had a large share of sad moments. This post covers only the happy moments and the memories that made my 2016 interesting. The order is random and does not indicate significance or importance.

1. A year-end vacation after a long time.
Winter in 3 countries - Switzerland, Austria, and Liechtenstein!

2. New Year's Eve in Zurich.
Eating all the street food, high in calories! It was also my first New Year's Eve outside Lisboa since 2012.

3. Playing with snow in Malbun, Liechtenstein.
The first time in more than 3 years!

4. Summer in the Bahamas.
That was a long weekend in Nassau. Blue beaches, both private and public. A reminder of the Maldives in 2015.

5. An extended internship at Emory University.
It was a productive one. Walks in Atlanta were remarkable too.

6. ICWS 2016 in San Francisco
My best conference experience so far!

7. Berlin, presenting 3 papers at IC2E 2016
Berlin has become my favourite among the cities I haven't lived in (yet)! My first (and only, so far) doctoral symposium too.

8. CoopIS 2016 in Rhodes
A beautiful Greek island. It was still sunny and pleasant in the autumn.

9. A long walk all over the historic village of Kostrzyn/Küstrin in Poland.
Crossing the border of Germany and Poland!

10. Relaxing weekend in Viseu
A romantic town in Portugal, set in the Christmas mood.

Daily and nightly walks and restaurant crawls.

12. Winning a new laptop.

Feeling lucky! 

13. AMIA-CRI 2016 in San Francisco
A podium abstract presentation and a poster presentation. My first poster, and also my first time at a medical conference.

14. EMJD-DC Spring Event in Costa da Caparica.
Visiting from Atlanta! A long journey.

15. Finishing touches of Atlanta.
First time selling stuff, and donating things in bulk.

16. Foggy Vaduz castle.
Liechtenstein is a cute little country!

An early morning train ride and walk.

18. Snowy white trees of Uetliberg, Switzerland.
Cold, but pleasant.

19. Return to Lisboa
An intermediate era, waiting for my move to Belgium early next year.

20. Cooking innovations
Thanks to YDFM (Your DeKalb Farmers Market), which offered a wide variety of food products and vegetables from all over the world at low prices.

21. Birthday in the sky.
Not sure exactly where that was. Possibly in the skies over Russia. Since I was flying eastwards, midnight must have slipped by somewhere amid the time zone changes.

My second time in Shenzhen, and also my second visit to Splendid China. Good memories.

23. Canon EOS M2.
Taking it everywhere I travel. :)

24. Naming the projects and papers.
It is fun, always.

Once more, after 3 years.

26. Applying for visas, after a long time!
To the US and Belgium. Not that I am a fan of visa applications.

27. Olaias metro station.
This is one of the most beautiful metro stations in Europe. It has been our daily stop for the last 3 months.

28. Feeling the heat of being a "senior PhD student".
I am in the middle of my third year. That means 5 semesters gone and only 3 remaining. Not a new student anymore.

29. Attending a number of conferences to present my papers.
Last year, I missed many due to schedule clashes and visa issues. This year was better in that respect.

30. Looking to the future
Awaiting an "intermediate year" in 2017, with more moves for PhD mobility, and a "final year" in 2018, to graduate!

Thanks for reading my list to the end. If you are really interested, you may also read my posts on the previous years. Happy New Year, everyone! I hope 2017 will be more positive and bring us more happy news than 2016. I will also turn 30 in 2017, which makes 2016 my last complete year in my 20s. Feeling old. :D

New vs. old memories

View from the top, Üetliberg
We were considering spending the year end in Porto, a place we visited in mid-2013, after completing the first year of our MSc. It would have refreshed our memories from 3.5 years ago, as we had not visited the place since. However, we changed the plan at the last moment and spent some time in Zurich and the neighbouring towns instead. It is a cold winter, yet there is not much snow left in Zurich except on Üetliberg. We hiked Üetliberg on a foggy morning! Magnificent views. The trip was also very cleverly planned. Proud of myself. :P

This was the first New Year we spent away from Lisboa: we had been in Lisboa for every New Year since the 2013 New Year. 2017 is different. :) The last time we had a year-end trip was in December 2013 (to Helsinki and Copenhagen). However, we had returned to Lisboa for the 2014 New Year. So this was a year-end trip after 3 long years. Good memories.

Ending the year with walks!

Cloudy Feldkirch!
We completed the year with hikes in 3 cities/towns: i) Zurich, Switzerland, ii) Feldkirch, Austria, and iii) Vaduz, Liechtenstein. The hike in Feldkirch, from the city center to the wild park, is a good one! Give it a try even if you are not into visiting wild animals in parks or zoos.

Friday, December 30, 2016

Liechtenstein by Public Transport

The beautiful Liechtenstein buses
Liechtenstein is one of the cutest countries in the world, and it also happens to be doubly landlocked. Getting there from a faraway country needs some planning, since it does not have its own airport. Zurich airport is the closest and most convenient. From there, you have at least two options: either i) take a train to Feldkirch, Austria, and a bus from there to Vaduz, Post, Liechtenstein, or ii) take a train to Sargans, Switzerland, and a bus from there to Vaduz, Liechtenstein. Vaduz is the capital of Liechtenstein, and the royal family still lives there in the castle.

Liechtenstein bus 11 runs between Sargans and Feldkirch via Schaan (the most populous town of Liechtenstein) and Vaduz. Bus 12E, a faster service with fewer stops, also runs between Vaduz and Sargans. So you may reach Schaan or Vaduz either from the north via Feldkirch or from the south via Sargans. The route via Sargans is the shortest and cheapest. The bus crosses this little doubly landlocked country with magnificent views along the way. So you may do what we did: go via Feldkirch and return via Sargans, avoiding the same route twice.

It is also possible to go to Buchs, Switzerland, by train from Zurich and take bus 12 from there to reach Vaduz, Post from the west. With this option, bus 12 continues via Vaduz to Triesen. So if your destination is Triesen rather than Vaduz, this option is the best. There may also be connections between the villages of Liechtenstein and other towns in Switzerland and Austria. However, given that we took the train from Zurich and had Vaduz as our destination, these 3 (Sargans, Buchs, and Feldkirch) are the only options I have identified.

The beautiful Malbun!
A second-class train ticket from Sargans to Zurich costs 33 CHF and the ride takes 1 hour (euros are accepted at a decent conversion rate at the train stations). Zurich to Feldkirch in second class costs 48 CHF and takes 1 hour 30 minutes. The bus costs around 6 CHF each way. If you would like to go to other villages, you have to get off at either Schaan or Vaduz and take another bus. The buses are frequent, running every 30 minutes, and the journey takes around 30 minutes too.

We also went by bus from Vaduz to Malbun, a beautiful ski resort and mountain village! The Liechtenstein buses are frequent and punctual. You may plan your trip with Google Maps, which has up-to-date trip times.

In Vaduz, you may hike (or drive, or ride a bicycle) up to the castle. The castle itself is not open to the public, but on the way you will see frozen streams in winter and get a bird's-eye view of the city. The climb alternates between two lanes, Haldenweg and Schlossweg. Though this is a bit confusing, you will be just fine following the crowds or simply heading towards the castle, visible up there!

The people of Liechtenstein speak German. However, everyone I interacted with (including the drivers and restaurant waitresses) spoke perfect English. I must warn you that it is an expensive country (just like Zurich)! Worth a winter visit for the snowy mountains!

Monday, December 12, 2016

A Dirty Way to Social Media Stardom

Twitter has somehow created a class hierarchy with its verified accounts. Anyone can receive a "blue tick" as long as they meet certain criteria. Doesn't it sound fancy to have 10K followers while following no one except God? That is what these unethical/fake Twitter stars achieve by exploiting a flaw in Twitter's workflow.

First, you receive a follow from a star/verified account belonging to some motivational speaker, influencer, marketer, leader, or king of jungle jiggy bamboo. You follow them back because you are a nice person who follows everyone back (probably with the exception of obvious bots), or because you feel happy to be followed by a "star-Tweep".

One week passes. By then, the star-Tweep has already unfollowed you. You would not notice, as Twitter does not alert you about unfollowers. For the star-Tweep, this is how he/she became a star in the first place: you keep following them, while they follow almost no one. You still receive their noise in your timeline. This scam is also common on YouTube. It is not possible on Facebook or LinkedIn, as connections there are two-way, unlike the Twitter and YouTube follow. It might be possible on Google+ too, but I never used it long enough to know its tricks. I am not sure about other social media (such as Instagram), as I am not a user.

These days I use TweepsMap to find these culprits. Trust me, every week I get around 5 of them: accounts with "10K followers and 5K following" that follow me first, only to unfollow me within a week after I follow them back.

The latest offender was someone (a company, judging from their handle) with "Following: 4,975, Followers: 7.8K", who followed me first, only to unfollow me within a week after I followed back.
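I don't know how TweepsMap works internally, but the follow, follow-back, unfollow pattern described above can be spotted with a simple set difference between two weekly snapshots. Below is a minimal sketch, assuming you already have last week's followers, this week's followers, and the accounts you follow as sets of handles; the function name and the handles are hypothetical.

```python
def follow_back_unfollowers(followers_before, followers_now, following_now):
    """Return handles that followed you in the earlier snapshot, no longer
    follow you now, but are still followed by you (hypothetical helper)."""
    dropped = set(followers_before) - set(followers_now)  # stopped following you
    return sorted(dropped & set(following_now))           # ...while you still follow them


if __name__ == "__main__":
    last_week = {"@alice", "@star_tweep", "@guru_of_growth"}
    this_week = {"@alice"}
    i_follow = {"@alice", "@star_tweep", "@guru_of_growth", "@news"}
    print(follow_back_unfollowers(last_week, this_week, i_follow))
```

The output lists the accounts worth unfollowing in return; a follower/following ratio check could be layered on top, but the set difference is the core of it.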

Bye for now, Twitter. You used to be a way to announce "eating salad now", and you have grown into a mass weapon for marketing, politics, and the like.

Sunday, December 11, 2016

Travels in Portugal

Viseu is ready for Christmas!
I am in my 5th year in Portugal. I have finished 2 years of MSc (EMDC) and 2 complete years of PhD (EMJD-DC), and I am currently in the 3rd year of my PhD, with one semester almost complete. I can divide my stay in Portugal into multiple eras:

1) Arrival: The first semester of EMDC (from my arrival until the end of 2012). I was still waiting for my grant. With new friends and a new country, this was an exciting period.

2) Travels and Fun: The second semester of EMDC (first half of 2013). This was the period when I was travelling around and enjoying Portugal. I also visited Porto during this time. By the end of this era, I returned to Sri Lanka, before going to Sweden for the 3rd semester of my MSc.

3) The return: This is by far the longest period, starting from the 2014 New Year and covering my master's thesis semester (the final semester of EMDC) and the first 2 years of EMJD-DC, my PhD. This period ended with my departure to the USA for my internship at Emory University, Atlanta.
Portugal was almost like a second home during this era, and I made numerous local and international trips during this time.
  
Among the local travels, remarkably:

Another random trip to the south (Portimão: 12th - 13th September 2015)

4) An intermediate era: This era spans the end of 2016 and early 2017: a short stay in Portugal following my return from Atlanta, while waiting for my departure to Belgium to continue the latter part of my PhD at my second university.

Among the local travels,

It is winter here, and I am enjoying my 5th winter in Portugal.

Thursday, December 8, 2016

Naming the papers/projects

The most interesting part of a paper/project is giving it a short name. So far, during my master's and PhD, I have had the privilege of giving a few papers interesting names. Below is a list of my works, grouped by the kind of name. This post is a work in progress and not a complete list.


Adjectives
1. SMART
2. FIRM 
3. CHIEF
A view of Óbidos
 
Portuguese Cities/towns
1. SENDIM
2. Óbidos (on-going work)

Abbreviations 
1. SD-CPS (submitted)

Extensions to known frameworks
1. Cloud2Sim / Cloud²Sim

Names surrounding the core technology
1. SDNSim 
2. xSDN 

Names sounding technical
1. MEDIator
2. MediCurator

Animals or birds
1. Cassowary  

Cute names 
1. ∂u∂u 

Made-up names (almost an abbreviation)
1. ViTeNA

Historical names
1. Mayan
 

Wednesday, November 30, 2016

Worst of LinkedIn: Those who warn of scammers in a scamming way!

By now we all know that there are many scammers on LinkedIn. Many of them try to attract views or attention to their profile by posing as recruiters in a city like Dubai or Doha, or by posting silly mathematical questions. Worse are those who fall for this scam and post their email address publicly, where spam bots can scrape it. Some just "like" the status, or type "interested" or "pls review my profile", as if that were the best way to prove their competence for the posted job.
Worst scammers of LinkedIn

Now the worst kind of scammers: see the above screenshot.

By default, a status shows only its first 5 lines; to see the entire message, you need to press "Show more". Hence, unsuspecting LinkedIn users give their contact details as they would for the kind of fake job posting I discussed above, without reading far enough to realize that this status was posted as a warning.

Let me explain why these guys, who claim to shed light on the con recruiters on LinkedIn, are the worst. First, if he were genuine, he would have started the message with the warning, such as "Pls do not fall for this type of scams in LinkedIn:", instead of hiding the warning deep below the advertisement. If you go through the comments, almost everyone has given their contact details, with only very few pointing out how the others have misinterpreted this gentleman's kind gesture.

Little do they know that this type of status is very common and popular. In fact, it serves the same purpose: getting more views for your profile. These are worse, as they are obviously designed to make fools of those who comment. Besides, this trick can be used by anyone, not just recruiters, which increases the potential reach of the scam.

Don't be this guy, who ruins LinkedIn for everyone! To keep my LinkedIn sane, I have been unsubscribing from everyone who falls for this shit.

Sunday, November 27, 2016

Viseu - a young and thriving Portuguese city

Palácio do Gelo
[26th - 27th Nov] We had a lazy weekend at Montebelo Viseu & Spa, a break amid busy PhD schedules full of deadlines. Winter seems to be here already. Viseu is ready for Christmas.

Tuesday, November 1, 2016

ViTeNA: An SDN-Based Virtual Network Embedding Algorithm for Multi-Tenant Data Centers

One more of our papers was presented today at NCA. As all the authors are currently busy with other commitments, one of our colleagues presented the paper for us. The presentation slides and the abstract are given below.

In the meantime, I have resumed my research at INESC-ID. It is almost November, and the sun is still up and shining!



Abstract: Data centers offer computational resources with various levels of guaranteed performance to the tenants, through differentiated Service Level Agreements (SLA). Typically, data center and cloud providers do not extend these guarantees to the networking layer. Since communication is carried over a network shared by all the tenants, the performance that a tenant application can achieve is unpredictable and depends on factors often beyond the tenant’s control.


We propose ViTeNA, a Software-Defined Networking-based virtual network embedding algorithm and approach that aims to solve these problems by using the abstraction of virtual networks. Virtual Tenant Networks (VTN) are isolated from each other, offering virtual networks to each of the tenants, with bandwidth guarantees. Deployed along with a scalable OpenFlow controller, ViTeNA allocates virtual tenant networks in a work-conservative system. Preliminary evaluations on data centers with tree and fat-tree topologies indicate that ViTeNA achieves both high consolidation on the allocation of virtual networks and high data center resource utilization. 

Friday, October 28, 2016

Software-Defined Simulations for Continuous Development of Cloud and Data Center Networks

Today I am presenting my full paper at CoopIS. This is one of the core works of my PhD, something I have been working on for the last 2 years. The presentation and the abstract are given below. The full paper can be found here.

Today is also my last day in Rhodes, and I head back to Portugal very early tomorrow morning! It was an amazing week on this lovely island, full of sessions and networking. I also had a paper at the same conference last year, which was held in the same location on the same days. However, I was unable to attend in person due to a clash in travel schedules. Luckily, this time I made it!


Abstract: Cloud network systems and applications are tested in simulation and emulation environments prior to physical deployments, at different stages of development. Software-Defined Networking (SDN) enables separating logic and execution from the data plane consisting of switches and hosts, to a logically centralized control plane. The global view and control available to the controller enable incremental updates, management, and allocation of resources to the networks. However, unlike the physical networks or the networks emulated by the emulators, current network simulators still lack integration with the SDN controllers.

Hence, currently it is impossible to efficiently orchestrate a simulated network through a centralized controller, or realistically model the controller algorithms and SDN architectures without having the resources for a one-to-one emulation. To address this, this paper presents SDNSim, an SDN simulation middleware, which leverages the principles of SDN for continuous development of cloud and data center networks. SDNSim is an “SDN-aware” network simulator that integrates with the controller through plugins for southbound protocols such as OpenFlow, to execute the algorithms incrementally thus deployed in the control plane.

Wednesday, October 26, 2016

Selective Redundancy in Network-as-a-Service: Differentiated QoS in Multi-Tenant Clouds

I arrived on the beautiful Greek island of Rhodes this Sunday to present one of my papers at the CoopIS conference and another at the EI2N workshop. Today I presented the first of them, at EI2N. The presentation and abstract are given below.



Abstract: Data centers consist of various users with multiple roles and differentiated levels of access. Tenant execution flows can be of different priorities based on the role of the tenant and the nature of the process. Traditionally enterprise network optimizations are made at each specific layer, from the physical layer to the application layer. However, a cross-layer optimization of cloud networks would utilize the data available to each of the layers in a more efficient manner.

This paper proposes an approach and architecture for differentiated quality of service (QoS). By employing a selective redundancy in a controlled manner, end-to-end delivery is guaranteed for priority tenant application flows despite congestion. The architecture, in a higher level, focuses on exploiting the global knowledge of the underlying network readily available to the Software-Defined Networking (SDN) controller to cater the requirements of the tenant applications. QoS is guaranteed to the critical tenant flows in multi-tenant clouds by cross-layer enhancements across the network and application layers.

Wednesday, October 12, 2016

Atlanta Finishing Touches and the Return to Portugal

Finally, the day came to return to Portugal. As expected, we were left with a considerable amount of stuff to be thrown away or left behind (?) in Atlanta. We attempted to sell some of it through two Facebook selling groups in Atlanta, one of them private to Emory University students and staff. Interestingly, many of our items attracted a lot of interest, and some were bought promptly by students.

However, a group of trolls belonging to an Emory University fraternity decided to troll my posts.



It was 4 guys named Max, Alex, Zach, and Kyle trolling together. Obviously, I have no time to deal with Facebook trolls. I was on Facebook only for a very short time, to sell the stuff and then deactivate the account again. Interestingly, the trolls all appear in each other's profile pictures. Either the group did not have an administrator, or the administrator was unable or unwilling to reprimand the trolls.

The Internet troll and bully culture is very dangerous. I have read in the news about similar fraternity groups in the US trolling new students, particularly female students. In extreme cases, online bullying has led to suicides. Such trolls invite their victims to "just kill themselves".

The trolling directed at me was harmless, except for the fact that my 'business' failed. :P The trolls, belonging to Emory University's Kappa Sigma #20 fraternity (as retrieved from their Facebook profiles), did not get anything out of this except some childish pleasure.

Luckily, we found a church that accepts free items from the neighbourhood. It was a win-win: we did some charity, felt better, and received the blessings of the church. On the other hand, we wasted a lot of time waiting for those who had confirmed they would buy the stuff. Some students cancelled at the last moment (literally) after confirming, and we had waited the whole day for them to arrive.

We still had some decent students who did come and buy the stuff as agreed. This was my first experience as a seller, and I learned a lot. It was my first time being trolled online too.

Now I am back in Portugal, my familiar land with more familiar faces and no trolls around. Summer in Portugal was already gone by the time I returned, though: it is already cold and rainy. Autumn is here, as I resume my work at INESC-ID Lisboa.

Saturday, September 24, 2016

Wrong calls and missed calls of Atlanta

I used to get many SMS messages on my mobile that were intended for Keith. After talking to those who sent them, I realized my number had previously been used by someone called Keith. It seems that in the US (or at least in GA), mobile numbers are recycled. That means when I leave, my number will be given to someone else too. I often got missed calls too. Once his mates realized what had happened to Keith's number, they stopped messaging or calling me. But there is worse to come.

Then come the marketing calls to my landline! First they wanted to sell me security services and auto insurance. In many cases, it is just bots calling you. Some are in fact scammers. One claimed that they had found a problem with my auto insurance (I do not even drive!). Another claimed something to do with the US tax authority. They transfer your call to a human if you answer them patiently. Those humans are far worse than the bots.

The conversation goes like "Hello, this is Cathy, how are you doing today?". If you say something like "Hi" or "Hello", the chat proceeds to the next level: straight to the bot's marketing/sales pitch. If you say something the bot cannot process, it disconnects. The sad part was that the bots called me more frequently than real people did. :D

One thing I learned: it is easy to receive calls from these bots, but when you really need to reach a hospital billing service, it is a nightmare. I made the mistake of giving Piedmont Urgent Care, by WellStreet, access to two of my debit cards.

I initially gave them my debit card. But later the insurance company sent me a large bill, having paid just 25% or so of the hospital's original bill. (I learned that health care in the USA is very expensive. You had better not get sick. If you do get sick, you had better not visit the doctor if at all possible. On top of that, my insurance is a joke, unlike the Swedish one I had in Europe. I should have got a real, working insurance.) So I visited them again and asked them to charge my HSA savings card instead of the debit card I had previously authorized. They accepted my request. What happened next was funny! They charged my debit card despite my timely request for the change, and then they also authorized a payment of the same amount on the second card (HSA), with a pending payment scheduled!

I learned it the hard way: never give any vendor access to multiple cards. With a single card, the bank's security measures are mostly in place: a vendor usually cannot charge you twice on the same card, as the bank will typically reject the duplicate transaction. But when you give them access to multiple cards, they can of course charge both, since you are giving away this protection offered by the bank.
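Banks do not publish their exact duplicate-detection rules, so the following is only a toy model under an assumed rule: a charge is flagged if the same card already saw the same merchant and amount within a few days. It illustrates why the protection disappears once a vendor holds two cards: the check is keyed on the card, so the same amount on a different card looks like a brand-new charge. The rule, the 3-day window, and all names are hypothetical.

```python
from typing import NamedTuple


class Charge(NamedTuple):
    card: str          # which card was charged, e.g. "debit" or "hsa"
    merchant: str
    amount_cents: int
    day: int           # day number, kept as an int for simplicity


def flag_duplicates(charges, window_days=3):
    """Flag charges that repeat an earlier (card, merchant, amount)
    within window_days. Hypothetical rule, for illustration only."""
    last_seen = {}     # (card, merchant, amount) -> day of the latest such charge
    flagged = []
    for ch in sorted(charges, key=lambda c: c.day):
        key = (ch.card, ch.merchant, ch.amount_cents)
        if key in last_seen and ch.day - last_seen[key] <= window_days:
            flagged.append(ch)
        last_seen[key] = ch.day
    return flagged


if __name__ == "__main__":
    same_card = [Charge("debit", "clinic", 12000, 1), Charge("debit", "clinic", 12000, 2)]
    two_cards = [Charge("debit", "clinic", 12000, 1), Charge("hsa", "clinic", 12000, 2)]
    print(len(flag_duplicates(same_card)))  # the repeat on the same card is caught
    print(len(flag_duplicates(two_cards)))  # a different card looks like a new charge
```

Under this rule, two identical charges on the debit card are caught, while a debit charge followed by the same amount on the HSA card passes unnoticed, which matches what happened to me.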

I called the billing service to sort this out and make sure they would not charge me twice. First, the bot informed me that I must call Mon - Fri, 9 - 5. So I called during those hours. The call starts with this long, useless message: "Hi there, Welcome to the WellStreet Urgent Care Services. As our valued customer, your time is very important to us. Listen carefully as most of our options have changed lately. We will be happy to help you.. bla bla bla.. To continue in English, press 1"

Eventually, once you have done all the shitty things that moronic phone bot expects of you, you get the message that all their customer representatives are busy and cannot take your call. The bot asks you to leave your full name, phone number, and details so that someone can get back to you within 24 hours. No one did. Worse, when I somehow managed to get hold of a human after hours of attempts, she disconnected as soon as I described the situation! Probably an honest mistake (she may have accidentally dropped the call), though I doubt that was the case. Eventually, after frustrating days of calling, during which I came to realize that no customer service exists, I emailed their billing department and public relations. They immediately cancelled the duplicate payment, with an apology.

In the end, email worked better than the phone. It seems they hire uneducated and untrained staff for the customer care and billing hotlines. Sometimes these employees are not much better than the bots. Hence my conclusion: here it is easy to get a call from a useless bot, but when you really need to get something done, it is impossible to get hold of a responsible human.

When I was in Sri Lanka, it was easy to assume that developed countries handle these issues better. When I was in Portugal, it was equally easy to assume that the situation would be better in an English-speaking country. So here I am, in the USA, a developed, English-speaking country. Not much has changed. Bad customer service is everywhere, whether you are in a developed English-speaking nation or a developing country speaking a language foreign to you.

I should of course give credit to the awesome customer service offered by many other organizations here. For example, dealing with GA Power for electricity was always smooth. They have a working website and the most helpful social media team ever, willing to go beyond their duty to assist you. So in the end it is all about the teams, not the country. Hoping for better.

Friday, September 23, 2016

Winning the customers' trust back

A $100 bonus from Wells Fargo. Any takers?
It has been quite noisy here with the Wells Fargo scandal, with many asking the CEO to step down. If you are unaware of the news: Wells Fargo is a US bank that cheated its customers and stakeholders by opening fake accounts and credit cards. They reached their business targets at the expense of their customers. In today's world, banks already earn a lot just by holding their customers' assets. Customers trust them. Wells Fargo broke that trust. It will take time to recover, and they need a strategy. The CEO should of course go, for having overlooked such a mass-scale scam.

Meanwhile, I received the letter shown above. I have lived in the US for 6 months now, and this is the first time I have received a letter from Wells Fargo. I am not sure where they got my contact details. But I assure you: this is not the time to try to acquire more customers. I am especially suspicious of their $100 bonus. Too good to be true, coming from an apparently bogus bank. Please rebuild the trust of your existing customers before trying to get new ones. I am definitely not going to sign up, even if you lure me with a $10,000 bonus!

Query disjoint databases in parallel and combine, compose, and return the output using Drill

Instead of using MongoDB as a single or clustered data store, we may partition the data across independent, remotely hosted MongoDB instances. Then we may use Drill's UNION ALL operator to combine the results.

Why would we do this?
i) We may already have the data partitioned across different sources.
ii) With domain knowledge, we may do a better job of partitioning the data.
iii) Even with naive partitioning, Drill scales and performs well.
iv) There are some interesting research questions on leveraging data locality to provide better and faster outputs than a clustered or distributed Mongo deployment.

Be warned that Drill has limitations with some data structures that may hurt performance, for example, nested complex schemas such as multi-dimensional arrays. We have previously discussed a workaround for this.
In this post, we will look at the simplest example of querying partitioned MongoDB instances through Drill.

1. Define the Mongo Storage Plugins
For each Mongo server, define a separate storage plugin in Drill.

Multiple definitions of the Mongo storage plugin, pointing to different Mongo deployments

For example, the mongo3 plugin above is defined as follows at http://localhost:8047/storage/mongo3:

{
  "type": "mongo",
  "connection": "mongodb://184.72.102.246:27017/",
  "enabled": true
}
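If you have many Mongo servers, defining each plugin by hand in the web UI gets tedious. Below is a minimal sketch, assuming Drill's REST endpoint `POST /storage/{name}.json` on the default web port 8047, which takes the plugin name plus a `config` object (please check the REST API documentation of your Drill version). It generates one definition per server in the same shape as the mongo3 definition above. The plugin names and hostnames are placeholders.

```python
import json
import urllib.request

DRILL_URL = "http://localhost:8047"  # assumed Drill web/REST endpoint


def mongo_plugin(name, host, port=27017):
    """Build the REST payload for one Mongo storage plugin."""
    return {
        "name": name,
        "config": {
            "type": "mongo",
            "connection": f"mongodb://{host}:{port}/",
            "enabled": True,
        },
    }


def register(plugin, drill_url=DRILL_URL):
    """POST the plugin definition to Drill; needs a running Drillbit."""
    req = urllib.request.Request(
        f"{drill_url}/storage/{plugin['name']}.json",
        data=json.dumps(plugin).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical hosts; one plugin per independent MongoDB instance.
    servers = {"mongo2": "mongo-a.example.org", "mongo3": "mongo-b.example.org"}
    plugins = [mongo_plugin(n, h) for n, h in servers.items()]
    print(json.dumps(plugins[1]["config"], indent=2))
    # for p in plugins: register(p)  # uncomment against a live Drillbit
```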

2. Now query through the query browser:
Querying multiple Mongo deployments and combining the results with UNION ALL.



select last_name as id from mongo.employee.empinfo
union all
select first_name as id from mongo2.employee.empinfo
union all
select first_name as id from mongo3.employee.empinfo



Now you may execute this and get the results. Depending on the nature of the query, the partitioning, and the scale of the data, you may see performance benefits from the partitioning. How to actually partition the data across the MongoDB deployments, with related items co-located in a single partition, is a research question that probably deserves another post.
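The UNION ALL query can also be generated and submitted from a script, which helps once the number of partitions grows. Below is a minimal sketch, assuming Drill's REST endpoint `POST /query.json`, which accepts a JSON body with `queryType` and `query`. Unlike the hand-written example above, which mixes `last_name` and `first_name`, this builder selects the same column from every partition. The helper names are hypothetical.

```python
import json
import urllib.request


def union_all_query(plugins, table="employee.empinfo", column="first_name", alias="id"):
    """One SELECT per storage plugin, glued together with UNION ALL."""
    selects = [f"select {column} as {alias} from {p}.{table}" for p in plugins]
    return "\nunion all\n".join(selects)


def run_query(sql, drill_url="http://localhost:8047"):
    """Submit SQL to a running Drillbit and return the decoded JSON response."""
    payload = {"queryType": "SQL", "query": sql}
    req = urllib.request.Request(
        f"{drill_url}/query.json",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    sql = union_all_query(["mongo", "mongo2", "mongo3"])
    print(sql)
    # result = run_query(sql)  # needs a live Drillbit; rows are expected under "rows"
```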

Monday, September 19, 2016

Drill Integration into Bindaas

Apache Drill has been integrated into Emory BMI's Bindaas Data Server as a data source provider. The screencast below shows the basic usage of the Drill provider. Please note that the Drill provider is currently experimental and only available in the maven-restructure and maven-restructure-dev branches.

maven-restructure is a stable branch that follows a major restructuring of Bindaas to make it easier for developers to use, while maven-restructure-dev is built on top of maven-restructure. The two branches are mostly in sync, though minor later developments may only be available in maven-restructure-dev until the merge.

These later developments will eventually be merged into the master branch. The released versions of Bindaas can be found here.


If your Drill instance is configured with JPam for authentication, operating system users also serve as Drill users, as defined in your Drill instance's configuration.

As the Drill provider is based on the JDBC driver, its Drill JDBC URL has the same form as a JDBC URL. However, the username and password are optional for the Drill provider. If your Drill instance is not configured with JPam, leave the username and password entries blank when you define the data source in the Data Provider Creation step shown in the screencast above.

An example would be
jdbc:drill:drillbit=localhost:31010 for a stand-alone Drill instance.
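For a clustered Drill, the URL points at ZooKeeper instead of a single Drillbit, and the embedded shell uses `zk=local` (as in the `jdbc:drill:zk=local>` prompt). A small Python sketch that produces these forms, assuming Drill's default ZooKeeper root (`drill`) and cluster ID (`drillbits1`); the helper name is hypothetical. The username and password stay in their own fields (or blank, without JPam), so they are not part of the URL here.

```python
def drill_jdbc_url(drillbit=None, zk=None, zk_root="drill", cluster_id="drillbits1"):
    """Build a Drill JDBC URL (hypothetical helper).

    drillbit: "host:port" of a single Drillbit, e.g. a stand-alone instance.
    zk: "host:port[,host:port...]" of a ZooKeeper quorum, or "local" for embedded mode.
    zk_root and cluster_id are assumed defaults; change them if your
    drill-override.conf sets different values.
    """
    if drillbit:
        return f"jdbc:drill:drillbit={drillbit}"
    if zk == "local":
        return "jdbc:drill:zk=local"
    if zk:
        return f"jdbc:drill:zk={zk}/{zk_root}/{cluster_id}"
    raise ValueError("give either a drillbit address or a ZooKeeper connection string")
```

`drill_jdbc_url(drillbit="localhost:31010")` yields the stand-alone example above.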

P.S.: This screencast was captured using gtk-recordmydesktop. It works well on my Ubuntu 16.04. I highly recommend it for your screencasts.

Friday, September 16, 2016

Messing with the data schema to make it work with Drill (without using Drill's additional functions)

I must warn that this is not practical: you may not have the access or capacity to modify the schema of the data you want to query in the first place. Unless the data was taken as a dump and is queried a million times, there is no performance benefit in the attempt below. But for research purposes, why not? :)

I had this multi-dimensional array in my data that was impossible to query with Drill due to its complex data schema. Before readers point out that it is indeed possible to query complex arrays: what I mean is that it is impossible to query them at the same performance level, as we need the flatten function, which ruins both the performance and the output format. On the other hand, knowing the exact array indices in advance is impractical too.

I have this multi-dimensional array that makes it impossible for me to proceed:
"coordinates":[[ [.. , ..] , [.. , ..] , [.. , ..] ]]

The error message was:
Error: SYSTEM ERROR: UnsupportedOperationException: Unsupported type LIST

Fragment 0:0

[Error Id: 5cc520ff-9594-4b9b-998d-20bf8569981b on llovizna:31010] (state=,code=0)


I had to transform the above into the structure below to make it work without any Drill functionality such as flatten.

"coordinates" : [ { "x" : .., "y" : .. }, { "x" : .., "y" : .. }, { "x" : .., "y" : .. } ]
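The conversion itself can be done outside Drill, before loading the data. A minimal Python sketch of one way to do it, assuming 2-D points and an input file with one JSON document per line (for example, a `mongoexport` dump); the function names are hypothetical.

```python
import json


def to_xy_points(coordinates):
    """Turn GeoJSON-style [[[x, y], [x, y], ...]] into [{"x": x, "y": y}, ...].

    Assumption: all rings are concatenated into one list. This is lossless only
    for a single ring (as in the example above); polygons with holes lose the
    boundary between rings.
    """
    return [{"x": x, "y": y} for ring in coordinates for x, y in ring]


def rewrite_document(doc):
    """Replace geometry.coordinates in one document, if present."""
    geometry = doc.get("geometry")
    if isinstance(geometry, dict) and "coordinates" in geometry:
        geometry["coordinates"] = to_xy_points(geometry["coordinates"])
    return doc


def convert_file(src_path, dst_path):
    """Rewrite a file holding one JSON document per line (assumed input layout)."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            if line.strip():
                dst.write(json.dumps(rewrite_document(json.loads(line))) + "\n")
```

The output keeps every other field untouched, so single-level arrays such as `bbox` still work in Drill as before.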

0: jdbc:drill:zk=local> create table dfs.tmp.camic as select * from dfs.`/home/pradeeban/programs/apache-drill-1.6.0/head2.json`;
+-----------+----------------------------+
| Fragment  | Number of records written  |
+-----------+----------------------------+
| 0_0       | 1                          |
+-----------+----------------------------+
1 row selected (1,392 seconds)



Essentially, I turned the multi-dimensional array into an array of maps.

0: jdbc:drill:zk=local> select * from dfs.tmp.camic;
+-----+------+-----------+---------+---------------+-------------+---+---+------------+------+----------+-----------+------------+------------+-------------+
| _id | type | parent_id | randval | creation_date | object_type | x | y | normalized | bbox | geometry | footprint | properties | provenance | submit_date |
+-----+------+-----------+---------+---------------+-------------+---+---+------------+------+----------+-----------+------------+------------+-------------+
| {"$oid":"56a784647b7b51c562"} | Feature | self | 0.3712421875 | 2026-11-16 01:17:13.101 | nucleus | 0.049646965 | 0.435353796 | true | [0.042729646965,0.85353796,0.8105608,0.7145562075] | {"type":"Polygon","coordinates":[{"x":0.04795445442,"y":0.87641187789917},{"x":0.0427805229,"y":0.87187789917}]} | 17.0 | {"scalar_features":[{"ns":"http://u24.bi.rk.eu/v1","nv":[{"name":"Hty","value":242.50489875},{"name":"ty","value":25.0},{"name":"Hty","value":-12.11},{"name":"ee","value":2.11}]}]} | {"image":{"case_id":"TC-2-00-01-01-T2","subject_id":"TC-02-000"},"analysis":{"execution_id":"ta-test","study_id":"tdma:::tue-jan-6-19:17:13-est-2011","source":"computer","computation":"segmentation"},"data_loader":"1.3"} | 2016-01-16 01:17:13.102 |
+-----+------+-----------+---------+---------------+-------------+---+---+------------+------+----------+-----------+------------+------------+-------------+
1 row selected (1,31 seconds)

Now, this works. :P

Thursday, September 15, 2016

Apache Drill and the lack of support for nested arrays

Apache Drill is very efficient and fast, until you try to use it with a huge single file (such as a few GB) or attempt to query a complex data structure with nested data. That is exactly what I am trying to do right now: query large segments of data with a dynamic structure and a nested schema.
 
I can construct a Parquet data source from a nested array, as below:
 
create table dfs.tmp.camic as ( select camic.geometry.coordinates[0][0] as geo_coordinates from dfs.`/home/pradeeban/programs/apache-drill-1.6.0/camic.json` camic);
 
Here I am giving the indices of the array. 
 
Then I can query the data efficiently. For example,  
select * from dfs.tmp.camic;
 
However, giving the indices does not do what I need, as I don't just need the first element. Rather, I need all the elements of a large and dynamic array representing the GeoJSON coordinates.
 
 
$ create table dfs.tmp.camic as ( select camic.geometry.coordinates[0] as geo_coordinates from dfs.`/home/pradeeban/programs/apache-drill-1.6.0/camic.json` camic);
Error: SYSTEM ERROR: UnsupportedOperationException: Unsupported type LIST

Fragment 0:0

[Error Id: a6d68a6c-50ea-437b-b1db-f1c8ace0e11d on llovizna:31010]

  (java.lang.UnsupportedOperationException) Unsupported type LIST
    org.apache.drill.exec.store.parquet.ParquetRecordWriter.getType():225
    org.apache.drill.exec.store.parquet.ParquetRecordWriter.newSchema():187
    org.apache.drill.exec.store.parquet.ParquetRecordWriter.updateSchema():172
    org.apache.drill.exec.physical.impl.WriterRecordBatch.setupNewSchema():155
    org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext():103
    org.apache.drill.exec.record.AbstractRecordBatch.next():162
    org.apache.drill.exec.record.AbstractRecordBatch.next():119
    org.apache.drill.exec.record.AbstractRecordBatch.next():109
    org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
    org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():129
    org.apache.drill.exec.record.AbstractRecordBatch.next():162
    org.apache.drill.exec.physical.impl.BaseRootExec.next():104
    org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
    org.apache.drill.exec.physical.impl.BaseRootExec.next():94
    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():257
    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():251
    java.security.AccessController.doPrivileged():-2
    javax.security.auth.Subject.doAs():422
    org.apache.hadoop.security.UserGroupInformation.doAs():1657
    org.apache.drill.exec.work.fragment.FragmentExecutor.run():251
    org.apache.drill.common.SelfCleaningRunnable.run():38
    java.util.concurrent.ThreadPoolExecutor.runWorker():1142
    java.util.concurrent.ThreadPoolExecutor$Worker.run():617
    java.lang.Thread.run():744 (state=,code=0)
 
 
Here, I am trying to query a multi-dimensional array, which is not straightforward.

(I made the error messages above verbose using SET `exec.errors.verbose` = true;)
 
The commonly suggested options to query multi-dimensional arrays are:

1. Using the array indexes in the select query: this is impractical. I do not know how many coordinates this GeoJSON will have. It may be millions, or as few as 3.
2. The flatten function: I am using Drill on top of Mongo, and found an interesting case where Drill, in a distributed execution, outperforms plain Mongo for certain queries. Using flatten basically kills all the performance benefits I otherwise have with Drill. Flatten is just a plain expensive operation at the scale of my data (around 48 GB, though I can split it into files of a few GB each).
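To see why flatten both changes the output format and gets expensive, here is a tiny Python emulation of what it does to the shape of the result. This models its semantics only, not Drill's implementation, and `flatten_rows` is a hypothetical name.

```python
def flatten_rows(rows, field):
    """Emulate the semantics of Drill's FLATTEN on one field, in plain Python.

    Every input row is copied once per element of rows[field], so the output
    grows with the total array length, not with the number of documents.
    """
    out = []
    for row in rows:
        for element in row[field]:
            new_row = dict(row)  # shallow copy of all the other columns
            new_row[field] = element
            out.append(new_row)
    return out
```

For the nested `coordinates` above, it has to be applied once per nesting level (rings, then points), so a single polygon with a million coordinate pairs turns into a million rows, each repeating every other column.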
 
This is a known limitation of Drill, and it significantly reduces Drill's usability for such data, as the proposed workarounds are either impractical or inefficient.

Wednesday, September 14, 2016

Moments with Llovizna: Random Thoughts of a Gypsy Student

I started my blog in 2009, mostly as an internship diary, and then continued to blog about my Google Summer of Code projects, my final year project, and related findings on programming and software in general. I blogged about AbiWord almost 50 times, and never thought I would blog about anything else before moving to Lisboa for my master's, EMDC. I enjoy presenting my PhD (EMJD-DC) work at conferences. Due to the mandatory mobility of my master's and PhD programmes, I travelled and migrated across countries and continents, which made Llovizna a travel blog too.


I like to stay awake during take-off and landing, as I usually get to sit in the middle seat. However, for some unknown reason, I mostly fall asleep just before take-off and wake up just as the flight reaches cruising altitude. I like watching movies on long flights. They come with subtitles, so I can understand all foreign-language movies; mostly I watch Asian movies. Flights shorter than 7 hours mostly do not have movies. The airlines in the US, especially the domestic ones, are terrible, even on flights as long as 5 hours. I remember a 5.5-hour flight from Atlanta to San Francisco on United Airways that offered no meals, citing that this is the norm on domestic flights.

View from the Empire State Building
When I am not watching movies, sleeping, or having meals, I just look out of the window at the big world outside. I often do not sleep the night before a flight, as I get busy with packing, and also because I fear not being able to wake up in time for the flight. This gives me the typical zombie walk: walking without really noticing or sensing the environment.

I enjoy conferences: meeting, listening to, and sharing ideas with fellow researchers. I enjoy travelling. However, I must admit it is a hard situation when you are in a new and beautiful city (as most conference venues are) but have to focus on the conference and networking instead. Recently, I have somehow mastered the balance, winning over the jet lag and the tiredness of long flights. Perks of being a gypsy student. In addition to conference-related travel, I do enjoy small trips when I have time and money at the same time, such as the recent trip to NYC during a long weekend.

We enjoyed cooking in Atlanta. It has fresh food at decent prices. YDFM was our favourite. As the time to return to Portugal comes, I am foreseeing yet another intercontinental flight this year. Mostly I end up packing at the last moment, throwing things away at the eleventh hour or packing/storing things in bulk. One exception was my move back from Rijeka to Lisboa, where I managed to finish all the food items I had bought, including oils, rice, and vegetables. Since 2012, I have had this weird migration pattern of Sri Lanka -> Portugal -> Sri Lanka -> Sweden -> Portugal -> Croatia -> Portugal -> USA -> Portugal -> Belgium (expected) -> Portugal (expected). This summarizes my gypsy life so far, and in the near future.

I still remember my first walk from my apartment to my lab through the beautiful lanes of Atlanta, with a map. A guy was walking in front of me. I was under the impression that he was going to the University as well. I was wrong: he turned in the opposite direction. Luckily I was not blindly following him (which I never do, actually. :) ) The summer in Atlanta is much longer. When we return to Lisboa in two weeks, it will already be autumn there, getting colder and wetter. I hope it is still warm enough for some walks at the Parque das Nações.

Saturday, September 10, 2016

A Dynamic Data Warehousing Platform for Creating and Accessing Biomedical Data Lakes.

This week our paper titled "A Dynamic Data Warehousing Platform for Creating and Accessing Biomedical Data Lakes" was presented at the Second International Workshop on Data Management and Analytics for Medicine and Healthcare (DMAH'16), co-located with the 42nd International Conference on Very Large Data Bases (VLDB 2016), Sep. 2016.

Abstract: Medical research use cases are population centric, unlike the clinical use cases which are patient or individual centric. Hence the research use cases require accessing medical archives and data source repositories of heterogeneous nature. Traditionally, in order to query data from these data sources, users manually access and download parts or whole of the data sources. The existing solutions tend to focus on a specific data format or storage, which prevents using them for a more generic research scenario with heterogeneous data sources where the user may not have the knowledge of the schema of the data a priori.

In this paper, we propose and discuss the design, implementation, and evaluation of Data Café, a scalable distributed architecture that aims to address the shortcomings in the existing approaches. Data Café lets the resource providers create biomedical data lakes from various data sources, and lets the research data users consume the data lakes efficiently and quickly without having a priori knowledge of the data schema.

Thursday, September 8, 2016

Finding your student apartment in Lisbon and Porto..

Originally written in 2014, this post was updated recently.

It is not always an easy task to move to a country where people don't speak your language. Moreover, the visa requirements make it a time-consuming process. Following my blog post on how to apply for a visa to study in Portugal (mostly focusing on Sri Lankan students, but also applicable to many other third countries), I started to receive questions from students seeking information on apartments and studies in Lisbon. There are a few issues that you may have to face. This post tries to discuss those concerns.

1. Universities provide a list of landlords to their students. But in my observation, these apartments tend to get booked faster and are also more expensive than the other available options. Still, you are left with no choice other than to follow this list, as it is safer to reserve a room that your university recommends than a random one: you will have to send one month's rent to the landlord as a reservation deposit. You will also have to be quick, both for the visa process and to make sure you do not miss out on the good and affordable apartments.

2. Not all the landlords will be able to understand English, and communicating with them regarding the visa requirements such as accommodation letters may be hard.

3. You are reserving a room without seeing it yourself, unless you are already in the city. Mostly, only written descriptions are available, and these are written by the landlords themselves.

4. It is very hard to send money to a third person by bank transfer from Sri Lanka, due to local regulations. The banks handle it case by case. It took me considerable effort to send the reservation fee to the landlord. A transfer by PayPal would be more convenient, though not many landlords accept that. This issue may be specific to Sri Lankans.

UniPlaces provides a solution for these issues. UniPlaces.com is a third-party website that provides accommodation options to students away from home. It lets landlords list their apartments for free.

1. UniPlaces is partnered with the major universities in Lisbon, including the University of Lisbon (ULisboa). This makes UniPlaces a trusted web site for the students. It also includes a large number of options to choose from.

2. With a proactive support service fluent in English (and Portuguese, of course), it is easy to communicate with UniPlaces via their support system (chat) or by phone. If you need assistance, you may leave your number and details for UniPlaces to contact you.

3. UniPlaces provides neutral descriptions of the apartments, with photos. Searching for an apartment by specific requirements such as price, room type, and other features works very well. As you may not be able to see your room before arriving in Lisbon, verification by a third party makes your journey to Lisbon stress-free.

4. UniPlaces provides a secure payment option via PayPal. For those who have trouble with PayPal, UniPlaces still offers the option to pay by bank transfer.

5. As a start-up consisting of an international team of geeks, fresh graduates and students, UniPlaces has managed to have the student-feel in their listings. Learn more about the recommended areas in Lisbon, from UniPlaces.

Apart from Lisbon, UniPlaces currently lets you find apartments in London.


Later update on the 8th of September, 2016
It is expected that a company's approach changes considerably as it grows. Recently I booked an apartment through UniPlaces. The landlord accepted the booking, and hence the full first month's rent as well as the service fee was deducted from my account.

However, after 10 days, the landlord cancelled the booking through UniPlaces without even letting me know, as a matter of courtesy and professionalism. It is funny that, after accepting my payment, the landlord had also told me that one of the roommates has a dog, and asked whether I was fine with that. I told him I was fine with the dog. This contradicts his UniPlaces listing, which says "No Pets Allowed".
 

Also, after cancelling my booking for no reason, he immediately made the same apartment available for the same dates! See the listing: October 1st, 2016 is shown as available, the same dates I had booked.

I strongly suspect this landlord is running a scam. Perhaps he is collecting tenant details. Or perhaps he is just plain crazy and nasty.

But what saddens me is UniPlaces' don't-care policy. I warned them with all the details, and got no proper response. The landlord will now go on to scam other students, wasting their time and leaving their money locked up in bank transactions. I am waiting for the money to be refunded to my account. UniPlaces claimed that it has already been refunded, so I guess I will get it back.

While I will still use UniPlaces for booking, as I know the team and they are legit, I must urge them to care more about students. I mean, if I as a student cancel my booking, my service fee is gone, and in most cases I won't get the security deposit (the first month's rent paid to the landlord through UniPlaces) back. On the other hand, the landlord can get away with accepting a student/tenant at will and rejecting them later at will, with no reason whatsoever.

Users are the major pillar of any business. I hope UniPlaces will learn to take the feedback users give them privately more seriously, so that I won't need to update this blog post again.

For students: if you are searching for apartments, feel free to use UniPlaces.com. However, beware of scam artists posing as landlords. Also be warned that the prices are higher. This is reasonable for multiple reasons. First, UniPlaces needs to be paid for its services and employees. Second, there is an 8% flat fee on each monthly rent, charged to the landlords, which the landlord in turn collects from the student by raising the price. There is also a 25% charge on the first month. On top of that, some landlords tend to raise the price arbitrarily even further, since an unsuspecting student may pay more, so why not. For comparison, I was able to find a studio in Alameda for 450 Euro that was almost identical to one listed for 675 Euro.

I wish the service were more transparent and caring towards all its customers, rather than focusing on a specific subset (in this case, only the landlords, as landlords tend to be permanent customers while students are one-time tenant customers).

If you are looking for an apartment in Lisboa, NEVER book with this landlord, who hides the existence of a dog (going so far as to say "Pets are not allowed" in the listing: double standards!), tells you only after you have paid, and then cancels the booking! Most probably his descriptions are fake too.
I wish good luck and success to my colleagues at UniPlaces despite these issues. I hope they will filter out these scam landlords sooner, and be more caring towards their student customers too, as they used to be in the good old days.

Thursday, August 18, 2016

SDN helps other Vs in BigData

I will be working on a book chapter, "SDN helps other Vs in BigData". You may find more details on the book's web page, Big Data and Software Defined Networks, and in its Table of Contents.

It is going to be an interesting book, as I am interested in both SDN and BigData, and this is a topic that covers both.

Tuesday, August 16, 2016

How not to send a follow up (marketing / sales) email

Today I received another follow-up email. Ever since I came to the US, I have been getting many of these. This is probably the worst follow-up email I have received from a big company. I had previously received such emails from other companies, and they were better. In one instance, a company was too eager with its sales emails and kept sending more even after I clearly indicated that I was no longer evaluating their product since I was busy. But this takes everything to a new level.
Creepy and lazy follow-up emails hurt more than they help

So the subject line says "follow up from download". First, it is not a proper title: the capitalization is wrong.

The email starts with Hi {FirstName}. They did not even bother to test the email template. Just as you would spend time testing the product, please spend time testing these email-sending programs. That will help you a lot.

"Saw you had a chance to download.."
It comes to the point too quickly. Somewhat scary, and it reads as if someone had been monitoring my every activity.

".. download our Community Edition (M3)"
I swear to God I do not remember which product he is referring to. It would be better to mention the product name instead of only "Community Edition". M3 could even be omitted.

"A lot of our customers are happy with community support. However, some have come back asking to provide enterprise grade support. We're pleased to announce we now support enterprise grade.."
The fact that "some have come back asking for paid support" does not motivate me to get it. There is no new information to arouse my curiosity or interest, just a filler sentence.

"I can fill you in on details if you're interested."
No thanks. I am not going to waste time dealing with automated telephone calls or more emails. In fact, I am not going to reply with a "No thanks" either. This blog post should be sufficient.

"Do you have time to connect sometime this week?"
Certainly not. Why would I connect with some emailing bot for no reason?

"Best,
David"
Too common a name, with no results in a Google or LinkedIn search for the full name with the company. Probably a made-up name.
The not-so-improved further follow-up emails.

I did open this email (and the further follow-up emails from the same salesperson) despite its uninteresting subject line, just because of the company that sent it. Now they have lost my reply. However, I decided to write a blog post to summarize what I learned. If you are going to send a customer a follow-up email, use it as an opportunity. Don't send a random email just because you learned somewhere that sending a follow-up email would earn you a customer. Better to wait for the right moment and send the right email than a random one, like the one I dissected above.

Later Update
As you can see above, I received further follow-up emails. In the second email, he just says

"I wanted to follow up from my previous email. 
 
Are you available for a quick call this week?"
 
This does not add any value to the previous email, and is vague at best.


The third email goes on to say, 
 
"If anything, I’d love to provide you with resources to help educate you and your team on what’s new in the space. "
 
It is fine that you would like to "educate" your customer. But do not say so explicitly. First, I am an individual (rather, a poor student), not a team. Moreover, I do not think I need a sales team to "educate" me on this domain. I am pretty good. ;)


Finally, the fourth (and potentially, last) email goes,
 
"Since I haven’t heard back from you, I thought I might be heading down the wrong path.

Is there a better time frame or a different person I should be reaching out to discuss Big Data/Hadoop with?

If so, I’d appreciate you pointing me in the right direction."

So comes the sudden realization that the "potential lead" is possibly uninterested. Only in this last email did I realize that, all this time, he just wanted to discuss Big Data and Hadoop with me. It might be a good idea to mention this in the first email itself rather than being so vague. I mean, come on: I am not going to let you spam another person by forwarding you to them. :)

So next time, kids: don't just keep emailing. I know you must have learned "persistence is the key" or "potential clients reply 85% at the 5th email" in your sales and marketing training. But don't send meaningless emails. Put in some real effort, and send when you are confident. Hope that helps. Let me know if you need any help writing an email, of course. ;)