Sunday, September 28, 2014

Fonte da Talha - Portugal's finest beaches

A big jellyfish found in Fonte da Talha
I decided to go to Fonte da Talha beach yesterday, despite the weather forecast showing a 50% probability of rain. :D I took a bus (TST - 161) from Praça do Areeiro, Lisboa to Costa da Caparica (4.10 Euro; it comes once every 30 minutes). My plan was to take the beach train (Comboio da Praia), more of a toy train, which, as I had read somewhere on the Internet, operates on weekends from 9 am to 7 pm (a return trip costs 7.50 Euro). It had rained in the morning: the roads were wet, and it was still drizzling. When I got off the bus, I asked the bus driver, "Could you please tell me how to get to the Comboio da Praia?" I repeated, "Comboio da Praia." Usually the people here are very, very helpful. It is just that some of them do not speak good English. So, if you repeat the Portuguese phrase or destination clearly in Portuguese, you will surely get help. But the bus driver spoke very good English. "The train stopped on the 29th of August." He smiled, and continued, "Sorry, next year." :P The train seems to be seasonal.

He suggested that I take a bus towards Fonte da Talha (TST - 130). While I was walking around to locate the bus stop, a 130 passed by. So I missed one 130 and waited for the next one, for around 30 minutes, probably more. These buses are not frequent. Probably once an hour, or once every two hours. Who knows? :P

I was not going to give up. I decided to try a taxi. Taxis were not frequent either, and all of them seemed to come occupied. After some wait, I was able to get a taxi. I explained to him that I was going to the praia towards Fonte da Talha. He understood that I was going to a beach, and he said he could not take me to the beach itself, but he could drop me closer, on the main road. I said, "Sure". :D

It cost me 5 Euros. Then I walked a long distance, passing beach parties, including a kids' party, hosted by different beach bars on the way, and soon I was on the beach. :D The area was quite active actually. I enjoyed the sun and the waves. The waves were pretty huge, probably because of the weather. The water was warm and comfortable though. After a few hours, I walked further south along the beach, as it started to drizzle with a strong (very, very strong) wind, the sky covered by black clouds. I finally reached the fishing village, Fonte da Talha. The village is kind of on top of the nearby mountain. :D I attempted to climb it in search of the bus that would take me back to Costa da Caparica. The shower was getting stronger, and I gave up the idea of finding the bus back.

Kite surfing on a windy day..
I returned to the previous bus stop near the beach, where I found a TST bus (#127) that goes to Cacilhas. I remembered that last year I took a boat from Cais do Sodre to Cacilhas, and then a bus from Cacilhas to Costa da Caparica. :D I would do the same in reverse: to Cacilhas, and then a boat to Cais do Sodre. It cost me 4.10 again to go to Cacilhas. After a somewhat long bus ride, I was in Cacilhas. I still remembered the place very well, so I did not have to ask for directions anymore. The boat to Cais do Sodre comes once every 20 minutes, for 1.20 Euro. I finally reached Lisboa safely. :D

So I realized that the best option for reaching Fonte da Talha from Lisboa by public transport is to go to Cacilhas by boat from Cais do Sodre, and then to Fonte da Talha by bus from Cacilhas. The total transportation expense for my trip was 14.40 Euro (4.10 Euro for Lis/Areeiro -> Costa da Caparica, 5 Euro for Taxi towards a nowhere zone, 4.10 Euro for Fonte da Talha -> Cacilhas, and 1.20 Euro for Cacilhas -> Lis/Cais do Sodre).

An update from the 5th of October: I went again to Fonte da Talha to explore the land side and the cliff. This time from Cais do Sodre -> Cacilhas by boat, and Cacilhas -> Fonte da Talha by bus TST 145. Total cost for the trip was (1.20 for boat one way + 3.20 for 3-zone TST bus, one way) * 2 -> 8.80 Euro. Had a long walk, enjoying the nice view, and so many kite surfers.

This will probably be my last beach trip for this summer. But time goes very fast. Within a very short time we will have our next summer. Hopefully. :)

Saturday, September 27, 2014

The Summer!

Caxias - view from the small fort in the park!
Though I have been in Portugal for more than two years, it is only this summer that I got the opportunity (and interest) to enjoy some not-so-popular-among-Erasmus-students beaches. Earlier, I had already visited the Cascais, Costa da Caparica, Troia, Lagos Meia Praia, Porto Matosinhos, and Estoril beaches. This summer, I visited Ilhas Desertas, Caxias (a small beach on the way to Cascais/Estoril from Lisbon, very close to Lisbon by the train towards Oeiras or Cascais), and Porto Covo in the Alentejo region (using a Rede Expressos bus from Lisboa).

My trip to Santo André (Santiago do Cacém) and Porto Covo was not well prepared, but it came out pretty well. During this trip, I realized that bus tickets are not always sold from a separate office. I had to buy the ticket from a cafe in Santo Andre, and from a mini post office in Porto Covo. Porto Covo was stunning! Unfortunately I forgot to take my camera on this trip. So no photos. :(

The summer is almost over, and I lost more than a month of summer in Portugal due to my thesis defence and trips to Shenzhen, from 22nd Aug - 6th Sep (more on this later, probably), and Paris, from 8th Sep - 12th Sep. Now that I am done with my master thesis defence, it is almost autumn! But I am not giving up on you, summer, yet!

Thursday, September 25, 2014

UCC 2014 - An Adaptive Distributed Simulator for Cloud and MapReduce Algorithms and Architectures

Part of my master thesis work has already been presented as a work-in-progress paper at MASCOTS 2014. Now, it has also produced a full paper, which has been accepted to the IEEE/ACM 7th International Conference on Utility and Cloud Computing (UCC 2014), to be held in London, 8th - 11th December 2014.

An Adaptive Distributed Simulator for Cloud and MapReduce Algorithms and Architectures
Scalability and performance are crucial for simulations as much as accuracy is. Due to the limited availability and access to the variety of resources, cloud and MapReduce solutions are often evaluated on simulator platforms. As the complexity of the architectures and algorithms keep increasing, simulations themselves become large and resource-hungry. Simulators can be designed to be adaptive, exploiting the clusters and data-grid platforms. This paper describes the research for the design, development, and evaluation of a complete fully parallel and distributed cloud and MapReduce simulator (Cloud2Sim), leveraging the Java in-memory data grid platforms. Cloud2Sim provides a concurrent and distributed cloud simulator, by extending CloudSim cloud simulator, using Hazelcast in-memory key-value store. It also provides an assessment of the MapReduce implementations of Hazelcast and Infinispan, with means of simulating MapReduce executions. Cloud2Sim scales out the cloud and MapReduce simulations to multiple nodes running Hazelcast and Infinispan, based on load. The distributed execution model and adaptive scaling solution could further be leveraged as a general purpose auto-scaler middleware for a multi-tenanted deployment.
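
The abstract mentions scaling out "based on load". As a rough illustration only, and not the Cloud2Sim implementation, a threshold-based decision could look like the sketch below. The function name, the thresholds, and the node limits are all hypothetical values picked for the example.

```python
def next_node_count(load, node_count, high=0.8, low=0.2, min_nodes=1, max_nodes=8):
    """Decide how many nodes to run next, given the current average load (0.0 to 1.0).

    Hypothetical policy for illustration: add one node when the load is above
    `high`, remove one when it is below `low`, otherwise keep the cluster as is.
    """
    if load > high and node_count < max_nodes:
        return node_count + 1  # scale out
    if load < low and node_count > min_nodes:
        return node_count - 1  # scale in
    return node_count          # load is within the comfortable band


# A heavily loaded 2-node cluster grows; a lightly loaded one shrinks.
print(next_node_count(0.95, 2))
print(next_node_count(0.05, 2))
```

Keeping a gap between the two thresholds avoids adding and removing nodes back and forth when the load hovers around a single value.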

Tuesday, September 23, 2014

The next chapter

Final Submission, as a CD..
I recently posted my previous blog post on EMDC, with my thesis presentation slides. I shared it on social media with the caption below.

Multiple reasons to read this post.
1. Probably my last post on my life with "EMDC", my master studies.
2. It contains my master thesis presentation slides and some secret information..
3. It may be emotional. Perhaps not..
4. It contains information on what is coming up next.. 

Since I am continuing with the PhD at INESC-ID Lisboa / IST, there is no feeling of finishing or leaving something. Just missing the friends. Today, I submitted the 2 CDs of my final thesis to the Academic Services Department. All went well.

I also signed the employment contract for the EMJD-DC, running till the end of August 2017, at the human resources department. Now it is time to start - I mean, resume - the work. :) It is raining here, and I dislike it. I had plenty of sunlight in Sri Lanka, which I dearly disliked, due to the high relative humidity. Now the summer is over. But I am sure it will come back pretty soon.

Big Data – An Introduction and a Look from In-Memory Data Grids

With the increasing number of sensors, computers, and smart phones, more and more data is stored digitally. Traditionally, the Internet followed a model of information provider – information consumer, where web sites produced information, which was merely consumed by a number of users. Web 2.0, however, sees more user engagement. An average Internet user nowadays is not a mere consumer. Rather, she produces content through social media and forums, and engages with the existing data over the Internet. User interactions can be mined to produce more interesting information. With the huge amount of data produced, it is apparent that data storage and manipulation should scale as well. Big data is a paradigm that attempts to handle data at a larger scale than the traditional means of data storage and access, such as relational databases.


Store, Process, and Access the 3 "V"
3V of Big Data
Information on the Internet keeps growing exponentially, as more and more information is made public through the wires, making for a paper-less work environment. There is a definite paradigm shift in the view of historical data, from mere log files on backup disks to useful information for analytics and data mining, stored in data warehouses. For example, the Internet Archive Wayback Machine contains as much as 2 PB (petabytes) of data, and keeps growing at a rate of 20 TB per month. However, big data is not defined by its gigantic volume alone. Rather, it also depends on velocity and variety. Velocity defines how fast the data flows in and out of the system. In a weather forecast system, temperature, relative humidity, wind speed, and other measurements from the sensors flow in each second, and they should be processed efficiently in real time. This process involves a huge volume of data moving with a high velocity. The third V, variety, illustrates that the data can be of heterogeneous formats. It can be composed of a set of raw images, numbers, video, or music files. Data in multiple formats should be mashed up and processed to find the interesting information. In the weather forecasting scenario, each sensor may feed its output to the server in a different format. A big data system differs from a relational database system in its ability to store, access, and process such a complicated data set.
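
To make the "variety" point of the weather scenario a bit more concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that some sensors send JSON messages and others send `sensor,metric,value` CSV lines; the message layouts and the function names are hypothetical.

```python
import csv
import io
import json
from statistics import mean


def parse_reading(raw):
    """Normalize one raw sensor message into a common record.

    Assumed formats (hypothetical): a JSON object, or a CSV line
    of the form sensor,metric,value.
    """
    raw = raw.strip()
    if raw.startswith("{"):
        data = json.loads(raw)
        return {"sensor": data["sensor"], "metric": data["metric"],
                "value": float(data["value"])}
    sensor, metric, value = next(csv.reader(io.StringIO(raw)))
    return {"sensor": sensor, "metric": metric, "value": float(value)}


def average_by_metric(messages):
    """Group the normalized readings by metric and average each group."""
    grouped = {}
    for message in messages:
        reading = parse_reading(message)
        grouped.setdefault(reading["metric"], []).append(reading["value"])
    return {metric: mean(values) for metric, values in grouped.items()}


messages = [
    '{"sensor": "s1", "metric": "temperature", "value": 21.5}',
    "s2,temperature,22.5",
    "s3,wind_speed,12.0",
]
print(average_by_metric(messages))
```

Once every message is mapped to the same record shape, the processing step no longer needs to know which format a sensor used.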

Pattern Recognition, Data Mining, Machine Learning, and Big Data
A big data solution may not be huge in volume. Obama's big data campaign had 10 TB of data initially, in various forms. Data was processed at a very high speed, as new data was frequently made available by the volunteers and analysts. Hence the campaign had high variety and velocity, with a relatively low volume. It also required 66,000 simulations to be run every day. The real-time processing of the simulation outcomes required parallel executions. Recently, a study by Facebook on users' emotions, and on how the updates seen by the users affect their emotions and the posts they share, created some uneasiness among the users. Nevertheless, mining user responses to optimize business outcomes is nothing new. Regular A/B testing, carried out by almost all the mainstream public web sites, is an example: different layouts are tested against the same content, or different titles and captions are placed, to find which of them persuade the readers to click the link, subscribe, spend more time, or even purchase an item. Patterns are recognized from millions of responses, and the analytics are carried out at scale on big data solutions. Machine learning analytics find recurring patterns and provide predictions about the future based on the numbers. Big data solutions provide a whole new opportunity for the data mining domain in finding associations and patterns.
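
As a small illustration of the A/B testing idea, the sketch below compares the click-through rates of two variants with a standard two-proportion z-test. The click and view counts are made-up numbers, and the function names are my own; a real site would run this over far larger logs.

```python
from math import sqrt


def click_through_rate(clicks, views):
    """Fraction of views that led to a click."""
    return clicks / views


def two_proportion_z(clicks_a, views_a, clicks_b, views_b):
    """z statistic for the difference between the rates of variants B and A."""
    rate_a = click_through_rate(clicks_a, views_a)
    rate_b = click_through_rate(clicks_b, views_b)
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    return (rate_b - rate_a) / std_err


# Made-up example: title A got 100 clicks in 1000 views, title B got 150 in 1000.
z = two_proportion_z(100, 1000, 150, 1000)
print(f"z = {z:.2f}; |z| > 1.96 suggests a real difference at the 5% level")
```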

Scalability and Simplicity
Big Data Solution over In-Memory Data Grid
The increasing volume and velocity of big data require larger and more powerful computers in order to scale up. Cloud storage and data-as-a-service solutions started to replace high-end computers with the abundant resources of multiple utility computers. NoSQL solutions were developed with simplicity, horizontal scalability, and big data economy in mind. NoSQL databases have flexible data models that enable storing the variety of data objects of big data, unlike relational databases, which come with a strict schema. NoSQL solutions can be categorized according to their design and functionality. Key–Value Stores, Column-Oriented Stores, Document-Oriented Stores, and Graph Databases can be considered the major categories of NoSQL databases, which store data persistently on disk or in memory.
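
The "flexible data model" can be sketched with a toy in-memory key-value store, where each value is a free-form document and no schema is enforced. This is only an illustration of the idea in plain Python; the class and the sample records are hypothetical and do not mirror any particular NoSQL product.

```python
class SchemalessStore:
    """A toy key-value store: values are free-form dicts, no schema enforced."""

    def __init__(self):
        self._items = {}

    def put(self, key, document):
        # Store a copy so later changes by the caller do not leak in.
        self._items[key] = dict(document)

    def get(self, key):
        return self._items.get(key)

    def find(self, field, value):
        """Return the sorted keys of documents whose `field` equals `value`."""
        return sorted(key for key, doc in self._items.items()
                      if doc.get(field) == value)


store = SchemalessStore()
# Documents with different shapes live side by side.
store.put("user:1", {"name": "Ana", "city": "Lisboa"})
store.put("photo:7", {"owner": "user:1", "width": 1024, "tags": ["beach"]})
store.put("user:2", {"name": "Rui", "city": "Lisboa", "languages": ["pt", "en"]})
print(store.find("city", "Lisboa"))
```

In a relational table, the photo and the two users would each need their own table and fixed columns; here a new field is simply another key in the document.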

In-Memory Data-Grids for Big Data
In-Memory Data-Grids (IMDGs) such as Infinispan, Hazelcast, Gridgain, Gigaspaces XAP, VMware vFabric Gemfire, IBM eXtremeScale, and Oracle Coherence exploit the abundant storage, processing, and memory resources in computer clusters, to provide a unified view of the nodes in the cluster. This model of shared storage, memory, and processing enables the execution of larger tasks that cannot be executed effectively on a single node. While persistent stores use disk as the storage, in-memory data grids use memory as the storage, adhering to the commonly stated phrase “Memory is the new disk”. Data grids share computing resources among the instances, providing a unified view of a super computer. The abundant availability of memory enables the efficient use of the computer cache, which can speed up processes even beyond linear speedups.
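
The "unified view" over many nodes can be pictured with a toy, single-process model: keys are hashed to one of several per-node dictionaries, while callers only see one `put`/`get` interface. This is a simplified sketch of hash partitioning under my own assumptions, not how Hazelcast or Infinispan are actually implemented; the class name is hypothetical.

```python
import zlib


class TinyDataGrid:
    """Toy model of a partitioned in-memory grid living in one process."""

    def __init__(self, node_count):
        # Each dict stands in for the memory of one cluster node.
        self.nodes = [dict() for _ in range(node_count)]

    def _owner(self, key):
        # crc32 gives a stable hash, so a key always maps to the same node.
        return zlib.crc32(key.encode("utf-8")) % len(self.nodes)

    def put(self, key, value):
        self.nodes[self._owner(key)][key] = value

    def get(self, key, default=None):
        return self.nodes[self._owner(key)].get(key, default)

    def size(self):
        return sum(len(node) for node in self.nodes)


grid = TinyDataGrid(node_count=3)
for i in range(100):
    grid.put(f"reading-{i}", i * 1.5)

# The caller sees a single map, although the entries are spread over 3 nodes.
print(grid.get("reading-42"), grid.size())
```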

While in-memory data grids can integrate with a persistent store, for cases where the available memory is too limited to hold a very large object space, the persistence of the objects stored in memory is generally ensured through backups. Data is replicated synchronously or asynchronously, based on the configuration. This ensures faster transactions than disk accesses, which are cheaper but provide slower response times. Distributed execution framework implementations handle the “Process” stage of big data seamlessly, as they execute the algorithms in a distributed manner over the big data. MapReduce frameworks are hence implemented over in-memory key-value stores such as Infinispan and Hazelcast using their distributed execution frameworks, following the MapReduce model of Hadoop.
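
For readers new to the MapReduce model mentioned above, here is the classic word count written as three plain Python functions. It runs in a single process and only illustrates the map, shuffle, and reduce steps; it does not use the Hazelcast, Infinispan, or Hadoop APIs, and the function names are my own.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in the document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all the emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: sum the values of each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    pairs = []
    for document in documents:
        # In a real framework, each document could be mapped on a different node.
        pairs.extend(map_phase(document))
    return reduce_phase(shuffle(pairs))


print(word_count(["big data", "Big memory"]))
```

Because each map call touches only its own document and each reduce call only its own key, both phases can be spread across the nodes of a cluster.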

Data storage vendors are focusing on positioning their products to withstand the big data storm. Oracle Big Data SQL is a recent product from Oracle that provides a single, optimized SQL query for distributed data, bringing back the simplicity of the unified view of a traditional database. As one can presume, Oracle Big Data SQL supports multiple data sources including NoSQL solutions, not limiting itself to the Oracle database. The interoperability of such platforms shows a favorable future for data mining and warehousing.

Conclusion
Data keeps growing, and what seems big now may turn out to be tiny in the future. When it comes to large data-oriented companies such as Facebook and Google, data is measured in exabytes (EB), which was not a commonly used measure a decade ago. The paradigm shift towards big data is economic rather than technical. Though complex computations can be run on super computers, given the availability of large resources, an in-memory data grid utilizes the existing resources of utility computers, aligning with the big data economy. While more and more tools are developed for the sake of scalability and efficiency, research challenges such as security and privacy should not be taken lightly.

Sunday, September 21, 2014

Master Thesis Defence and End of EMDC 2012 - 2014

I defended my master thesis last Friday, the 19th of September, 5.30 - 7.00 P.M. My thesis was titled, "An Elastic Middleware Platform for Concurrent and Distributed Cloud and MapReduce Simulations". It had produced 2 publications - 1 work-in-progress paper already published and the other accepted as a full paper, at the time of the defence. I secured 18 out of 20 as the grade, after the defence. I was the last to defend the master thesis in our batch. Hence, my defence also marks the completion of the EMDC batch 2012/2014.

My thesis presentation slides can be found below.
An Elastic Middleware Platform for Concurrent and Distributed Cloud and MapReduce Simulations from Kathiravelu Pradeeban.

 
I kept my blog updated about my life and studies during my 2 years of master studies with EMDC. Thanks for following the posts. This will probably be one of the last posts on EMDC, except for the occasional posts recalling EMDC from the future. Now I am continuing with my PhD at the same university, under the sister program, EMJD-DC. You may expect to read about my PhD life here as well, as a continuation of the time line from the EMDC posts. Interestingly, EMDC has become the most blogged topic in my blog. I hope EMJD-DC will overtake it, eventually.

Special thanks to my supervisor Prof. LV. Thanks to all my friends who came to cheer me on at my defence. My defence also marks the end of the EMDC 2012/2014 batch.. For me, the defence was just a small interval/break before continuing with EMJD-DC, not leaving the place or the university. Goodbye to everyone from EMDC 2012-2014 who is now doing a PhD in other schools or working in big corporations.. See you again soon, someday, somewhere..

Keep in touch..




Dissertation: An Elastic Middleware Platform for Concurrent and Distributed Cloud and MapReduce Simulations
Candidate: Pradeeban Kathiravelu
Chair: Professor José Carlos Alves Pereira Monteiro
Supervisor: Professor Luís Manuel Antunes Veiga
Committee member: Doutor Ricardo Jorge Freire Dias
Date: 19/09/2014 - 17h30 / Room 0.20, Pavilhão de Informática II, IST, Alameda
Abstract: Cloud Computing researches involve a tremendous amount of entities such as users, applications, and virtual machines. Due to the limited access and often variable availability of such resources, researchers have their prototypes tested against the simulation environments, opposed to the real cloud environments.  Existing cloud simulation environments such as CloudSim and EmuSim are executed sequentially, where a more advanced cloud simulation tool could be created extending them, leveraging the latest technologies as well as the availability of multi-core computers and the clusters in the research laboratories. While computing has been evolving with multi core programming, MapReduce paradigms, and middleware platforms, cloud and MapReduce simulations still fail to exploit these developments themselves. This research develops Cloud2Sim , which tries to fill the gap between the simulations and the actual technology that they are trying to simulate.
First, Cloud2Sim provides a concurrent and distributed cloud simulator, by extending CloudSim cloud simulator, using Hazelcast in-memory key-value store. Then, it also provides a quick assessment to MapReduce implementations of Hazelcast and Infinispan, adaptively distributing the execution to a cluster, providing means of simulating MapReduce executions. The dynamic scaler solution scales out the cloud and MapReduce simulations to multiple nodes running Hazelcast and Infinispan, based on load. The distributed execution model and adaptive scaling solution could be leveraged as a general purpose auto scaler middleware for a multi-tenanted deployment.

keywords: Cloud Computing, Simulation, Auto Scaling, MapReduce, Volunteer Computing, Cycle Sharing, Distributed Execution

Saturday, September 20, 2014

MASCOTS 2014 - Concurrent and Distributed CloudSim Simulations

The IEEE 22nd International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS 2014) was held at Université Paris Descartes in Paris, France, from 9th - 11th September 2014. I presented my paper, which was a work-in-progress paper based on my master thesis, "Cloud2Sim - An Elastic Middleware Platform for Concurrent and Distributed Cloud and MapReduce Simulations".

The paper is titled, "Concurrent and Distributed CloudSim Simulations". My presentation slides are given below.

Concurrent and Distributed CloudSim Simulations from Kathiravelu Pradeeban

Given below is the abstract of the paper:
Cloud Computing researches involve a tremendous amount of entities such as users, applications, and virtual machines. Due to the limited access and often variable availability of such resources, researchers have their prototypes tested against the simulation environments, opposed to the real cloud environments. Existing cloud simulation environments such as CloudSim and EmuSim are executed sequentially, where a more advanced cloud simulation tool could be created extending them, leveraging the latest technologies as well as the availability of multi-core computers and the clusters in the research laboratories. This research seeks to develop Cloud2Sim, a concurrent and distributed cloud simulator, extending CloudSim while exploiting the features provided by Hazelcast, Infinispan and Hibernate Search to distribute the storage and execution of the simulation. 

Sunday, September 7, 2014

Shenzhen 2014

23rd of August - 6th of September, 2014
This was my summer trip this year, replacing the usual summer vacation in Sri Lanka. It was a long trip, since I was travelling from Europe, and a remarkable one.

I visited the Splendid China and Window of the World theme parks in Shenzhen. Splendid China impressed me more with its touches of China, whereas Window of the World seemed more like something for kids, or for those from Shenzhen city who cannot afford to travel much.

Grandma's 粽子 (Zòngzi) was so delicious. I also enjoyed the tea party, Chinese style. The Chinese desserts were delicious and also healthy. The meals were equally delicious. The weather was hot and humid, similar to Colombo, which was a bit uncomfortable. Nevertheless, Shenzhen did not fail to entertain me. :)