We, team Mooshabaya (De Alwis.K.D.B.C, Malinga.A.S, Pradeeban.K, Weerasiri.W.A.D.D.), chose "Association Rule Mining with Extended Vertical Format Data Mining" as our research project for the Advanced Database module, CS4420. The project proposal can be found here. The research paper we submitted is given below.
Analyzing data warehouses to foresee patterns in the transactions of businesses and scientific infrastructures often needs high computational power and a large amount of memory, owing to the huge history of past transactions. With fragmented data and the current trend toward distributed systems, most of the fundamental algorithms initially proposed for finding associations among the itemsets in data warehouses are inefficient in either throughput or resource utilization.
Apriori is one such algorithm, proposed to mine data warehouses for associations. Although Apriori is the most widely learned and implemented algorithm for association rule mining, it is generally not an optimized algorithm. Many variations, improvements, and alternatives have been suggested to overcome the inefficiency of Apriori, either in general or for a particular set of data. In either case, even a fractional improvement in the algorithm often improves the mining considerably. Vertical format data mining is one of the efficient alternatives to Apriori. In this paper we propose an algorithm, as an alternative to Apriori, that uses bitmap indices in conjunction with vertical format data mining. The implementation of the proposed algorithm, which is expected to be more efficient than its predecessors, is benchmarked against an implementation of Apriori on a chosen set of benchmarks.
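To make the idea of combining bitmap indices with the vertical data format concrete, here is a minimal Python sketch. It is not the algorithm from our paper: it is a generic depth-first, Eclat-style miner, and the function names (`build_bitmaps`, `support`, `frequent_itemsets`) and the use of Python integers as bitmaps are assumptions made purely for illustration. Each item maps to a bitmap in which bit i is set when transaction i contains the item, so the support of an itemset is the number of set bits in the bitwise AND of its items' bitmaps.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple


def build_bitmaps(transactions: List[Set[str]]) -> Dict[str, int]:
    """Convert horizontal transactions to the vertical format.

    Each item maps to an integer used as a bitmap: bit i is set
    when transaction i contains the item.
    """
    bitmaps: Dict[str, int] = {}
    for tid, items in enumerate(transactions):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def support(bitmap: int) -> int:
    """Support count = number of set bits (transactions) in the bitmap."""
    return bin(bitmap).count("1")


def frequent_itemsets(
    transactions: List[Set[str]], min_support: int
) -> Dict[FrozenSet[str], int]:
    """Return every itemset whose support count is at least min_support."""
    bitmaps = build_bitmaps(transactions)
    # Keep only frequent single items, in a fixed order to avoid duplicates.
    frequent = sorted(
        (item, bm) for item, bm in bitmaps.items() if support(bm) >= min_support
    )
    result: Dict[FrozenSet[str], int] = {}

    def extend(
        prefix: Tuple[str, ...],
        prefix_bm: Optional[int],
        candidates: List[Tuple[str, int]],
    ) -> None:
        for i, (item, bm) in enumerate(candidates):
            # Intersect transaction sets with a single bitwise AND.
            joint = bm if prefix_bm is None else prefix_bm & bm
            count = support(joint)
            if count >= min_support:
                itemset = prefix + (item,)
                result[frozenset(itemset)] = count
                # Only extend with items that come later in the ordering.
                extend(itemset, joint, candidates[i + 1:])

    extend((), None, frequent)
    return result
```

Unlike Apriori, this sketch scans the transactions only once, when building the bitmaps; every later support count is computed from bitmap intersections alone. Whether that pays off in practice depends on the data set, which is what a benchmark against Apriori is meant to show.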
Pradeeban Kathiravelu is an Open Source Evangelist. He is a PhD
Researcher at INESC-ID Lisboa/Instituto Superior Técnico, Universidade
de Lisboa, Portugal and Université catholique de Louvain, Belgium. He is
a fellow of Erasmus Mundus Joint Doctorate in Distributed Computing.
He holds a Master of Science degree, Erasmus Mundus
European Master in Distributed Computing (EMDC), from Instituto Superior
Técnico, Portugal, and KTH Royal Institute of Technology, Sweden. He
also holds a Bachelor of the Science of Engineering (Hons) degree,
majoring in Computer Science & Engineering, with a first class from the
University of Moratuwa, Sri Lanka. He is an old boy of Royal College,
Colombo, Sri Lanka.
His research interests include
Software-Defined Networking (SDN), Distributed Systems, Cloud Computing,
Web Services, Big Data in Biomedical Informatics, and Data Mining. He is
highly interested in FOSS development, and has been an active participant in
the Google Summer of Code (GSoC) program since 2009, as a student and as