We, the team of Mooshabaya (De Alwis.K.D.B.C, Malinga.A.S, Pradeeban.K, Weerasiri.W.A.D.D.), chose "Association Rule Mining with Extended Vertical Format Data Mining" as our research project for the Advanced Database module, CS4420. The project proposal can be found here. The abstract of the research paper we submitted is given below.
Analyzing data warehouses to foresee patterns in the transactions of businesses and scientific infrastructures often demands high computational power and a large amount of memory, owing to the huge volume of historical transaction data. With data fragmented across sites and the current trend towards distributed systems, most of the fundamental algorithms initially proposed for finding associations among the itemsets in data warehouses are inefficient in either throughput or resource utilization.
The Apriori algorithm is one such algorithm, proposed to mine data warehouses for associations. Although Apriori is the most widely learned and implemented algorithm for data mining, it is generally not an optimized one. Many variations, improvements, and alternatives have been suggested to overcome the inefficiency of Apriori, either in general or for a particular kind of data set. In either case, even a small improvement in the algorithm often improves the mining considerably. Vertical format data mining is one of the efficient alternatives to Apriori. In this paper we propose an algorithm, as an alternative to Apriori, that uses bitmap indices in conjunction with vertical format data mining. The implementation of the proposed algorithm, which is expected to be more efficient than its predecessors, is benchmarked against an implementation of Apriori on a chosen set of benchmarks.
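To make the idea of combining bitmap indices with vertical format mining concrete, here is a minimal sketch in Python. It is not the algorithm from our paper; it only illustrates the general technique. Each item is mapped to a bitmap over transaction IDs (the vertical format), and the support of an itemset is the number of set bits in the bitwise AND of its items' bitmaps. Using Python integers as bitmaps, and the function names `build_bitmaps` and `frequent_itemsets`, are assumptions made for this illustration.

```python
from itertools import combinations


def build_bitmaps(transactions):
    """Vertical format: map each item to an int whose bit i is set
    when transaction i contains that item."""
    bitmaps = {}
    for tid, items in enumerate(transactions):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def support(bits):
    """Number of transactions marked in a bitmap."""
    return bin(bits).count("1")


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset whose support
    (an absolute transaction count) is at least min_support."""
    bitmaps = build_bitmaps(transactions)
    # Level 1: frequent single items with their bitmaps.
    current = {
        frozenset([item]): bits
        for item, bits in bitmaps.items()
        if support(bits) >= min_support
    }
    result = {}
    while current:
        for itemset, bits in current.items():
            result[itemset] = support(bits)
        # Join frequent k-itemsets that differ in one item into
        # (k+1)-candidates; the candidate's bitmap is the AND of the two.
        next_level = {}
        for (a, bits_a), (b, bits_b) in combinations(current.items(), 2):
            candidate = a | b
            if len(candidate) != len(a) + 1 or candidate in next_level:
                continue
            bits = bits_a & bits_b
            if support(bits) >= min_support:
                next_level[candidate] = bits
        current = next_level
    return result


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    sample = [
        {"bread", "milk"},
        {"bread", "butter"},
        {"bread", "milk", "butter"},
        {"milk"},
    ]
    for itemset, count in frequent_itemsets(sample, min_support=2).items():
        print(sorted(itemset), count)
```

Unlike Apriori, this approach does not rescan the transactions to count each candidate: the database is read once to build the bitmaps, and every later support count is a single AND followed by a bit count.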
Kathiravelu Pradeeban is an Open Source Evangelist. He is a postgraduate student of the Erasmus Mundus European Master in Distributed Computing, a Master of Science joint degree from Instituto Superior Tecnico, Lisbon, Portugal, and KTH Royal Institute of Technology, Stockholm, Sweden. He holds a Bachelor of the Science of Engineering (Hons) degree, majoring in Computer Science & Engineering, with a first class from the University of Moratuwa, Sri Lanka [Batch 2010]. He is also an old boy of Royal College, Colombo [A/L 2005]. He is highly interested in FOSS development, and is a four-time participant in the Google Summer of Code (GSoC) program: with AbiWord (2009 as a student, and 2011 and 2012 as a mentor), an award-winning lightweight word processor, and with Open Grid Services Architecture Data Access and Integration (OGSA-DAI) (2010 as a student, and a committer thenceforth), an innovative solution for distributed data access and management, mentored by OMII-UK. His research interests include Distributed Computing and Data Mining.