Friday, January 1, 2010

Association Rule Mining with Extended Vertical Format Data Mining

We, the team of Mooshabaya (De Alwis.K.D.B.C, Malinga.A.S, Pradeeban.K, Weerasiri.W.A.D.D.) chose "Association Rule Mining with Extended Vertical Format Data Mining" as our Advanced Database CS4420 module research project. The project proposal can be found here. The research paper we submitted is given below.

Analyzing data warehouses to predict transaction patterns in business and scientific infrastructures often requires high computational power and a large amount of memory, because of the large volume of historical transaction data. With fragmented data and the current trend toward distributed systems, most of the fundamental algorithms originally proposed for finding associations among itemsets in data warehouses are inefficient in either throughput or resource utilization.

Apriori is one such algorithm, proposed for mining data warehouses to find associations. Although Apriori is the most widely studied and implemented algorithm for data mining, it is generally not an optimized algorithm. Many variations, improvements, and alternatives have been suggested to overcome its inefficiency, either in general or for particular kinds of data. In either case, even a small improvement to the algorithm often improves the mining considerably. Vertical format data mining is one of the efficient alternatives to Apriori. In this paper we propose an algorithm, as an alternative to Apriori, that uses bitmap indices in conjunction with vertical format data mining. The implementation of the proposed algorithm, which is expected to be more efficient than its predecessors, is benchmarked against an implementation of Apriori on a chosen set of benchmarks.
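To illustrate the general idea of vertical format mining with bitmaps, here is a minimal Python sketch. It is not the algorithm from our paper; it only shows the textbook technique under simple assumptions. Each item is mapped to a bitmap (a Python integer) in which bit i is set when transaction i contains the item, so the support of an itemset is the number of set bits in the bitwise AND of its items' bitmaps. The function names `build_bitmaps`, `popcount`, and `frequent_itemsets`, and the sample transactions, are made up for this illustration.

```python
from itertools import combinations


def build_bitmaps(transactions):
    """Vertical format: map each item to an int whose bit i is set
    when transaction i contains that item."""
    bitmaps = {}
    for tid, items in enumerate(transactions):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def popcount(bits):
    """Number of set bits, i.e. the support count of a bitmap."""
    return bin(bits).count("1")


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset whose support
    count is at least min_support (illustrative sketch only)."""
    bitmaps = build_bitmaps(transactions)
    # Level 1: frequent single items.
    level = {
        frozenset([item]): bits
        for item, bits in bitmaps.items()
        if popcount(bits) >= min_support
    }
    result = {}
    while level:
        for itemset, bits in level.items():
            result[itemset] = popcount(bits)
        next_level = {}
        # Join pairs of frequent k-itemsets into (k+1)-itemset candidates.
        for (a, bits_a), (b, bits_b) in combinations(level.items(), 2):
            candidate = a | b
            if len(candidate) != len(a) + 1 or candidate in next_level:
                continue
            # Support comes from a bitwise AND, with no rescan of the data.
            bits = bits_a & bits_b
            if popcount(bits) >= min_support:
                next_level[candidate] = bits
        level = next_level
    return result


# Hypothetical sample data for the illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
supports = frequent_itemsets(transactions, min_support=2)
for itemset, count in sorted(supports.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The contrast with Apriori is that Apriori counts candidate support by scanning the transactions again at every level, whereas here the transactions are read once to build the bitmaps and every later support count is an AND followed by a bit count.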

Initial Paper: PDF.

Update as of 18th Dec: We worked further on the algorithm and published a paper ("Horizontal Format Mining with Extended Bitmaps") on it. Slides explaining the algorithm can be downloaded here.
