Thursday, February 16, 2012

A priori algorithm in Association Rules

MS uses the a priori algorithm in Association Rules, while other DM software has moved to the Novel algorithm. Can you tell us why MS decided to stay with a priori? Did you overcome the limitations that it's accused of having? Thanks!

I was only able to find one reference to a product that claimed it used "the" novel AR algorithm, without much more description. I found many AR algorithms with novel approaches to solving various problems. Do you have a reference to a paper?

Our Apriori implementation does a number of things to work around the limitations of Apriori. For example, we allow the less frequent of the frequent itemsets to be flushed to disk during counting, or automatically change the thresholds to handle memory pressure. What limitations in particular are you curious about?
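
As a rough illustration of the threshold adjustment mentioned above, here is a minimal Python sketch of level-wise Apriori that raises the minimum support whenever a level holds more itemsets than a budget allows. This is not Microsoft's implementation: the function name `apriori`, the `max_itemsets` parameter (a hypothetical stand-in for a memory budget), and the "raise by one" policy are all assumptions made for the example, and flushing to disk is not shown.

```python
from itertools import combinations


def apriori(transactions, min_support, max_itemsets=None):
    """Toy level-wise Apriori over a list of item collections.

    max_itemsets is a hypothetical stand-in for a memory budget: when a
    level holds more frequent itemsets than that, min_support is raised
    until the level fits. Returns (frequent_itemsets, final_min_support).
    """
    transactions = [frozenset(t) for t in transactions]
    frequent = {}

    # Level 1: count the individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1

    k = 1
    while counts:
        level = {s: c for s, c in counts.items() if c >= min_support}

        # Assumed policy: under "memory pressure", raise the threshold.
        while max_itemsets is not None and len(level) > max_itemsets:
            min_support += 1
            level = {s: c for s, c in level.items() if c >= min_support}
        frequent.update(level)

        # Candidates of size k+1: join two frequent k-itemsets and keep the
        # union only if every k-subset is frequent (the Apriori property).
        k += 1
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in level for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}

    # Earlier levels may hold itemsets below the final (raised) threshold.
    frequent = {s: c for s, c in frequent.items() if c >= min_support}
    return frequent, min_support


baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
itemsets, support_used = apriori(baskets, min_support=2)
tight_itemsets, tight_support = apriori(baskets, min_support=2, max_itemsets=2)
```

With the budget in place the sketch trades completeness for bounded memory: fewer itemsets come back, but each one meets the stricter support that is reported alongside them.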

|||

A DM package I used to use, Statistica Data Miner, uses the Novel algorithm. I found this link to a PowerPoint presentation that discusses some of the issues: http://www.cs.ndsu.nodak.edu/~wguo/CS765.ppt#295,16,References:

However, since MS Association rules employs trees as part of the algorithm, you may be doing the same thing as the Novel algorithm. What do you think?

|||

Dear Sir,

I understand that you have implemented the Apriori algorithm.

Have you implemented it in C# (.NET)?

Could you please post the code so that I can use it in my research?

Thanks in advance

|||

Hi Roger,

I have not had a chance to look into the Novel algorithm, but we definitely do organize the generated itemsets in a tree, whereby you can save space by recognizing that many itemsets share items (some itemsets are subsets of others). It also proved important to keep the tree representation in order to perform predictions efficiently, where the tree serves as an index to the itemsets (and rules).

Thanks,
Jesper Lind (Microsoft Research)
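
To make the tree idea in the reply above concrete, here is a minimal Python sketch of a prefix tree (trie) of itemsets. It is only an illustration under assumptions, not the structure Microsoft actually uses: the class name `ItemsetTrie`, its methods, and the choice to sort items alphabetically are all hypothetical. Itemsets that share a leading item share nodes, and the same tree can be walked to find every stored itemset contained in a given basket, which is the kind of lookup a prediction needs.

```python
class ItemsetTrie:
    """Prefix tree of itemsets; items are stored in sorted order."""

    def __init__(self):
        self.children = {}   # item -> ItemsetTrie
        self.support = None  # set only on nodes where a stored itemset ends

    def insert(self, itemset, support):
        node = self
        for item in sorted(itemset):
            node = node.children.setdefault(item, ItemsetTrie())
        node.support = support

    def node_count(self):
        """Number of nodes in the tree, including the root."""
        return 1 + sum(child.node_count() for child in self.children.values())

    def contained_in(self, basket):
        """Yield (itemset, support) for each stored itemset that is a subset of basket."""
        items = sorted(basket)

        def walk(node, start, prefix):
            if node.support is not None:
                yield tuple(prefix), node.support
            # Only follow branches whose item occurs later in the basket.
            for i in range(start, len(items)):
                child = node.children.get(items[i])
                if child is not None:
                    yield from walk(child, i + 1, prefix + [items[i]])

        yield from walk(self, 0, [])


trie = ItemsetTrie()
trie.insert({"bread"}, 3)
trie.insert({"milk"}, 3)
trie.insert({"butter"}, 2)
trie.insert({"bread", "milk"}, 2)
trie.insert({"bread", "butter"}, 2)

matches = dict(trie.contained_in({"bread", "milk", "eggs"}))
```

In this example the five itemsets contain seven items in total but need only five item nodes plus the root, because the two pairs reuse the `bread` node; the lookup touches only branches that match the basket instead of scanning every itemset.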
