Association Rules Analyzer

Our free association rules analyzer can be accessed via this link.

General information

The main task of association analysis is to identify and validate rules describing the joint occurrence of variables, based on past observation histories (“item lists”).
The variables can be of many types, such as products purchased together in (online) retail (“market basket analysis”). Accordingly, the resulting rules can be used in many ways, for example for book purchase recommendations or for arranging shelves in supermarkets.

Association analysis has become established in recent years, especially in online commerce and retail. However, it can also be applied to countless other areas, ranging from the analysis of co-occurring characters in television series to the identification of cause-and-effect relationships between operational loss events.

Popular association analysis methods are based on efficient rule-mining algorithms such as the “Apriori” algorithm.
In addition, several useful metrics have become established for evaluating the rules found. The most common are:

  • Support. This is the relative frequency with which the variables occur together. For example, 15% support for the combination (milk, eggs) means that milk and eggs were bought together in 15% of all observed purchases. Support does not depend on the order of the variables and ranges from 0% to 100%.
  • Confidence. This measures the reliability of a rule. An 80% confidence for the rule “Milk -> Eggs” would mean, for example, that eggs were also bought in 80% of the purchases that included milk (for example, to bake cakes). Confidence is directional (“Eggs -> Milk” generally has a different confidence) and ranges from 0% to 100%.
  • Lift. This is the factor by which the variables occur together more often than would be expected if they were independent of each other, i.e., their joint support divided by the product of their individual supports. For example, a lift of 3 for the combination (milk, eggs) would mean that this combination occurs three times as often as would be expected by chance. The lift can in principle take any value greater than or equal to zero. A value greater than one indicates that the variables tend to occur together; a value less than one indicates that they tend to exclude each other.
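The three metrics can be computed directly from a list of transactions. The following Python sketch is purely illustrative: the function names and the five toy purchases are assumptions made for this example and are not part of our app.

```python
# Illustrative sketch: support, confidence and lift over item lists.
# The toy transactions below are invented for this example.


def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Share of transactions containing the antecedent that also contain the consequent."""
    s_ante = support(transactions, antecedent)
    if s_ante == 0:
        return 0.0  # undefined when the antecedent never occurs; reported as 0 here
    return support(transactions, set(antecedent) | set(consequent)) / s_ante


def lift(transactions, antecedent, consequent):
    """Joint support divided by the support expected if both sides were independent."""
    expected = support(transactions, antecedent) * support(transactions, consequent)
    if expected == 0:
        return 0.0  # undefined when either side never occurs; reported as 0 here
    return support(transactions, set(antecedent) | set(consequent)) / expected


transactions = [
    frozenset({"milk", "eggs", "flour"}),
    frozenset({"milk", "eggs"}),
    frozenset({"milk", "bread"}),
    frozenset({"bread", "butter"}),
    frozenset({"eggs", "flour"}),
]

sup = support(transactions, {"milk", "eggs"})
conf = confidence(transactions, {"milk"}, {"eggs"})
lft = lift(transactions, {"milk"}, {"eggs"})
print(f"support={sup:.2f} confidence={conf:.2f} lift={lft:.2f}")
# support=0.40 confidence=0.67 lift=1.11
```

In this toy data, milk and eggs each appear in 3 of 5 purchases, so independence would predict a joint support of 0.6 × 0.6 = 0.36; the observed 0.40 gives a lift of about 1.11.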

Since the number of rules found is often very large, it is usually essential to limit them to a manageable number in advance. This is done in particular by setting lower limits for support and confidence, so that only the most important rules are determined. The remaining rules can then be examined in tables or graphically.
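A simplified, Apriori-style version of this threshold-based search is sketched below in Python. It first collects all itemsets whose support reaches the minimum support, growing them one item at a time and discarding candidates that have an infrequent subset, and then splits each frequent itemset into rules that reach the minimum confidence. The function names and toy data are assumptions for illustration; the sketch favors readability over speed and is not meant to reflect the implementation behind our app.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Apriori-style, level-wise search; returns {frozenset(itemset): support}."""
    n = len(transactions)
    candidates = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    k = 1
    while candidates:
        # Count the support of each k-item candidate; keep the frequent ones.
        level = {}
        for cand in candidates:
            s = sum(1 for t in transactions if cand <= t) / n
            if s >= min_support:
                level[cand] = s
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-item candidates and prune any
        # candidate that has an infrequent k-item subset (Apriori property).
        k += 1
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in level for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
    return frequent


def association_rules(frequent, min_confidence):
    """Split each frequent itemset into rules antecedent -> consequent."""
    rules = []
    for itemset, supp in frequent.items():
        for size in range(1, len(itemset)):
            for ante in map(frozenset, combinations(itemset, size)):
                cons = itemset - ante
                # Every subset of a frequent itemset is frequent, so both lookups succeed.
                conf = supp / frequent[ante]
                if conf >= min_confidence:
                    lift = conf / frequent[cons]
                    rules.append((set(ante), set(cons), supp, conf, lift))
    return rules


# Hypothetical toy data: one frozenset of items per purchase.
transactions = [
    frozenset({"milk", "eggs", "flour"}),
    frozenset({"milk", "eggs"}),
    frozenset({"milk", "bread"}),
    frozenset({"bread", "butter"}),
    frozenset({"eggs", "flour"}),
]

rules = association_rules(frequent_itemsets(transactions, min_support=0.4), min_confidence=0.6)
for ante, cons, supp, conf, lift in rules:
    print(sorted(ante), "->", sorted(cons), f"support={supp:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

With these thresholds the toy data yields four rules, for example flour -> eggs with a confidence of 1.00, since every purchase containing flour also contains eggs.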

Our App

Goal

The goal of this free web app is to enable simple association analysis using uploadable item lists and adjustable metric thresholds.
Based on these inputs, the corresponding rules are determined automatically and provided both as a downloadable table and as a graph.

We also offer all of the methods available here to B2B customers offline and in various extended versions, for example for the analysis of operational loss events. We are happy to assist you with any related questions.

Usage

Using our association rules website is straightforward.

After passing the reCAPTCHA test, you can upload your own CSV file (comma-separated, UTF-8 encoded, without special characters, header, or index) and set the minimum support (in %), the minimum confidence (in %), and the lift boundaries for common and uncommon co-occurrences (as absolute values). If no CSV file is uploaded, a sample file is used instead.
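The described upload format (one item list per row, comma-separated, no header, no index) can be read with Python's standard `csv` module. In this sketch, the function name, the file name `items.csv`, and the handling of empty cells and blank lines are assumptions; it does not describe how our app parses uploads.

```python
import csv
import io


def load_item_lists(stream):
    """Read one item list per CSV row; no header row, no index column.

    Assumptions: empty cells (e.g. trailing commas in rows of different
    length) and blank lines are ignored, and duplicate items in a row
    collapse into one.
    """
    transactions = []
    for row in csv.reader(stream):
        items = frozenset(cell.strip() for cell in row if cell.strip())
        if items:
            transactions.append(items)
    return transactions


# For a real file (hypothetical name): open("items.csv", encoding="utf-8", newline="")
sample = io.StringIO("milk,eggs,flour\nmilk,eggs\nmilk,bread\nbread,butter\neggs,flour\n")
transactions = load_item_lists(sample)
print(len(transactions))  # 5
```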

Clicking “RUN” starts the calculation, after which the association rules are determined and displayed.

Please note that the support, confidence, and lift boundaries strongly affect the number of rules found. Values that are too low may cause a time-out because too many rules are generated; values that are too high may yield no rules at all. For new datasets, it is therefore recommended to try different parameter combinations until the desired results are obtained.
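Such a parameter search can also be automated offline. The sketch below counts, for a small grid of thresholds, how many rules would be found; to stay short it only considers rules with a single item on each side, which is a simplifying assumption. The function name and toy data are hypothetical.

```python
from itertools import permutations


def count_pair_rules(transactions, min_support, min_confidence):
    """Count one-item rules {a} -> {b} that pass both thresholds."""
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    supp = {i: sum(i in t for t in transactions) / n for i in items}
    count = 0
    for a, b in permutations(items, 2):
        s_ab = sum(a in t and b in t for t in transactions) / n
        if s_ab >= min_support and s_ab / supp[a] >= min_confidence:
            count += 1
    return count


# Hypothetical toy data: one frozenset of items per purchase.
transactions = [
    frozenset({"milk", "eggs", "flour"}),
    frozenset({"milk", "eggs"}),
    frozenset({"milk", "bread"}),
    frozenset({"bread", "butter"}),
    frozenset({"eggs", "flour"}),
]

# Try a small grid of thresholds and report how many rules each combination yields.
for min_support in (0.2, 0.4, 0.6):
    for min_confidence in (0.6, 0.9):
        n_rules = count_pair_rules(transactions, min_support, min_confidence)
        print(f"support >= {min_support:.0%}, confidence >= {min_confidence:.0%}: {n_rules} rules")
```

Even on this tiny dataset, the count drops from five rules to none as the thresholds rise, which illustrates the trade-off described above.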

As output, the calculation returns the top rules, i.e., those with the highest support and confidence (see below). The rule components “antecedents” and “consequents” are shown in separate columns, together with the corresponding support, confidence, and lift.
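A ranking of this kind can be sketched as follows: rules are filtered by the support and confidence thresholds and then sorted by support and, for ties, by confidence. For brevity the sketch only generates rules with one item on each side; the function names, the cut-off of ten rules, and the toy data are assumptions, not details of our app.

```python
from itertools import permutations


def pair_rules(transactions):
    """All one-item rules {a} -> {b} with support, confidence and lift."""
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    supp = {i: sum(i in t for t in transactions) / n for i in items}
    rules = []
    for a, b in permutations(items, 2):
        s_ab = sum(a in t and b in t for t in transactions) / n
        if s_ab > 0:
            conf = s_ab / supp[a]
            rules.append({"antecedents": a, "consequents": b,
                          "support": s_ab, "confidence": conf,
                          "lift": conf / supp[b]})
    return rules


def top_rules(rules, min_support, min_confidence, n=10):
    """Keep rules passing both thresholds; rank by support, then confidence."""
    kept = [r for r in rules
            if r["support"] >= min_support and r["confidence"] >= min_confidence]
    kept.sort(key=lambda r: (r["support"], r["confidence"]), reverse=True)
    return kept[:n]


# Hypothetical toy data: one frozenset of items per purchase.
transactions = [
    frozenset({"milk", "eggs", "flour"}),
    frozenset({"milk", "eggs"}),
    frozenset({"milk", "bread"}),
    frozenset({"bread", "butter"}),
    frozenset({"eggs", "flour"}),
]

print(f"{'antecedents':<12}{'consequents':<12}{'support':>8}{'confidence':>11}{'lift':>6}")
for r in top_rules(pair_rules(transactions), min_support=0.3, min_confidence=0.6):
    print(f"{r['antecedents']:<12}{r['consequents']:<12}"
          f"{r['support']:>8.2f}{r['confidence']:>11.2f}{r['lift']:>6.2f}")
```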

To obtain more “interesting” rules, it is also possible to retrieve only the rules whose lift reaches a given minimum (“Lift 1”). These rules are shown in a separate table (see below) and indicate items that occur together much more often than would be expected by chance.

The rules for common co-occurrences are also displayed graphically (see below) to make the analysis more efficient.
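One common way to draw such rules is as a directed graph in which each edge points from antecedent to consequent. The sketch below writes rules in Graphviz DOT format with the lift as edge label; the function name and the three example rules (with lifts taken from a small toy dataset) are illustrative assumptions, and the chart type used in our app may differ.

```python
def rules_to_dot(rules):
    """Render (antecedent, consequent, lift) triples as a Graphviz DOT digraph."""

    def quote(name):
        # DOT identifiers are wrapped in double quotes; escape embedded quotes.
        return '"' + name.replace('"', '\\"') + '"'

    lines = ["digraph rules {"]
    for ante, cons, lift in rules:
        lines.append(f'  {quote(ante)} -> {quote(cons)} [label="{lift:.2f}"];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical example rules: (antecedent, consequent, lift).
rules = [
    ("milk", "eggs", 10 / 9),
    ("eggs", "flour", 5 / 3),
    ("flour", "eggs", 5 / 3),
]
dot = rules_to_dot(rules)
print(dot)
```

The resulting text can be saved to a file and rendered with the Graphviz command-line tool, e.g. `dot -Tpng rules.dot -o rules.png`.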

Similarly, rules for rare co-occurrences (i.e., items that normally exclude each other) are also calculated and displayed. These rules are selected by setting an upper limit for the lift (“Lift 2”).
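Both lift filters amount to simple thresholds on the lift value. In the sketch below, `lift_1` and `lift_2` mirror the “Lift 1” and “Lift 2” settings; the function names, the restriction to single-item rules, and the toy data are assumptions, and in practice these filters would typically be combined with the support and confidence thresholds.

```python
from itertools import permutations


def pair_rules(transactions):
    """All one-item rules {a} -> {b} whose items occur together at least once."""
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    supp = {i: sum(i in t for t in transactions) / n for i in items}
    rules = []
    for a, b in permutations(items, 2):
        s_ab = sum(a in t and b in t for t in transactions) / n
        if s_ab > 0:
            conf = s_ab / supp[a]
            rules.append({"antecedents": a, "consequents": b,
                          "support": s_ab, "confidence": conf,
                          "lift": conf / supp[b]})
    return rules


def split_by_lift(rules, lift_1, lift_2):
    """Return (common, rare): rules with lift >= lift_1 and rules with lift <= lift_2."""
    common = [r for r in rules if r["lift"] >= lift_1]
    rare = [r for r in rules if r["lift"] <= lift_2]
    return common, rare


# Hypothetical toy data: one frozenset of items per purchase.
transactions = [
    frozenset({"milk", "eggs", "flour"}),
    frozenset({"milk", "eggs"}),
    frozenset({"milk", "bread"}),
    frozenset({"bread", "butter"}),
    frozenset({"eggs", "flour"}),
]

common, rare = split_by_lift(pair_rules(transactions), lift_1=1.5, lift_2=0.9)
for label, group in (("common", common), ("rare", rare)):
    for r in group:
        print(f"{label}: {r['antecedents']} -> {r['consequents']} (lift {r['lift']:.2f})")
```

Rules with a lift between the two limits (here, milk and eggs with a lift of about 1.11) appear in neither table.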

The rules can also be downloaded as an XLS file.

Contact

Dr. Dimitrios Geromichalos
Founder / CEO
RiskDataScience UG (haftungsbeschränkt)
Theresienhöhe 28, 80339 München
Email: riskdatascience@web.de
Phone: +4989244407277, Fax: +4989244407001
Twitter: @riskdatascience