11.3 Association Rules Mining

Correlation isn’t causation. But it’s a big hint. [E. Tufte]

11.3.1 Overview

Association rules discovery is a type of unsupervised learning that finds connections among the attributes and levels (and combinations thereof) of a dataset’s observations. For instance, we might analyze a (hypothetical) dataset on the physical activities and purchasing habits of North Americans and discover that

  • runners who are also triathletes (the premise) tend to drive Subarus, drink microbrews, and use smart phones (the conclusion), or

  • individuals who have purchased home gym equipment are unlikely to be using it 1 year later, say.

But the presence of a correlation between the premise and the conclusion does not necessarily imply the existence of a causal relationship between them. It is rather difficult to “demonstrate” causation via data analysis; in practice, decision-makers pragmatically (and often erroneously) focus on the second half of Tufte’s rejoinder, which basically asserts that “there’s no smoke without fire.”

Case in point, while being a triathlete does not cause one to drive a Subaru, Subaru Canada thinks that the connection is strong enough to offer to reimburse the registration fee at an IRONMAN 70.3 competition (since at least 2018)! [198]

Market Basket Analysis

Association rules discovery is also known as market basket analysis after its original application, in which supermarkets record the contents of shopping carts (the baskets) at check-outs to determine which items are frequently purchased together.

For instance, while bread and milk might often be purchased together, that is unlikely to be of interest to supermarkets given the frequency of market baskets containing milk or bread (in the mathematical sense of “or”).

Knowing that a customer has purchased bread does provide some information regarding whether they also purchased milk, but the individual probability that each item is found, separately, in the basket is so high to begin with that this insight is unlikely to be useful.

If 70% of baskets contain milk and 90% contain bread, say, we would expect \[90\%\times 70\%=63\%\] of all baskets to contain both milk and bread, should the presence of one in the basket be totally independent of the presence of the other.

If we then observe that 72% of baskets contain both items (roughly a 1.14-fold increase on the expected proportion, assuming there is no link), we would conclude that there was at best a weak correlation between the purchase of milk and the purchase of bread.

Sausages and hot dog buns, on the other hand, which we might suspect are not purchased as frequently as milk and bread, might still be purchased as a pair more often than one would expect given the frequency of baskets containing sausages or buns.

If 10% of baskets contain sausages, and 5% contain buns, say, we would expect that \[10\% \times 5\% = 0.5\%\] of all baskets would contain sausages and buns, should the presence of one in the basket be totally independent of the presence of the other.

If we then observe that 4% of baskets contain both items (an 8-fold increase on the expected proportion, assuming there is no link), we would obviously conclude that there is a strong correlation between the purchase of sausages and the purchase of hot dog buns.
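The two baseline computations above can be checked with a few lines of Python (a sketch; the percentages are the hypothetical ones used in the text):

```python
# Under independence, the expected co-occurrence rate is the product of the
# marginal rates; the "fold increase" compares the observed joint rate to it.
def fold_increase(p_joint_observed, p_a, p_b):
    """Observed co-occurrence rate relative to the independence baseline."""
    return p_joint_observed / (p_a * p_b)

# Milk (70%) and bread (90%): expected 63% together, observed 72%.
print(round(fold_increase(0.72, 0.70, 0.90), 2))   # 1.14 -> weak association

# Sausages (10%) and buns (5%): expected 0.5% together, observed 4%.
print(round(fold_increase(0.04, 0.10, 0.05), 2))   # 8.0 -> strong association
```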

It is not too difficult to see how this information could potentially be used to help supermarkets turn a profit: announcing or advertising a sale on sausages while simultaneously (and quietly) raising the price of buns could have the effect of bringing in a higher number of customers into the store, increasing the sale volume for both items while keeping the combined price of the two items constant.169

A (possibly) apocryphal story shows the limitations of association rules: a supermarket found an association rule linking the purchase of beer and diapers and consequently moved its beer display closer to its diapers display, having confused correlation and causation.

Purchasing diapers does not cause one to purchase beer (or vice-versa); it could simply be that parents of newborns have little time to visit public houses and bars, and whatever drinking they do will be done at home. Who knows? Whatever the case, rumour has it that the experiment was neither popular nor successful.


Typical uses include:

  • finding related concepts in text documents – looking for pairs (triplets, etc) of words that represent a joint concept: {San Jose, Sharks}, {Michelle, Obama}, etc.;

  • detecting plagiarism – looking for specific sentences that appear in multiple documents, or for documents that share specific sentences;

  • identifying biomarkers – searching for diseases that are frequently associated with a set of biomarkers;

  • making predictions and decisions based on association rules (there are pitfalls here);

  • altering circumstances or environment to take advantage of these correlations (suspected causal effect);

  • using connections to modify the likelihood of certain outcomes (see immediately above);

  • imputing missing data,

  • text autofill and autocorrect, etc.

Other uses and examples can be found in [132], [199], [200].

Causation and Correlation

Association rules can automate hypothesis discovery, but one must remain correlation-savvy (which is less prevalent among quantitative specialists than one might hope, in our experience).

If attributes \(A\) and \(B\) are shown to be correlated in a dataset, there are four possibilities:

  • \(A\) and \(B\) are correlated entirely by chance in this particular dataset;

  • \(A\) is a relabeling of \(B\) (or vice-versa);

  • \(A\) causes \(B\) (or vice-versa), or

  • some combination of attributes \(C_1,\ldots,C_n\) (which may not be available in the dataset) cause both \(A\) and \(B\).

Siegel [199] illustrates the confusion that can arise with a number of real-life examples:

  • Walmart has found that sales of strawberry Pop-Tarts increase about seven-fold in the days preceding the arrival of a hurricane;

  • Xerox employees engaged in front-line service and sales-based positions who use Chrome and Firefox browsers perform better on employment assessment metrics and tend to stay with the company longer, or

  • University of Cambridge researchers found that liking “Curly Fries” on Facebook is predictive of high intelligence.

It can be tempting to try to explain these results (again, from [199]): perhaps

  • when faced with a coming disaster, people stock up on comfort or nonperishable foods;

  • the fact that an employee takes the time to install another browser shows that they are an informed individual and that they care about their productivity, or

  • an intelligent person liked this Facebook page first, and her friends saw it, and liked it too, and since intelligent people have intelligent friends (?), the likes spread among people who are intelligent.

While these explanations might very well be the right ones (although probably not in the last case), there is nothing in the data that supports them. Association rules discovery finds interesting rules, but it does not explain them. The point cannot be over-emphasized: correlation does not imply causation.

Analysts and consultants might not have much control over the matter, but they should do whatever is in their power so that the following headlines do not see the light of day:

  • “Pop-Tarts” get hurricane victims back on their feet;

  • Using Chrome or Firefox improves employee performance, or

  • Eating curly fries makes you more intelligent.


A rule \(X\to Y\) is a statement of the form “if \(X\) (the premise) then \(Y\) (the conclusion)” built from any logical combination of a dataset’s attributes.

In practice, a rule does not need to be true for all observations in the dataset – there could be instances where the premise is satisfied but the conclusion is not.

In fact, some of the “best” rules are those which are only accurate 10% of the time, as opposed to rules which are only accurate 5% of the time, say. As always, it depends on the context. To determine a rule’s strength, we compute various rule metrics, such as the:

  • support, which measures the frequency at which a rule occurs in a dataset – low support values indicate rules that rarely occur;

  • confidence, which measures the reliability of the rule: how often does the conclusion occur in the data given that the premises have occurred – rules with high confidence are “truer”, in some sense;

  • interest, which measures the difference between its confidence and the relative frequency of its conclusion – rules with high absolute interest are … more interesting than rules with small absolute interest;

  • lift, which measures the increase in the frequency of the conclusion which can be explained by the premises – in a rule with a high lift (\(>1\)), the conclusion occurs more frequently than it would if it was independent of the premises;

  • conviction [201], all-confidence [202], leverage [203], collective strength [204], and many others [205], [206].

In a dataset with \(N\) observations, let \(\textrm{Freq}(A)\in \{0,1,\ldots,N\}\) represent the count of the dataset’s observations for which property \(A\) holds. This is all the information that is required to compute a rule’s evaluation metrics: \[\begin{aligned} \textrm{Support}(X\to Y)&=\frac{\textrm{Freq}(X\cap Y)}{N}\in[0,1] \\ \textrm{Confidence}(X\to Y)&=\frac{\textrm{Freq}(X\cap Y)}{\textrm{Freq}(X)}\in[0,1] \\ \textrm{Interest}(X\to Y)&=\textrm{Confidence}(X\to Y) - \frac{\textrm{Freq}(Y)}{N} \in [-1,1] \\ \textrm{Lift}(X\to Y) &=\frac{N^2\cdot \textrm{Support}(X\to Y)}{\textrm{Freq}(X)\cdot \textrm{Freq}(Y)} \in [0,N] \\ \textrm{Conviction}(X\to Y)&=\frac{1-\textrm{Freq}(Y)/N}{1-\textrm{Confidence}(X\to Y)}\geq 0\end{aligned}\]
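These formulas translate directly into code. The following is a minimal Python sketch (the text later uses R’s arules package, but the arithmetic is language-agnostic; the function name and signature here are illustrative):

```python
# Compute the five rule metrics for X -> Y from raw counts, following the
# formulas above (support, confidence, interest, lift, conviction).
def rule_metrics(n, freq_x, freq_y, freq_xy):
    """Evaluate the rule X -> Y in a dataset of n observations."""
    support = freq_xy / n
    confidence = freq_xy / freq_x
    interest = confidence - freq_y / n
    lift = n * freq_xy / (freq_x * freq_y)  # = N^2 * support / (Freq(X) * Freq(Y))
    conviction = (1 - freq_y / n) / (1 - confidence) if confidence < 1 else float("inf")
    return {"support": support, "confidence": confidence,
            "interest": interest, "lift": lift, "conviction": conviction}

# Toy example: N = 100, Freq(X) = 40, Freq(Y) = 50, Freq(X and Y) = 30.
m = rule_metrics(100, 40, 50, 30)
print(m["support"], m["confidence"], m["lift"])  # 0.3 0.75 1.5
```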

British Music Dataset

A simple example will serve to illustrate these concepts. Consider a (hypothetical) music dataset containing data for \(N=15,356\) British music lovers and a candidate rule RM:

“If an individual is born before 1976 (\(X\)), then they own a copy of the Beatles’ Sergeant Pepper’s Lonely Hearts Club Band, in some format (\(Y\))”.

Let’s assume further that

  • \(\textrm{Freq}(X)=3888\) individuals were born before 1976;

  • \(\textrm{Freq}(Y)=9092\) individuals own a copy of Sergeant Pepper’s Lonely Hearts Club Band, and

  • \(\textrm{Freq}(X\cap Y)=2720\) individuals were born before 1976 and own a copy of Sergeant Pepper’s Lonely Hearts Club Band.

We can easily compute the 5 metrics for RM: \[\begin{aligned} \textrm{Support}(\textrm{RM})&=\frac{2720}{15,356}\approx 18\% \\ \textrm{Confidence}(\textrm{RM})&=\frac{2720}{3888}\approx 70\% \\ \textrm{Interest}(\textrm{RM})&=\frac{2720}{3888}-\frac{9092}{15,356}\approx 0.11 \\ \textrm{Lift}(\textrm{RM}) &=\frac{15,356^2\cdot 0.18}{3888\cdot 9092} \approx 1.2 \\ \textrm{Conviction}(\textrm{RM}) &=\frac{1-9092/15,356}{1-2720/3888} \approx 1.36\end{aligned}\] These values are easy to interpret: RM occurs in 18% of the dataset’s instances, and it holds true in 70% of the instances where the individual was born prior to 1976.

This would seem to make RM a meaningful rule about the dataset – being older and owning that album are linked properties. But if being younger and not owning that album are not also linked properties, the statement is actually weaker than it would appear at a first glance.

As it happens, RM’s lift is roughly 1.2, which can be rewritten as \[1.2\approx \frac{0.70}{0.59},\] i.e. the 70% ownership rate among older individuals, against the 59% ownership rate in the dataset as a whole; among younger individuals, the ownership rate is about 56%.

The ownership rates between the two age categories are different, but perhaps not as significantly as one would deduce using the confidence and support alone, which is reflected by the rule’s “low” interest, whose value is 0.11.

Finally, the rule’s conviction is 1.36, which means that the rule would be incorrect 36% more often if \(X\) and \(Y\) were completely independent.
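The metric values quoted above can be verified by direct computation; here is a short Python sketch using the hypothetical counts of the British music dataset:

```python
# Hypothetical counts for the British music dataset (from the text).
n, freq_x, freq_y, freq_xy = 15_356, 3_888, 9_092, 2_720

support    = freq_xy / n                          # ~0.177, reported as ~18%
confidence = freq_xy / freq_x                     # ~0.70
interest   = confidence - freq_y / n              # ~0.11
lift       = n * freq_xy / (freq_x * freq_y)      # ~1.18, reported as ~1.2
conviction = (1 - freq_y / n) / (1 - confidence)  # ~1.36

# Ownership rate among the younger individuals, for comparison:
younger_rate = (freq_y - freq_xy) / (n - freq_x)  # ~0.56
```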

All this seems to point to the rule RM being not entirely devoid of meaning, but to what extent, exactly? This is a difficult question to answer.170

It is nearly impossible to provide hard and fast thresholds: it always depends on the context, and on comparing evaluation metric values for a rule with the values obtained for some other of the dataset’s rules. In short, evaluation of a lone rule is meaningless.

In general, it is recommended to conduct a preliminary exploration of the space of association rules (using domain expertise when appropriate) in order to determine reasonable threshold ranges for the specific situation; candidate rules would then be discarded or retained depending on these metric thresholds.

This requires the ability to “easily” generate potentially meaningful candidate rules.

11.3.2 Generating Rules

Given association rules, it is straightforward to evaluate them using various metrics, as discussed in the previous section.

The real challenge of association rules discovery lies in generating a set of candidate rules which are likely to be retained, without wasting time generating rules which are likely to be discarded.

An itemset (or instance set) for a dataset is a list of attributes and values. A set of rules can be created from the itemset by adding “IF … THEN” blocks to the instances.

As an example, from the instance set

\[\{ \textrm{membership} = \textrm{True}, \textrm{age} = \textrm{Youth}, \textrm{purchasing} = \textrm{Typical} \},\]

we can create the 7 following \(3-\)item rules:

  • IF \((\textrm{membership} = \textrm{True}\) AND \(\textrm{age} = \textrm{Youth}\)) THEN \(\textrm{purchasing} = \textrm{Typical}\);

  • IF \((\textrm{age} = \textrm{Youth}\) AND \(\textrm{purchasing} = \textrm{Typical}\)) THEN \(\textrm{membership} = \textrm{True}\);

  • IF \((\textrm{purchasing} = \textrm{Typical}\) AND \(\textrm{membership} = \textrm{True})\) THEN \(\textrm{age} = \textrm{Youth}\);

  • IF \(\textrm{membership} = \textrm{True}\) THEN (\(\textrm{age} = \textrm{Youth}\) AND \(\textrm{purchasing} = \textrm{Typical}\));

  • IF \(\textrm{age} = \textrm{Youth}\) THEN \((\textrm{purchasing} = \textrm{Typical}\) AND \(\textrm{membership} = \textrm{True})\);

  • IF \(\textrm{purchasing} = \textrm{Typical}\) THEN \((\textrm{membership} = \textrm{True}\) AND \(\textrm{age} = \textrm{Youth})\);

  • IF \(\varnothing\) THEN (\(\textrm{membership} = \textrm{True}\) AND \(\textrm{age} = \textrm{Youth}\) AND \(\textrm{purchasing} = \textrm{Typical}\));

the 6 following \(2-\)item rules:

  • IF \(\textrm{membership} = \textrm{True}\) THEN \(\textrm{purchasing} = \textrm{Typical}\);

  • IF \(\textrm{age} = \textrm{Youth}\) THEN \(\textrm{membership} = \textrm{True}\);

  • IF \(\textrm{purchasing} = \textrm{Typical}\) THEN \(\textrm{age} = \textrm{Youth}\);

  • IF \(\varnothing\) THEN (\(\textrm{age} = \textrm{Youth}\) AND \(\textrm{purchasing} = \textrm{Typical}\));

  • IF \(\varnothing\) THEN \((\textrm{purchasing} = \textrm{Typical}\) AND \(\textrm{membership} = \textrm{True})\);

  • IF \(\varnothing\) THEN \((\textrm{membership} = \textrm{True}\) AND \(\textrm{age} = \textrm{Youth})\);

and the 3 following \(1-\)item rules:

  • IF \(\varnothing\) THEN \(\textrm{age} = \textrm{Youth}\);

  • IF \(\varnothing\) THEN \(\textrm{purchasing} = \textrm{Typical}\);

  • IF \(\varnothing\) THEN \(\textrm{membership} = \textrm{True}\).

In practice, we usually only consider rules with the same number of items as there are members in the itemset: in the example above, for instance, the \(2-\)item rules could be interpreted as emerging from the 3 separate itemsets

\[\begin{align*}\{\textrm{membership} &= \textrm{True}, \textrm{age} = \textrm{Youth}\} \\ \{\textrm{age} &= \textrm{Youth}, \textrm{purchasing} = \textrm{Typical}\} \\ \{\textrm{purchasing} &= \textrm{Typical}, \textrm{membership} = \textrm{True}\}\end{align*}\]

and the \(1-\)item rules as arising from the 3 separate itemsets

\[\{\textrm{membership} = \textrm{True}\},\{\textrm{age} = \textrm{Youth}\}, \{\textrm{purchasing} = \textrm{Typical}\}.\]

Note that rules of the form \(\varnothing \to X\) (or IF \(\varnothing\) THEN \(X\)) are typically denoted simply by \(X\).

Now, consider an itemset \(\mathcal{C}_n\) with \(n\) members (that is to say, \(n\) attribute/level pairs). In an \(n-\)item rule derived from \(\mathcal{C}_n\), each of the \(n\) members appears either in the premise or in the conclusion; there are thus \(2^n\) such rules, in principle.

The rule where each member is part of the premise (i.e., the rule without a conclusion) is nonsensical and is not allowed; we can derive exactly \(2^n-1\) \(n-\)item rules from \(\mathcal{C}_n\). Thus, the number of rules increases exponentially when the number of features increases linearly.
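The counting argument above can be sketched in a few lines of Python (an illustration; the attribute/level strings are the ones from the running example):

```python
from itertools import combinations

# Enumerate every n-item rule derivable from an itemset by choosing which
# members form the premise; the split with an empty conclusion is discarded,
# which leaves 2^n - 1 rules.
def n_item_rules(itemset):
    items = list(itemset)
    rules = []
    for size in range(len(items)):            # premise sizes 0 .. n-1
        for premise in combinations(items, size):
            conclusion = tuple(i for i in items if i not in premise)
            rules.append((premise, conclusion))
    return rules

example = ["membership=True", "age=Youth", "purchasing=Typical"]
print(len(n_item_rules(example)))  # 7 = 2^3 - 1
```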

This combinatorial explosion is a problem – it instantly disqualifies the brute force approach (simply listing all possible itemsets in the data and generating all rules from those itemsets) for any dataset with a realistic number of attributes.

How can we then generate a small number of promising candidate rules, in general?

11.3.3 The A Priori Algorithm

The a priori algorithm is an early attempt to overcome that difficulty. Initially, it was developed to work for transaction data (i.e. goods as columns, customer purchases as rows), but every reasonable dataset can be transformed into a transaction dataset using dummy variables.

The algorithm attempts to find frequent itemsets from which to build candidate rules, instead of building rules from all possible itemsets.

It starts by identifying frequent individual items in the database and extends those that are retained into larger and larger item supersets, which are themselves retained only if they occur frequently enough in the data.

The main idea is that “all non-empty subsets of a frequent itemset must also be frequent” [207], or equivalently, that all supersets of an infrequent itemset must also be infrequent (see Figure 11.4).


Figure 11.4: Pruned supersets of an infrequent itemset in the a priori network of a dataset with 5 items [207]; no rule would be generated from the grey itemsets.

In the technical jargon of machine learning, we say that a priori uses a bottom-up approach and the downward closure property of support.

The memory savings arise from the fact that the algorithm prunes candidates with infrequent sub-patterns and removes them from consideration for any future itemset: if a \(1-\)itemset is not considered to be frequent enough, any \(2-\)itemset containing it is also infrequent (see Figure 11.5 for another illustration).


Figure 11.5: Association rules for NHL playoff teams (1942-1967).

A list of the 4 teams making the playoffs each year is shown on the left (\(N=20\)). Frequent itemsets are generated using the a priori algorithm, with a support threshold of 10. We see that there are \(5\) frequent \(1-\)itemsets, top row, in yellow (New York made the playoffs \(6<10\) times – no larger frequent itemset can contain New York). 6 frequent \(2-\)itemsets are found in the subsequent list of ten \(2-\)itemsets, top row, in green (note the absence of New York). Only 2 frequent \(3-\)itemsets are found, top row, in orange. Candidate rules are generated from the shaded itemsets; the rules retained by the thresholds \[\textrm{Support}\geq 0.5,\ \textrm{Confidence}\geq 0.7, \text{ and }\textrm{Lift}>1\ \text{(barely)},\] are shown in the table on the bottom row – the main result is that when Boston made the playoffs, it was not surprising to see Detroit also make the playoffs (the presence or absence of Montreal in a rule is a red herring, as Montreal made the playoffs every year in the data). Are these rules meaningful at all?

Of course, this process requires a support threshold input, for which there is no guaranteed way to pick a “good” value; it has to be set sufficiently high to minimize the number of frequent itemsets that are being considered, but not so high that it removes too many candidates from the output list; as ever, optimal threshold values are dataset-specific.

The algorithm terminates when no further itemset extensions are retained, which always occurs given the finite number of levels in categorical datasets.
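The level-wise growth and pruning described above can be sketched as follows. This is a minimal, illustrative Python implementation of frequent-itemset generation (not the arules implementation; the function names and the toy baskets are our own):

```python
from itertools import combinations

# A priori sketch: grow frequent itemsets level by level, keeping only
# candidates all of whose subsets were themselves frequent (downward
# closure), and counting support against the transaction list.
def apriori(transactions, min_support):
    """Return all itemsets appearing in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    items = {i for t in transactions for i in t}

    def support_count(itemset):
        return sum(itemset <= t for t in transactions)

    frequent = {frozenset([i]) for i in items
                if support_count(frozenset([i])) >= min_support}
    result = set(frequent)
    while frequent:
        # candidate (k+1)-itemsets: unions of frequent k-itemsets
        candidates = {a | b for a in frequent for b in frequent
                      if len(a | b) == len(a) + 1}
        # prune candidates with an infrequent k-subset
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent
                             for s in combinations(c, len(c) - 1))}
        frequent = {c for c in candidates if support_count(c) >= min_support}
        result |= frequent
    return result

baskets = [{"milk", "bread"}, {"milk", "bread", "butter"},
           {"bread", "butter"}, {"milk", "bread"}]
print(sorted(tuple(sorted(s)) for s in apriori(baskets, min_support=3)))
# [('bread',), ('bread', 'milk'), ('milk',)]
```

Butter appears in only 2 of the 4 baskets, so it is pruned at the first level and no superset containing it is ever counted.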

  • Strengths: easy to implement and to parallelize [208];

  • Limitations: slow, requires frequent data set scans, not ideal for finding rules for infrequent and rare itemsets.

More efficient algorithms have since displaced it in practice (although the a priori algorithm retains historical value):

  • Max-Miner tries to identify frequent itemsets without enumerating them – it performs jumps in itemset space instead of using a bottom-up approach;

  • Eclat is faster and uses depth-first search, but requires extensive memory storage (a priori and eclat are both implemented in the R package arules [202]).

11.3.4 Validation

How reliable are association rules? What is the likelihood that they occur entirely by chance? How relevant are they? Can they be generalised outside the dataset, or to new data streaming in?

These questions are notoriously difficult to answer for association rules discovery, but statistically sound association discovery can help reduce the risk of finding spurious associations to a user-specified significance level [205], [206]. We end this section with a few comments:

  • Since frequent rules correspond to instances that occur repeatedly in the dataset, algorithms that generate itemsets often try to maximize coverage. When rare events are more meaningful (such as detection of a rare disease or a threat), we need algorithms that can generate rare itemsets. This is not a trivial problem.

  • Continuous data has to be binned into categorical data to generate rules. As there are many ways to accomplish that task, the same dataset can give rise to completely different rules. This could create some credibility issues with clients and stakeholders.

  • Other popular algorithms include: AIS, SETM, aprioriTid, aprioriHybrid, PCY, Multistage, Multihash, etc.

  • Additional evaluation metrics can be found in the arules documentation [202].

11.3.5 Case Study: Danish Medical Data

In “Temporal disease trajectories condensed from population-wide registry data covering 6.2 million patients” [126], A. B. Jensen et al. study diagnoses in the Danish population, with the help of association rules mining and clustering methods.


Estimating disease progression (trajectories) from current patient state is a crucial notion in medical studies. Such trajectories had (at the time of publication) only been analyzed for a small number of diseases, or using large-scale approaches without consideration for time exceeding a few years. Using data from the Danish National Patient Registry (an extensive, long-term data collection effort by Denmark), the authors sought connections between different diagnoses: how does the presence of a diagnosis at some point in time allow for the prediction of another diagnosis at a later point in time?


The authors took the following methodological steps:

  1. compute the strength of correlation for pairs of diagnoses over a 5 year interval (on a representative subset of the data);

  2. test diagnoses pairs for directionality (one diagnosis repeatedly occurring before the other);

  3. determine reasonable diagnosis trajectories (thoroughfares) by combining smaller (but frequent) trajectories with overlapping diagnoses;

  4. validate the trajectories by comparison with non-Danish data;

  5. cluster the thoroughfares to identify a small number of central medical conditions (key diagnoses) around which disease progression is organized.


The Danish National Patient Registry is an electronic health registry containing administrative information and diagnoses, covering the whole population of Denmark, including private and public hospital visits of all types: inpatient (overnight stay), outpatient (no overnight stay) and emergency. The data set covers 15 years, from January ’96 to November ’10 and consists of 68 million records for 6.2 million patients.

Challenges and Pitfalls

  • Access to the Patient Registry is protected and could only be granted after approval by the Danish Data Registration Agency and the National Board of Health.

  • Gender-specific differences in diagnostic trends are clearly identifiable (pregnancy and testicular cancer do not have much cross-appeal), but many diagnoses were found to be made exclusively (or at least, predominantly) in different sites (inpatient, outpatient, emergency ward), which suggests the importance of stratifying by site as well as by gender.

  • In the process of forming small diagnoses chains, it became necessary to compute the correlations using large groups for each pair of diagnoses. For close to 1 million diagnosis pairs, more than 80 million samples would have been required to obtain significant \(p-\)values while compensating for multiple testing, which would have translated to a few thousand years’ worth of computer running time. A pre-filtering step was included to avoid this pitfall.171
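The scale of the multiple-testing problem sketched in the last bullet can be illustrated with the Bonferroni correction (our choice of illustration, not necessarily the authors’ actual procedure):

```python
# With m simultaneous hypothesis tests, the Bonferroni correction keeps the
# family-wise error rate at alpha by requiring each individual p-value to
# fall below alpha / m: a very stringent bar for ~10^6 diagnosis pairs,
# which is why very large samples would have been needed.
def bonferroni_threshold(alpha, m):
    return alpha / m

threshold = bonferroni_threshold(0.05, 1_000_000)  # on the order of 5e-08
```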

Project Summary and Results

The dataset was reduced to 1,171 significant trajectories. These thoroughfares were clustered into patterns centred on 5 key diagnoses central to disease progression:

  • diabetes;

  • chronic obstructive pulmonary disease (COPD);

  • cancer;

  • arthritis, and

  • cerebrovascular disease.

Early diagnoses for these central factors can help reduce the risk of adverse outcome linked to future diagnoses of other conditions.

Two author quotes illustrate the importance of these results:

“The sooner a health risk pattern is identified, the better we can prevent and treat critical diseases.” [S. Brunak]

“Instead of looking at each disease in isolation, you can talk about a complex system with many different interacting factors. By looking at the order in which different diseases appear, you can start to draw patterns and see complex correlations outlining the direction for each individual person.” [L.J. Jensen]

Among the specific results, the following “surprising” insights were found:

  • a diagnosis of anemia is typically followed months later by the discovery of colon cancer;

  • gout was identified as a step on the path toward cardiovascular disease, and

  • COPD is under-diagnosed and under-treated.

The disease trajectories cluster for COPD, for instance, is shown in Figure 11.6.


Figure 11.6: The COPD cluster showing five preceding diagnoses leading to COPD and some of the possible outcomes [126].

11.3.6 Toy Example: Titanic Dataset

Compiled by Robert Dawson in 1995, the Titanic dataset consists of 4 categorical attributes for each of the 2201 people aboard the Titanic when it sank in 1912 (some issues with the dataset have been documented, but we will ignore them for now):

  • class (1st class, 2nd class, 3rd class, crewmember)

  • age (adult, child)

  • sex (male, female)

  • survival (yes, no)

The natural question of interest for this dataset is:

“How does survival relate to the other attributes?”

This is not, strictly speaking, an unsupervised task (as the interesting rules’ structure is fixed to conclusions of the form \(\textrm{survival} = \textrm{Yes}\) or \(\textrm{survival} = \textrm{No}\)).

For the purpose of this example, we elect not to treat the problem as a predictive task, since the situation on the Titanic has little bearing on survival for new data – as such, we use fixed-structure association rules to describe and explore survival conditions on the Titanic (compare with [209]).

We use the arules implementation of the a priori algorithm in R to generate and prune candidate rules, eventually leading to 8 rules (the results are visualized in Figure 11.7). Who survived? Who didn’t?172


Figure 11.7: Visualization of the 8 Titanic association rules with parallel coordinates.

We show how to obtain these rules via R in Association Rules Mining: Titanic Dataset.


A. B. Jensen et al., “Temporal disease trajectories condensed from population-wide registry data covering 6.2 million patients,” Nature Communications, vol. 5, 2014, doi: 10.1038/ncomms5022.
S. E. Brossette, A. P. Sprague, J. M. Hardin, K. B. Waites, W. T. Jones, and S. A. Moser, “Association Rules and Data Mining in Hospital Infection Control and Public Health Surveillance,” Journal of the American Medical Informatics Association, vol. 5, no. 4, pp. 373–381, Jul. 1998, doi: 10.1136/jamia.1998.0050373.
S. Canada, “Athlete rebate.”
E. Siegel, Predictive analytics: The power to predict who will click, buy, lie or die. Predictive Analytics World, 2016.
E. Garcia, C. Romero, S. Ventura, and T. Calders, “Drawbacks and solutions of applying association rule mining in learning management systems,” 2007.
Wikipedia, “Association rule learning.” 2020.
E. R. Omiecinski, “Alternative interest measures for mining associations in databases,” IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 1, pp. 57–69, 2003, doi: 10.1109/TKDE.2003.1161582.
G. Piatetsky-Shapiro, “Discovery, analysis, and presentation of strong rules,” 1991.
C. C. Aggarwal and P. S. Yu, “A new framework for itemset generation,” in Proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on principles of database systems, 1998, pp. 18–24. doi: 10.1145/275487.275490.
P.-N. Tan, V. Kumar, and J. Srivastava, “Selecting the right objective measure for association analysis,” Inf. Syst., vol. 29, no. 4, pp. 293–313, Jun. 2004, doi: 10.1016/S0306-4379(03)00072-3.
M. Hahsler and K. Hornik, “New probabilistic interest measures for association rules,” CoRR, vol. abs/0803.0966, 2008.
J. Leskovec, A. Rajaraman, and J. D. Ullman, Mining of Massive Datasets. Cambridge University Press, 2014.
M. Risdal, “Exploring survival on the titanic,” Kaggle.com, 2016.