Affinity analysis
Affinity analysis falls under the umbrella term of data mining, which uncovers meaningful correlations between different entities according to their co-occurrence in a data set. In almost all systems and processes, the application of affinity analysis can extract significant knowledge about unexpected trends [citation needed]. Affinity analysis studies attributes that go together, helping to uncover hidden patterns in big data by generating association rules. The association rule mining procedure is two-fold: first, it finds all frequent attributes in a data set; then it generates association rules satisfying predefined support and confidence criteria to identify the most important relationships among the frequent itemsets. The first step in the process is to count the co-occurrence of attributes in the data set. Next, a subset called the frequent itemset is created. An association rule takes the form: if a condition or feature (A) is present, then another condition or feature (B) exists. The first condition or feature (A) is called the antecedent and the latter (B) is known as the consequent. This process is repeated until no additional frequent itemsets are found. Two metrics are central to association rule mining: support and confidence. In addition, the Apriori algorithm is commonly used to reduce the search space for the problem.[1]
The support metric in association rule learning is defined as the frequency with which an itemset (the antecedent and consequent together) appears in the data set. Confidence expresses the reliability of a rule as the ratio of the number of data records containing both A and B to the number of records containing A. The minimum thresholds for support and confidence are inputs to the model. Given these definitions, affinity analysis can develop rules that predict the occurrence of an event based on the occurrence of other events. This data mining method has been explored in different fields including disease diagnosis, market basket analysis, the retail industry, higher education, and financial analysis. In retail, affinity analysis is used to perform market basket analysis, in which retailers seek to understand the purchase behavior of customers. This information can then be used for purposes of cross-selling and up-selling, in addition to influencing sales promotions, loyalty programs, store design, and discount plans.[2]
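The support and confidence metrics described above can be sketched in a few lines of Python. The transactions and item names below are illustrative assumptions, not data from the text:

```python
# Illustrative sketch of the support and confidence metrics for a rule A -> B.
# The transactions here are hypothetical shopping baskets.
transactions = [
    {"shampoo", "conditioner"},
    {"shampoo", "soap"},
    {"shampoo", "conditioner", "soap"},
    {"soap"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Support of (A and B together) divided by the support of A alone."""
    joint = set(antecedent) | set(consequent)
    return support(joint, transactions) / support(antecedent, transactions)

# Rule: shampoo -> conditioner
print(support({"shampoo", "conditioner"}, transactions))           # 0.5
print(confidence({"shampoo"}, {"conditioner"}, transactions))      # ~0.667
```

A rule is retained only when both values exceed the user-supplied minimum thresholds mentioned above.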
Application of affinity analysis techniques in retail
Market basket analysis might tell a retailer that customers often purchase shampoo and conditioner together, so putting both items on promotion at the same time would not create a significant increase in revenue, while a promotion involving just one of the items would likely drive sales of the other.
Market basket analysis may provide the retailer with information to understand the purchase behavior of a buyer. This information enables the retailer to understand the buyer's needs and adjust the store's layout accordingly, develop cross-promotional programs, or even capture new buyers (much like the cross-selling concept). An apocryphal early illustration is the supermarket chain that supposedly discovered in its analysis that male customers who bought diapers often bought beer as well, placed the diapers close to the beer coolers, and saw sales of both increase dramatically. Although this urban legend is only an example that professors use to illustrate the concept to students, the explanation of this imaginary phenomenon might be that fathers who are sent out to buy diapers often buy a beer as well, as a reward.[3] This kind of analysis is supposedly an example of the use of data mining. A widely used example of cross-selling on the web with market basket analysis is Amazon.com's "customers who bought book A also bought book B" feature, e.g. "People who read History of Portugal were also interested in Naval History".
Market basket analysis can be used to divide customers into groups. A company could look at what other items people purchase along with eggs, and classify them as baking a cake (if they are buying eggs along with flour and sugar) or making omelets (if they are buying eggs along with bacon and cheese). This identification could then be used to drive other programs. Similarly, it can be used to divide products into natural groups. A company could look at what products are most frequently sold together and align their category management around these cliques.[4]
Business use of market basket analysis has significantly increased since the introduction of electronic point of sale.[2] Amazon uses affinity analysis for cross-selling when it recommends products to people based on their purchase history and the purchase history of other people who bought the same item. Family Dollar plans to use market basket analysis to help maintain sales growth while moving towards stocking more low-margin consumable goods.[5]
Application of affinity analysis techniques in clinical diagnosis
An important clinical application of affinity analysis is that it can be performed on medical patient records in order to generate association rules. The obtained association rules can be further assessed to find different conditions and features that coincide on a large block of information.[6] It is crucial to understand whether there is an association between different factors contributing to a condition to be able to administer the effective preventive or therapeutic interventions. In evidence-based medicine, finding the co-occurrence of symptoms that are associated with developing tumors or cancers can help diagnose the disease at its earliest stage.[7] In addition to exploring the association between different symptoms in a patient related to a specific disease, the possible correlations between various diseases contributing to another condition can also be identified using affinity analysis.[8]
References
- ^ Larose, Daniel T.; Larose, Chantal D. (2014-06-23). Discovering Knowledge in Data: An Introduction to Data Mining. Hoboken, NJ, USA: John Wiley & Sons, Inc. doi:10.1002/9781118874059. ISBN 978-1-118-87405-9.
- ^ a b "Demystifying Market Basket Analysis". Retrieved 28 December 2018.
- ^ "The parable of the beer and diapers". The Register. Retrieved 3 September 2009.
- ^ Product Network Analysis Archived 2018-11-18 at the Wayback Machine Forte Consultancy Group
- ^ "Family Dollar Supports Merchandising with IT". Archived from the original on 6 May 2010. Retrieved 3 November 2009.
- ^ Sanida, Theodora; Varlamis, Iraklis (June 2017). "Application of Affinity Analysis Techniques on Diagnosis and Prescription Data". 2017 IEEE 30th International Symposium on Computer-Based Medical Systems (CBMS). Thessaloniki: IEEE. pp. 403–408. doi:10.1109/CBMS.2017.114. ISBN 978-1-5386-1710-6.
- ^ Sengupta, Dipankar; Sood, Meemansa; Vijayvargia, Poorvika; Hota, Sunil; Naik, Pradeep K (29 June 2013). "Association rule mining based study for identification of clinical parameters akin to occurrence of brain tumor". Bioinformation. 9 (11): 555–559. doi:10.6026/97320630009555. PMC 3717182. PMID 23888095.
- ^ Lakshmi, K.S; Vadivu, G. (2017). "Extracting Association Rules from Medical Health Records Using Multi-Criteria Decision Analysis". Procedia Computer Science. 115: 290–295. doi:10.1016/j.procs.2017.09.137.
Further reading
- J. Han et al., 2006, Data Mining: Concepts and Techniques ISBN 978-1-55860-901-3
- V. Kumar et al., 2005, Introduction to Data Mining ISBN 978-0-321-32136-7
- U. Fayyad et al., 1996, Advances in Knowledge Discovery and Data Mining ISBN 978-0-262-56097-9
Fundamentals
Definition and Core Concepts
Affinity analysis is a data mining technique used to discover co-occurrence patterns and associations among items or events in large datasets, particularly focusing on relationships between seemingly unrelated entities. It involves identifying how certain items tend to appear together in transactions, revealing hidden affinities without implying causality. This approach is commonly applied in scenarios like market basket analysis, where the goal is to uncover patterns such as the frequent joint purchase of products that might not intuitively seem related.[5]

At its core, affinity analysis operates on the concepts of transactions and itemsets. A transaction represents a set of items that occur together, such as the contents of a customer's shopping basket or a sequence of events in a log. An itemset is a collection of one or more items from these transactions, ranging from single items (1-itemsets) to larger combinations (k-itemsets). The primary objective is to detect frequent itemsets and derive associations, such as determining that items A and B often co-occur, which can inform decisions like product placement or recommendation systems. These concepts enable the exploration of relational patterns in transactional data, emphasizing the strength of affinities based on joint occurrences rather than individual item popularity.[5]

To illustrate, consider a simple dataset of customer purchases represented as transactions:

| Transaction ID | Items Purchased |
|---|---|
| 1 | Bread, Milk |
| 2 | Bread, Diaper, Beer, Eggs |
| 3 | Milk, Diaper, Beer, Coke |
| 4 | Bread, Milk, Diaper, Beer |
| 5 | Bread, Milk, Diaper, Coke |
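The support of any itemset in the table above can be computed directly by counting the transactions that contain it. A minimal sketch (the itemsets queried at the end are just illustrative picks):

```python
# The five transactions from the table above, as Python sets.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support(itemset):
    """Fraction of the transactions that contain every item in `itemset`."""
    return sum(set(itemset) <= t for t in transactions) / len(transactions)

print(support({"Bread", "Milk"}))   # 0.6 (transactions 1, 4, 5)
print(support({"Diaper", "Beer"}))  # 0.6 (transactions 2, 3, 4)
print(support({"Milk", "Beer"}))    # 0.4 (transactions 3, 4)
```

With a minimum support threshold of, say, 50%, {Bread, Milk} and {Diaper, Beer} would be kept as frequent 2-itemsets while {Milk, Beer} would be discarded.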
Historical Development
While statistical measures of association, such as the chi-square test developed by Karl Pearson in 1900, have long been used in market research to analyze relationships in survey data, the formal development of affinity analysis as a data mining technique for discovering co-occurrence patterns in transactional data began in the 1990s, as data mining emerged, propelled by the explosion of large-scale databases in retail and e-commerce. A pivotal milestone occurred in 1993, when Rakesh Agrawal, Tomasz Imielinski, and Arun Swami introduced the concept of mining association rules for market basket analysis, enabling the discovery of frequent co-occurrences among items in transactional data.[7] This work formalized affinity analysis as a core data mining task, shifting focus from ad hoc statistical tests to systematic pattern extraction. Building on this foundation, Rakesh Agrawal and Ramakrishnan Srikant advanced the field in 1994 with efficient algorithms for generating association rules, establishing them as pioneers of the domain.

In the post-2000 big data era, affinity analysis evolved from labor-intensive statistical approaches to scalable automated methods, incorporating distributed computing frameworks like Hadoop and Spark to process massive datasets.[8] This integration with machine learning further enhanced its utility, embedding association rules into predictive models for dynamic applications such as personalized recommendations.[9]

Techniques and Methods
Association Rule Learning
Association rule learning is a foundational technique in affinity analysis for discovering relationships between variables in large datasets, typically represented as transactions containing items. The process begins with identifying frequent itemsets—subsets of items that appear together in a dataset with a frequency exceeding a specified minimum support threshold—and then derives association rules from these itemsets in the form of antecedent → consequent, where the antecedent implies the consequent based on observed co-occurrences.[10] This two-phase approach ensures that only patterns meeting the support criterion are considered for rule generation, focusing computational efforts on statistically significant associations.[10]

The Apriori algorithm, introduced in 1994, serves as the seminal method for this process, employing a breadth-first search strategy to iteratively build frequent itemsets level by level, starting from single items. It generates candidate itemsets of size k+1 by joining frequent itemsets of size k that share a common prefix, then prunes those candidates that contain any subset failing the minimum support threshold, leveraging the apriori property that all subsets of a frequent itemset must also be frequent.[10] During each iteration, the algorithm scans the database to count the support of these candidates, retaining only those above the threshold as frequent itemsets for the next level.[10] Once frequent itemsets are found, strong association rules are extracted by partitioning each itemset into antecedent and consequent subsets and evaluating their confidence, though detailed metric computations occur separately.[10]

An efficient alternative to Apriori is the FP-growth algorithm, proposed in 2000, which avoids explicit candidate generation by constructing a compact Frequent Pattern tree (FP-tree) data structure from the compressed dataset.
The FP-tree captures the complete information of frequent patterns through a prefix tree where each path from root to leaf represents a frequent prefix, with nodes linked via header tables for efficient traversal.[11] Mining proceeds by recursively dividing the FP-tree based on conditional patterns, growing frequent itemsets directly from the tree without repeated database scans, often outperforming Apriori on dense datasets by reducing I/O overhead.[11]

To illustrate, consider a small transaction dataset with a minimum support of 50% (2 out of 4 transactions) and items A (apple), B (banana), C (cereal), D (diaper), M (milk):

- Transactions: T1: {A, B, M}, T2: {B, C, M}, T3: {A, B, D}, T4: {M, D}.
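An Apriori-style pass over these four transactions can be sketched as below. This is a simplified version: it uses a generic pairwise join instead of the prefix-based join, and it omits the subset-pruning step, relying on the support count alone:

```python
from itertools import combinations

# The four transactions from the example; minimum support count of 2 (50%).
transactions = [{"A", "B", "M"}, {"B", "C", "M"}, {"A", "B", "D"}, {"M", "D"}]
MIN_SUPPORT = 2

def count(itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions)

# Level 1: frequent single items (C appears only once and is dropped).
items = sorted(set().union(*transactions))
frequent = [frozenset({i}) for i in items if count(frozenset({i})) >= MIN_SUPPORT]
all_frequent = list(frequent)

# Level k+1: join frequent k-itemsets, keep candidates meeting the threshold.
k = 1
while frequent:
    candidates = {a | b for a, b in combinations(frequent, 2) if len(a | b) == k + 1}
    frequent = [c for c in candidates if count(c) >= MIN_SUPPORT]
    all_frequent.extend(frequent)
    k += 1

# Result: the four frequent items A, B, D, M plus the pairs {A, B} and {B, M};
# the only 3-item candidate, {A, B, M}, occurs once and is pruned.
print([sorted(s) for s in all_frequent])
```

The search stops at level 3 because {A, B, M} appears in only one transaction, illustrating how the apriori property bounds the search space.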