Multiple factor analysis
Multiple factor analysis (MFA) is a factorial method[1] devoted to the study of tables in which a group of individuals is described by a set of variables (quantitative and/or qualitative) structured in groups. It is a multivariate method from the field of ordination used to simplify multidimensional data structures. MFA treats all involved tables in the same way (symmetrical analysis). It may be seen as an extension of:
- Principal component analysis (PCA) when variables are quantitative,
- Multiple correspondence analysis (MCA) when variables are qualitative,
- Factor analysis of mixed data (FAMD) when the active variables belong to the two types.
Introductory example
Why introduce several active groups of variables in the same factorial analysis?
Data
Consider the case of quantitative variables, that is to say, within the framework of the PCA. An example of data from ecological research provides a useful illustration. There are, for 72 stations, two types of measurements:
- The abundance-dominance coefficient of 50 plant species (coefficient ranging from 0 = the plant is absent, to 9 = the species covers more than three-quarters of the surface). The whole set of the 50 coefficients defines the floristic profile of a station.
- Eleven pedological measurements (pedology = soil science): particle size, physical and chemical properties, etc. The set of these eleven measures defines the pedological profile of a station.
Three analyses are possible:
- PCA of flora (pedology as supplementary): this analysis focuses on the variability of the floristic profiles. Two stations are close to one another if they have similar floristic profiles. In a second step, the main dimensions of this variability (i.e. the principal components) are related to the pedological variables introduced as supplementary.
- PCA of pedology (flora as supplementary): this analysis focuses on the variability of soil profiles. Two stations are close if they have the same soil profile. The main dimensions of this variability (i.e. the principal components) are then related to the abundance of plants.
- PCA of the two groups of variables as active: one may want to study the variability of stations from both the point of view of flora and soil. In this approach, two stations should be close if they have both similar flora 'and' similar soils.
Balance between groups of variables
Methodology
The third analysis of the introductory example implicitly assumes a balance between flora and soil. However, in this example, the mere fact that the flora is represented by 50 variables and the soil by 11 implies that the PCA with 61 active variables will be influenced mainly by the flora (at least on the first axis). This is not desirable: there is no reason to want one group to play a more important role in the analysis.
The core of MFA is based on a factorial analysis (PCA in the case of quantitative variables, MCA in the case of qualitative variables) in which the variables are weighted. These weights are identical for the variables of the same group (and vary from one group to another). They are such that the maximum axial inertia of a group is equal to 1: in other words, applying PCA (or, where applicable, MCA) to one group with this weighting yields a first eigenvalue equal to 1. To obtain this property, MFA assigns to each variable of group k a weight equal to the inverse of the first eigenvalue of the analysis (PCA or MCA, according to the type of variable) of group k.
Formally, denoting λ₁ᵏ the first eigenvalue of the factorial analysis of group k, MFA assigns the weight 1/λ₁ᵏ to each variable of group k.
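As an illustration, here is a minimal sketch of this weighting in base R, assuming a data frame `X` (hypothetical) whose first 50 columns are the floristic variables and whose last 11 are the pedological ones; FactoMineR performs the same operation internally:

```r
# Minimal sketch of the MFA weighting, assuming a hypothetical data frame X
# with 50 floristic columns followed by 11 pedological columns.
flora    <- scale(X[, 1:50])
pedology <- scale(X[, 51:61])

# First eigenvalue of the standardized PCA of a group
lambda1 <- function(M) svd(M / sqrt(nrow(M) - 1))$d[1]^2

# Each variable of a group receives the weight 1 / lambda1 of its group;
# on the data matrix this amounts to dividing the columns by sqrt(lambda1).
X_weighted <- cbind(flora    / sqrt(lambda1(flora)),
                    pedology / sqrt(lambda1(pedology)))
# A PCA of X_weighted is the core of the MFA of the two groups.
```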
Balancing the maximum axial inertia, rather than the total inertia (= the number of variables in standard PCA), gives MFA several properties important for the user. Its interest appears most directly in the following example.
Example
Consider two groups of variables defined on the same set of individuals.
- Group 1 is composed of two uncorrelated variables A and B.
- Group 2 is composed of two variables {C1, C2} identical to the same variable C uncorrelated with the first two.
This example is not completely unrealistic. It is often necessary to simultaneously analyse multi-dimensional and (quite) one-dimensional groups.
Since each group contains the same number of variables, the two groups have the same total inertia.
In this example the first axis of the PCA is almost coincident with C. Indeed, in the space of variables, there are two variables in the direction of C: group 2, with all its inertia concentrated in one direction, influences predominantly the first axis. For its part, group 1, consisting of two orthogonal variables (= uncorrelated), has its inertia uniformly distributed in a plane (the plane generated by the two variables) and hardly weighs on the first axis.
Numerical Example
[Table 1 (the data) and Table 2 (the inertias of the first two axes of the PCA and of the MFA) are not reproduced.]
Table 2 summarizes the inertia of the first two axes of the PCA and of the MFA applied to Table 1.
Group 2 variables contribute 88.95% of the inertia of axis 1 of the PCA. The first axis (F₁) is almost coincident with C: the correlation between C and F₁ is .976.
The first axis of the MFA (on Table 1 data) shows the balance between the two groups of variables: the contribution of each group to the inertia of this axis is strictly equal to 50%.
The second axis, meanwhile, depends only on group 1. This is natural since this group is two-dimensional while the second group, being one-dimensional, can be highly related to only one axis (here the first axis).
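This toy configuration is easy to reproduce; the following R sketch (simulated data, so the figures only approximate those of Table 2) contrasts the plain PCA with the MFA using FactoMineR:

```r
library(FactoMineR)
set.seed(1)
A <- rnorm(100); B <- rnorm(100); C <- rnorm(100)
dat <- data.frame(A = A, B = B, C1 = C, C2 = C)

# Plain PCA of the four variables: C1 and C2 dominate the first axis
pca <- PCA(dat, graph = FALSE)
round(pca$var$contrib[, 1], 1)   # contributions (%) of A, B, C1, C2 to axis 1

# MFA with the two groups active: the groups' contributions are balanced
mfa <- MFA(dat, group = c(2, 2), type = c("s", "s"),
           name.group = c("G1", "G2"), graph = FALSE)
round(mfa$group$contrib[, 1], 1) # each group contributes about 50% to axis 1
```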
Conclusion about the balance between groups
Introducing several active groups of variables in a factorial analysis implicitly assumes a balance between these groups.
This balance must take into account that a multidimensional group naturally influences more axes than a one-dimensional group does (a one-dimensional group can be closely related to one axis only).
The weighting of the MFA, which makes the maximum axial inertia of each group equal to 1, plays this role.
Application examples
Surveys: Questionnaires are always structured according to different themes. Each theme is a group of variables, for example, questions about opinions and questions about behaviour. Thus, in this example, we may want to perform a factorial analysis in which two individuals are close if they have expressed both the same opinions and the same behaviour.
Sensory analysis: The same set of products has been evaluated by a panel of experts and a panel of consumers. For its evaluation, each jury uses a list of descriptors (sour, bitter, etc.). Each judge scores each descriptor for each product on an intensity scale ranging, for example, from 0 = null or very low to 10 = very strong. In the table associated with a jury, the cell at the intersection of row i and column j contains the average score assigned to product i for descriptor j.
Individuals are the products. Each jury is a group of variables. We want to achieve a factorial analysis in which two products are similar if they were evaluated in the same way by both juries.
Multidimensional time series: K variables are measured on I individuals at T dates. There are many ways to analyse such a data set. One way suggested by MFA is to consider each date as a group of variables in the analysis of the T tables (each table corresponds to one date) juxtaposed row-wise (the table analysed thus has I rows and K × T columns).
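A minimal R sketch of this juxtaposition, assuming hypothetical tables `date1`, `date2`, `date3`, each with the same I rows:

```r
library(FactoMineR)
tables <- list(date1, date2, date3)       # hypothetical: one I x K table per date
global <- do.call(cbind, tables)          # I rows and K*T columns
res <- MFA(global,
           group = sapply(tables, ncol),  # one group of variables per date
           type  = rep("s", length(tables)),
           name.group = paste0("t", seq_along(tables)),
           graph = FALSE)
```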
Conclusion: These examples show that in practice, variables are very often organized into groups.
Graphics from MFA
Beyond the weighting of variables, the interest of MFA lies in a series of graphics and indicators valuable in the analysis of a table whose columns are organized into groups.
Graphics common to all the simple factorial analyses (PCA, MCA)
The core of MFA is a weighted factorial analysis: MFA first provides the classical results of factorial analyses.
1. Representations of individuals in which two individuals are close to each other if they exhibit similar values for many variables in the different variable groups; in practice the user particularly studies the first factorial plane.
2. Representations of quantitative variables as in PCA (correlation circle).
[Figure 1: representation of the individuals on the first factorial plane. Figure 2: correlation circle of the variables.]
In the example:
- The first axis mainly opposes individuals 1 and 5 (Figure 1).
- The four variables have a positive coordinate (Figure 2): the first axis is a size effect. Thus, individual 1 has low values for all the variables and individual 5 has high values for all the variables.
3. Indicators aiding interpretation: projected inertia, contributions and quality of representation. In the example, the contribution of individuals 1 and 5 to the inertia of the first axis is 45.7% + 31.5% = 77.2% which justifies the interpretation focussed on these two points.
4. Representations of categories of qualitative variables as in MCA (a category lies at the centroid of the individuals who possess it). No qualitative variables in the example.
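With FactoMineR, these classical results are obtained directly from a fitted MFA object; a sketch, reusing the toy data `dat` from above (the object name `res` is an assumption carried through the following sketches):

```r
res <- MFA(dat, group = c(2, 2), type = c("s", "s"), graph = FALSE)
plot(res, choix = "ind")   # map of the individuals (cf. Figure 1)
plot(res, choix = "var")   # correlation circle of the variables (cf. Figure 2)
res$ind$contrib[, 1]       # contributions of the individuals to axis 1
res$eig                    # eigenvalues and percentages of inertia
```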
Graphics specific to this kind of multiple table
5. Superimposed representations of individuals "seen" by each group. An individual considered from the point of view of a single group is called a partial individual (in parallel, an individual considered from the point of view of all variables is called the mean individual because it lies at the center of gravity of its partial points). The partial cloud Nʲ gathers the individuals seen from the perspective of the single group j: it is the cloud analysed in the separate factorial analysis (PCA or MCA) of group j. The superimposed representation of the Nʲ provided by the MFA is similar in its purpose to that provided by Procrustes analysis.
[Figure 3: superimposed representation of the mean and partial points of the individuals.]
In the example (Figure 3), individual 1 is characterized by a small size (i.e. small values) both in terms of group 1 and group 2 (the two partial points of individual 1 have a negative coordinate and are close to one another). In contrast, individual 5 is more characterized by high values for the variables of group 2 than for those of group 1 (for individual 5, the group 2 partial point lies further from the origin than the group 1 partial point). This reading of the graph can be checked directly in the data.
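In FactoMineR, the superimposed representation is requested through the `partial` argument of the plot method; a sketch with `res` as above:

```r
plot(res, choix = "ind", partial = "all")  # mean points plus partial points
res$ind$coord.partiel                      # coordinates of the partial individuals
```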
6. Representations of groups of variables as such. In these graphs, each group of variables is represented by a single point. Two groups of variables are close to one another when they define the same structure on the individuals. An extreme case: two groups of variables that define homothetic clouds of individuals coincide. The coordinate of group j along axis s is equal to the contribution of group j to the inertia of the MFA dimension of rank s. This contribution can be interpreted as an indicator of the relationship between group j and axis s (hence the name relationship square given to this type of representation). This representation also exists in other factorial methods (MCA and FAMD in particular), in which case each group of variables is reduced to a single variable.
[Figure 4: representation of the groups of variables (relationship square).]
In the example (Figure 4), this representation shows that the first axis is related to both groups of variables, while the second axis is related to the first group only. This agrees with the representation of the variables (Figure 2). In practice, this representation is especially valuable when the groups are numerous and include many variables.
Another reading grid: the two groups of variables have the size effect (first axis) in common and differ along axis 2, since this axis is specific to group 1 (it opposes the variables A and B).
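A sketch of the corresponding call with FactoMineR (`res` as above):

```r
plot(res, choix = "group")  # one point per group; coordinates in [0, 1]
res$group$coord             # group coordinates, read as contributions to the axes
```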
7. Representations of factors of separate analyses of the different groups. These factors are represented as supplementary quantitative variables (correlation circle).
[Figure 5: representation of the factors of the separate analyses (partial axes).]
In the example (Figure 5), the first axis of the MFA is relatively strongly correlated (r = .80) with the first component of group 2. This group, consisting of two identical variables, has only one principal component (coincident with the variable). Group 1 consists of two orthogonal variables: any direction of the subspace spanned by these two variables has the same inertia (equal to 1). So there is uncertainty in the choice of the principal components, and there is no reason to be interested in any one of them in particular. However, the two components provided by the program are well represented: the plane of the MFA is close to the plane spanned by the two variables of group 1.
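The partial axes are available in the same way (sketch, `res` as above):

```r
plot(res, choix = "axes")  # factors of the separate analyses on the MFA circle
res$partial.axes$coord     # coordinates (correlations) of the partial axes
```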
Conclusion
The numerical example illustrates the output of the MFA. Besides balancing the groups of variables, and besides the usual graphics of PCA (or of MCA in the case of qualitative variables), MFA provides results specific to the group structure of the set of variables, in particular:
- A superimposed representation of partial individuals for a detailed analysis of the data;
- A representation of groups of variables providing a synthetic image that is all the more valuable as the data include many groups;
- A representation of factors from separate analyses.
The small size and simplicity of the example allow a simple validation of the rules of interpretation. But the method is even more valuable when the data set is large and complex. Other methods suitable for this type of data are available; Procrustes analysis is compared to MFA in [2].
History
MFA was developed by Brigitte Escofier and Jérôme Pagès in the 1980s. It is at the heart of two books written by these authors.[3][4] MFA and its extensions (hierarchical MFA, MFA on contingency tables, etc.) are a research topic of the applied mathematics laboratory of Agrocampus (LMA²), which published a book presenting the basic methods of exploratory multivariate analysis.[5]
Software
MFA is available in two R packages (FactoMineR and ade4) and in many software packages, including SPAD, Uniwin, XLSTAT, etc. There is also an SAS function[permanent dead link]. The graphs in this article come from the R package FactoMineR.
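For reference, a typical FactoMineR call, following the documented `wine` example from the package help page:

```r
library(FactoMineR)
data(wine)
res <- MFA(wine,
           group = c(2, 5, 3, 10, 9, 2),
           type  = c("n", "s", "s", "s", "s", "s"),
           name.group = c("origin", "odor", "visual",
                          "odor.after.shaking", "taste", "overall"),
           num.group.sup = c(1, 6),  # origin and overall as supplementary groups
           graph = FALSE)
summary(res)
```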
References
[edit]- ^ Greenacre, Michael; Blasius, Jorg (2006-06-23). Multiple Correspondence Analysis and Related Methods. CRC Press. pp. 352–. ISBN 9781420011319. Retrieved 11 June 2014.
- ^ Pagès Jérôme (2014). Multiple Factor Analysis by Example Using R. Chapman & Hall/CRC The R Series, London. 272p
- ^ Ibidem
- ^ Escofier Brigitte & Pagès Jérôme (2008). Analyses factorielles simples et multiples; objectifs, méthodes et interprétation. Dunod, Paris. 318 p. ISBN 978-2-10-051932-3
- ^ Husson F., Lê S. & Pagès J. (2009). Exploratory Multivariate Analysis by Example Using R. Chapman & Hall/CRC The R Series, London. ISBN 978-2-7535-0938-2
External links
- FactoMineR: an R package devoted to exploratory data analysis.
Multiple factor analysis
Introduction
Definition and Objectives
Multiple factor analysis (MFA) is a principal component method designed for the simultaneous analysis of multiple groups of variables, which can be numerical and/or categorical, measured on the same set of observations. It aims to identify common underlying structures across these groups while evaluating the balance or relative contributions of each group to the overall analysis. The primary objectives of MFA include summarizing complex multi-table data into a lower-dimensional representation, detecting redundancies or complementarities between variable groups, and achieving a unified dimensionality reduction that accounts for the individual inertias of each group. By normalizing and weighting the groups appropriately, MFA facilitates the exploration of shared patterns without one group dominating the results due to scale differences. This approach builds on principal component analysis (PCA) for continuous variables and multiple correspondence analysis (MCA) for categorical ones, adapting their principles to multi-group settings.

Key benefits of MFA lie in its ability to handle mixed data types without requiring homogeneity across groups, enabling direct comparisons of the importance of different variable sets, and supporting exploratory analyses in diverse fields such as sensory evaluation and multi-omics studies. In sensory evaluation, for instance, it allows integration of assessor ratings and physicochemical measurements to assess product perceptions holistically. Similarly, in multi-omics research, MFA integrates datasets like genomics and proteomics to uncover coordinated biological variations.

The main outputs of MFA consist of a global factor map representing the compromise across all groups, partial factor maps illustrating each group's specific structure projected onto the global axes, and balance indicators that quantify the contributions and inertias of individual groups. These visualizations and metrics provide insights into both the consensus and discrepancies among the data tables.

Relation to Other Factorial Methods
Multiple factor analysis (MFA) extends principal component analysis (PCA) to the analysis of multiple data tables describing the same set of observations, addressing the limitations of standard PCA when applied to single-block data by incorporating a normalization step for each group to ensure balanced contributions. In PCA, the focus is on maximizing variance within a single table through eigenvalue decomposition, whereas MFA first performs a PCA on each individual group (or block) of variables, scales the data by dividing by the square root of the first eigenvalue of that group's PCA to normalize inertia, and then concatenates the normalized tables for a global PCA. This adaptation prevents any single group with larger variance from dominating the analysis, allowing for a joint representation that respects the structure of heterogeneous data sets.[6]

For groups involving categorical variables, MFA integrates principles from multiple correspondence analysis (MCA) by treating such data as contingency tables and adjusting for category frequencies to align the scaling with continuous variables, effectively performing MCA within each qualitative group before normalization. Unlike standalone MCA, which analyzes categorical data using chi-squared distances to handle the double contingency table inherent in multiple categories, MFA embeds this within the multi-group framework, representing categories by their centers of gravity rather than by disjunctive coding alone, which facilitates integration with quantitative groups without distorting the overall factor space. This hybrid approach ensures that categorical and continuous variables contribute comparably to the global factors after normalization by the first singular value of the group's MCA.[6][7]

MFA distinguishes itself from other multi-table methods, such as STATIS, which seeks a compromise between tables by optimizing weights to maximize the similarity of observation factor scores across groups via RV coefficients, whereas MFA employs a fixed normalization scheme to promote balance without iterative weighting. In contrast to multi-block PCA variants like SUM-PCA, which concatenate blocks after simple variance standardization and may allow dominant blocks to overshadow others, MFA's inertia-based normalization and its emphasis on eigenvalue ratios (comparing each group's first eigenvalue to the global principal components) explicitly assess and enforce equilibrium across groups, making it particularly suited for mixed data types. These differences position MFA as a balanced extension of factorial methods for multi-block settings, assuming prior knowledge of PCA's variance maximization and MCA's distance metrics.[6]

Data Structure and Preparation
Organization of Multiple Variable Groups
Multiple factor analysis (MFA) requires a multi-table data structure where a set of I observations is described by K distinct groups of variables, with the k-th group comprising J_k variables organized as an I × J_k matrix.[8] These matrices are conceptually concatenated horizontally to form a global I × ∑J_k data set, though each group is analyzed separately in the initial stages to account for its internal structure.[4] For instance, in sensory analysis of food products, one group might include physical attributes like pH and density, while another covers sensory attributes such as sweetness and bitterness.[4]

A fundamental requirement is that the same I observations must appear across all K groups, ensuring comparability and alignment in the analysis.[8] Missing values can be handled in standard MFA: numerical variables are often imputed with column means, while categorical variables may be treated as an additional category or coded as absent in the disjunctive table. Complete data is ideal for accuracy, and advanced imputation methods are available for complex cases.[9] Groups must also be conceptually distinct, representing different aspects or domains of the observations (e.g., quantitative measurements versus qualitative descriptors), to facilitate the identification of shared and unique patterns.[8]

Preprocessing begins with centering all numerical variables within each group by subtracting the group-specific mean, which removes location effects and focuses on variance.[4] Categorical variables are transformed into disjunctive tables or indicator matrices, where each category becomes a binary column (1 if present, 0 otherwise), enabling factorial treatment akin to multiple correspondence analysis (MCA).[4] If variables within a group exhibit differing scales, they may be scaled to unit variance prior to analysis to ensure equitable contribution during group factorization.[8]

When defining groups, practitioners should aim for a relatively balanced number of variables (J_k) across the K groups to prevent any single group from disproportionately influencing the global structure, although the subsequent normalization steps in MFA mitigate imbalances.[8] Each group is typically analyzed using principal component analysis (PCA) for quantitative variables or MCA for categorical ones, providing the foundation for integration.[4]

Handling Different Variable Types
In Multiple Factor Analysis (MFA), data are organized into groups of variables, where the treatment of variable types within each group is essential for equitable contribution to the overall analysis. Numerical variables are standardized by centering them to a mean of zero and scaling to unit variance, enabling the application of Principal Component Analysis (PCA), which relies on Euclidean distances to summarize the group's variability. This standardization ensures that all numerical variables have comparable scales, preventing any single variable from dominating the group's principal components.[7]

Categorical variables require transformation into a disjunctive table, consisting of indicator columns for each category. Each indicator column is weighted by the proportion of individuals who do not possess that category (1 − f_i, where f_i is the frequency of the category), and Multiple Correspondence Analysis (MCA) is then applied using chi-squared distances, which account for the relative frequencies and capture associations among categories. This approach balances the influence of categories with varying prevalences, aligning the categorical group's inertia with that of numerical groups.[7]

Groups with mixed variable types are rare in standard MFA, as the method assumes homogeneity within groups so that a consistent metric can be applied. In such cases, variables are often separated into homogeneous sub-groups for separate PCA or MCA before integration, or extensions incorporate hybrid distances that combine Euclidean components for numerical variables with chi-squared components for categorical ones; this is particularly relevant in applications to multi-omics data, where diverse data modalities like continuous expression levels and discrete mutations necessitate adaptive handling.[10]

Ordinal variables pose type-specific challenges, as their ordered categories can be treated either as categorical (via disjunctive coding and MCA, to respect the discrete levels) or as numerical if the scale is sufficiently granular to approximate continuous data, allowing standardization and PCA. The decision hinges on the number of levels and the meaningfulness of the intervals, ensuring the treatment aligns with the variable's measurement properties for compatibility in the global MFA framework.[11]

To validate the setup, groups should contain an adequate number of variables, such as more than five per group, to promote stability in the extracted factors and reliable estimation of group-specific inertias. Smaller groups risk unstable principal components and inflated variability in balance metrics.[8]
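A minimal sketch of the disjunctive coding described above, using a hypothetical factor `flavor`; FactoMineR's `tab.disjonctif()` performs the same transformation:

```r
flavor <- factor(c("fruity", "oaky", "spicy", "fruity"))  # hypothetical variable
Z <- model.matrix(~ flavor - 1)  # one 0/1 indicator column per category
colMeans(Z)                      # category frequencies f_i used in the MCA weighting
```

Core Methodology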
Group Normalization and Weighting
In Multiple Factor Analysis (MFA), the process of group normalization and weighting begins with the separate analysis of each group of variables to ensure equitable contributions across diverse data sets. For numerical variable groups, Principal Component Analysis (PCA) is performed on the centered and scaled data matrix X_k, while for categorical groups, Multiple Correspondence Analysis (MCA) is applied, yielding the first eigenvalue λ₁ᵏ for each group k. This initial step captures the internal structure of each group independently, with λ₁ᵏ representing the maximum variance (or inertia) explained by the first principal component.[7][8]

Normalization follows to balance the influence of groups that may differ in size or variability. Specifically, the data matrix for group k is divided by the square root of the first eigenvalue, producing the normalized matrix X̃_k = X_k / √λ₁ᵏ. This adjustment equalizes the maximum axial variance across groups, as the first dimension of each normalized group now accounts for unit variance. The rationale for this weighting is to prevent larger groups (those with more variables or higher overall inertia) from dominating the subsequent global analysis, thereby promoting a fair comparison of typologies or structures within each group.[7][8][11]

The output of this normalization step consists of normalized partial factor maps for each group, where the coordinates derived from X̃_k rescale the original principal components to a common scale. These maps preserve the relative positions within each group while mitigating scale disparities, preparing the data for integration into a unified framework. By design, this approach ensures that no single group can unilaterally define the primary axes of variation in the overall analysis.[7][8]

Global Data Set Construction
In multiple factor analysis (MFA), the global data set is assembled by horizontally concatenating the normalized matrices from each group of variables, enabling a unified principal component analysis (PCA) across all groups. Specifically, for K groups, each normalized matrix X̃_k (of dimensions I × J_k, where I is the number of observations and J_k the number of variables in group k) is bound side-by-side to form the global matrix X = [X̃_1, …, X̃_K], resulting in an I × ∑J_k matrix. This structure preserves the block-wise organization while allowing the extraction of compromise factors that balance contributions from all groups. The normalization of each X_k, typically by dividing the original group matrix by its first singular value √λ₁ᵏ (where λ₁ᵏ is the first eigenvalue from a preliminary PCA or MCA on group k), ensures that no single group dominates due to scale differences.[7][8]

The primary purpose of this global construction is to perform PCA on X, yielding factors that represent all groups equitably post-normalization and reveal shared structures across variable sets while highlighting discrepancies. Unequal group sizes are implicitly addressed through the normalization step, as scaling by the first singular value equalizes the inertia of each group along its first dimension; however, when groups differ vastly in variable counts (e.g., one with 5 variables versus another with 50), this may introduce subtle biases toward larger groups, prompting extensions like explicit group weighting in advanced implementations.
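A minimal sketch of this construction in base R, assuming a list `blocks` of centered and scaled numeric matrices sharing the same rows:

```r
normalize_block <- function(M) {
  lambda1 <- svd(M / sqrt(nrow(M) - 1))$d[1]^2  # first eigenvalue of the block
  M / sqrt(lambda1)                             # maximum axial inertia becomes 1
}
X_global   <- do.call(cbind, lapply(blocks, normalize_block))
global_pca <- prcomp(X_global, center = FALSE)  # the global PCA of the MFA
```

Factor Extraction and Coordinates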
Once the global data set X is constructed by concatenating the normalized group data tables X̃_k, multiple factor analysis proceeds with a principal component analysis (PCA) applied to this aggregated matrix.[6] The PCA extracts the principal factors by decomposing the covariance structure of X, yielding eigenvalues λ_s that quantify the variance explained by each successive factor s, along with the global principal coordinates of the observations and the loadings of the variables.[6] These global coordinates represent the positions of the observations in the compromise space, which synthesizes information across all groups while respecting their individual structures.[4]

The partial coordinates for each group k are then derived by projecting the group's normalized matrix onto the global eigenvectors from the PCA of X, given by the formula F⁽ᵏ⁾ = K · X̃_k V_k, where V_k denotes the rows of the global loading matrix corresponding to the variables of group k. This projection captures how each group's variables contribute to the global factors without altering the overall compromise.[6] The resulting partial coordinates allow for group-specific interpretations within the shared factor space.

The number of factors to retain is typically determined using criteria such as the scree plot of eigenvalues or the cumulative percentage of inertia explained, often selecting 2 to 5 dimensions for practical interpretability in applications like sensory analysis.[4] Total inertia in MFA is computed as the total variance of the weighted global table, trace(XᵀX)/I, where I denotes the number of observations, providing a measure of the overall variance across the balanced groups.[6] Mathematically, the global PCA maximizes the explained variance across all normalized groups simultaneously, ensuring that no single group dominates the factor structure thanks to the prior balancing.[4] The eigenvalues λ_s thus reflect the inertia along each principal axis in this unified space, with the sum of the retained λ_s indicating the proportion of total inertia captured.[6]
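Continuing the sketch above, the partial coordinates follow from the block of global loadings belonging to each group (the index vector `cols_k` is an assumption of the sketch):

```r
V <- global_pca$rotation  # global loadings
F <- global_pca$x         # global coordinates of the observations
K <- length(blocks)

partial_coord <- function(cols_k) {
  K * X_global[, cols_k] %*% V[cols_k, ]  # F^(k) = K * X~_k V_k
}
# Averaging the K partial coordinate matrices recovers the global
# coordinates F: the mean individual is the centroid of its partial points.
```

Balance Analysis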
Metrics for Group Contributions
In multiple factor analysis, several metrics quantify the contributions of individual variable groups to the global factors, enabling researchers to assess relative importance, alignment, and potential imbalances after extraction. The first eigenvalue ratio for group k, λ₁ᵏ / ∑ⱼ λ₁ʲ, measures the group's relative importance prior to normalization, where λ₁ᵏ is the first eigenvalue from the separate principal component analysis (or multiple correspondence analysis) of group k and the denominator sums these first eigenvalues across all groups; higher ratios indicate groups with stronger inherent structure that could dominate the analysis without adjustment.[7]

The contribution of group k to the inertia of global factor s is captured by ctr_k(s) = inertia_k(s) / λ_s, where the numerator sums the variances (squared coordinates) of the variables of group k on dimension s and λ_s is the eigenvalue of the global factor; this proportion reveals how much each group supports the explanation of overall data variance along specific dimensions, with larger values highlighting influential groups.[8]

Coordinates quality for group k on factor s is evaluated using cos²_k(s), which expresses the fraction of the group's total variance (as given by its first eigenvalue λ₁ᵏ) explained by the global factor; values approaching 1 denote an excellent fit, meaning the global structure effectively captures the group's variability, while lower values suggest misalignment.[12]

For imbalance detection, the average cos²_k across the initial factors (typically the first two or three) is computed for each group; persistently low averages, such as below 0.3, indicate inadequate representation and potential imbalance, where the group's structure deviates substantially from the global factors and may warrant further scrutiny or preprocessing.[12]
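FactoMineR exposes these diagnostics directly on a fitted MFA object (a sketch, assuming an object `res` as in the examples above):

```r
res$group$contrib  # contribution (%) of each group to each global axis
res$group$cos2     # quality of representation of each group per axis
res$group$Lg       # Lg link coefficients between groups
res$group$RV       # RV coefficients (0 = unrelated, 1 = homothetic clouds)
```

Interpreting Balance Across Groups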
In multiple factor analysis (MFA), balance across groups is interpreted by evaluating the uniformity of the group inertias and of the squared correlations (cos²) between each group's principal components and the global factors. High uniformity in these values across groups, combined with high cos² for most groups on the primary dimensions, indicates that the groups capture similar aspects of the underlying global data structure, suggesting a harmonious integration without any single group overly influencing the analysis.[11] Disparities in these metrics, such as widely varying cos² levels, signal potential imbalances that may warrant remedial steps like removing outlier groups or adjusting weights to equalize their contributions.[7]

Decision rules for addressing imbalances rely on thresholds for these metrics to guide analytical choices. For instance, if the contribution of one group exceeds 0.5 on a given dimension, that group is considered dominant and may skew the global solution, prompting a separate PCA analysis for that group or its exclusion from the MFA to avoid distortion.[11] Pairwise similarities between groups can be further assessed using the RV coefficient, which ranges from 0 (no structural similarity) to 1 (perfect homothety); values above 0.8 typically indicate strong alignment, while lower values suggest divergent information that could justify subgrouping or hierarchical extensions.[13]

The implications of balanced versus imbalanced MFA outcomes provide key insights into the data's underlying patterns. A well-balanced analysis reveals shared structures across groups, facilitating the identification of common factors that generalize across variable sets, such as consensus in sensory evaluations from multiple experts.[8] In contrast, imbalance highlights unique aspects within specific groups, allowing researchers to isolate group-specific variances that might otherwise be masked; this is particularly useful in exploratory studies where group disparities inform targeted follow-up analyses.[11] To address persistent imbalances, especially in hierarchically structured data, hierarchical MFA extends the method by balancing contributions at multiple levels, offering a more nuanced remedial approach than standard weighting.[11]

Despite their utility, balance metrics in MFA have notable limitations that affect interpretation. These metrics inherently assume equal relevance of all groups to the global structure, which may not hold if some groups are conceptually peripheral, leading to over- or under-emphasis.[7] Additionally, they are sensitive to imbalances in the number of variables per group, as larger groups can artificially inflate their first eigenvalues and thus their weights, potentially biasing the overall balance assessment even after normalization.[11]

Visualization and Interpretation
Standard Factorial Graphics
In multiple factor analysis (MFA), standard factorial graphics provide visualizations of the global principal components derived from the concatenated and normalized data sets, enabling an overview of the overall structure across all variable groups. These plots adapt classical principal component analysis (PCA) and multiple correspondence analysis (MCA) techniques to the MFA framework, where observations are projected onto the global factor space to reveal patterns of similarity and variable contributions without emphasizing group-specific differences.[14]

The global factor map, often presented as a biplot, displays observations and variable loadings simultaneously on the first two global principal components, illustrating the primary axes of variation in the combined data. Observations are positioned according to their coordinates in this global space, while arrows or points represent the loadings of variables from all groups, typically color-coded by their originating group to distinguish contributions visually. This graphic highlights clusters of similar observations and the directions in which variable groups pull the structure, with the length of the arrows indicating the magnitude of influence on the factors. For instance, in applications involving mixed variable types, quantitative variables appear as vectors and categorical modalities as points, all scaled to the global eigenvalues.[15]

A scree plot visualizes the eigenvalues associated with the global principal components, plotted against component number, to assess the dimensionality of the solution and the proportion of total inertia explained by each factor. In MFA, this plot often includes both global eigenvalues and a comparison to partial inertias from individual group analyses, aiding the decision of how many dimensions to retain for interpretation, typically those before the eigenvalue curve begins to flatten. The cumulative variance explained is marked, with the first few components often accounting for a substantial portion, such as over 60% in balanced data sets.

Individual factor maps extend the global view by projecting observations onto a single principal component or a specific pair beyond the first two, allowing deeper inspection of variance along isolated dimensions. These maps position observations based on their global coordinates for the selected factors, often supplemented with confidence ellipses or color gradients based on squared correlations (cos²) to indicate how well individuals are represented. Such plots are useful for identifying outliers or subtle patterns not evident in the primary biplot.[15]

Correlation circles, akin to those in PCA, depict the correlations between variables (or modalities) and the global principal components, plotted on a unit circle to show angular relationships and strengths. In MFA, variables from different groups are included and color-coded accordingly, revealing how each group's elements align with or oppose the factors; for example, variables with correlations near 1 lie close to the corresponding axis. This graphic underscores the quality of variable representation, with points nearer the circle periphery indicating stronger associations with the component.
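A sketch of these standard plots from a fitted MFA object `res` (assumed as before):

```r
barplot(res$eig[, 2], names.arg = rownames(res$eig),
        ylab = "Percentage of inertia")   # scree plot of the global factors
plot(res, choix = "var", axes = c(1, 2))  # correlation circle, dimensions 1-2
plot(res, choix = "ind", axes = c(1, 3))  # individuals on dimensions 1 and 3
```

Unique MFA Visualizations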
Multiple factor analysis (MFA) employs several specialized visualizations to evaluate the alignment and balance among variable groups, going beyond standard factorial plots by highlighting group-specific contributions and inter-group relationships. Partial factor maps are superimposed representations that project each group's variables or individuals onto the global principal axes, often using transparency, color coding, or distinct symbols to reveal overlaps and discrepancies in how the different groups structure the data. For instance, in sensory analysis applications, these maps allow researchers to compare the configuration of chemical attributes against sensory perceptions, identifying whether the groups capture similar patterns in the observations.[4] This visualization aids in assessing group balance by quantifying the proximity of partial points to the global compromise, where closer alignments indicate harmonious contributions across datasets.[1]

Group contribution bar plots further illuminate imbalances by displaying metrics such as the group contributions ctr_k(s) (the contribution of group k to dimension s) and the average squared cosine cos²_k(s) (measuring the quality of representation of group k on dimension s) across principal components. These horizontal or vertical bar charts, typically ordered by magnitude, highlight dominant groups on specific axes; for example, a group with high contribution values disproportionately influences the global solution, potentially signaling the need for reweighting. Such plots are essential for detecting redundancies or under-representations, as low cos² values imply poor alignment with the overall factors.[6] Recent implementations extend these to interactive formats, enhancing interpretability in complex multiblock studies.[16]

The between-group RV matrix, visualized as a heatmap, computes pairwise RV coefficients (a generalization of the squared correlation to matrices) to quantify structural similarities between groups, with values ranging from 0 (no similarity) to 1 (identical structure). In the heatmap, rows and columns represent groups, and color intensity (e.g., from blue for low to red for high) reveals complementary or redundant datasets; for example, high RV values between sensory and instrumental groups indicate convergent information. This tool is particularly useful for identifying clusters of aligned groups, aiding decisions on data fusion.[11]

Additionally, dendrograms derived from hierarchical clustering on the RV matrix facilitate group clustering, where branches represent similarity levels, helping to organize groups into hierarchical structures of complementarity or redundancy. These tools, integrated into modern software, enhance the analysis of group balance in diverse applications like bioinformatics.[16]
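A sketch of the RV-based views; trimming the extra "MFA" row and column that FactoMineR appends to `res$group$RV` is an assumption worth checking against the installed version:

```r
rv <- res$group$RV
rv <- rv[rownames(rv) != "MFA", colnames(rv) != "MFA"]  # keep the groups only
heatmap(rv, symm = TRUE)                  # between-group similarity heatmap
plot(hclust(as.dist(1 - rv), "average"))  # dendrogram clustering the groups
```

Examples and Applications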
Introductory Worked Example
To illustrate the principles of multiple factor analysis (MFA), consider a hypothetical dataset on 20 wines, where each wine is described by three distinct groups of variables: chemical properties (numerical variables such as pH, alcohol content, and residual sugar), sensory attributes (numerical scores for aroma intensity, body, and aftertaste on a 1-10 scale), and tasting notes (categorical variables classifying dominant flavors as fruity, oaky, or spicy). This setup allows MFA to integrate diverse data types while assessing their balanced contributions to a global structure.[2]

The analysis proceeds in steps, beginning with separate analyses of each group to normalize their scales. For the numerical groups (chemical and sensory), principal component analysis (PCA) is applied; for the categorical tasting notes group, multiple correspondence analysis (MCA) is used to handle the qualitative data. The first eigenvalues from these separate analyses quantify each group's internal structure: λ₁ = 5.2 for the chemical group, λ₁ = 3.1 for the sensory group, and λ₁ = 2.8 for the tasting notes group. To ensure comparability, each group's data matrix is scaled by dividing by the square root of its respective λ₁, effectively normalizing the first eigenvalue to 1 across groups and preventing any single group from dominating due to scale differences.[2][1]

The normalized matrices are then concatenated column-wise to form a global dataset, on which a single PCA is performed to extract common factors. The eigenvalues from this global PCA are summarized below, showing that the first two factors account for 60% of the total variance, providing a compact representation of the wines' shared patterns.

| Factor | Eigenvalue | Variance Explained (%) | Cumulative Variance (%) |
|---|---|---|---|
| 1 | 4.20 | 35.0 | 35.0 |
| 2 | 2.90 | 25.0 | 60.0 |
| 3 | 1.80 | 15.0 | 75.0 |
The quality of representation (cos²) of each group on the first two factors shows how well the global solution captures each group's structure:

| Group | cos² (First Two Factors) |
|---|---|
| Chemical | 0.70 |
| Sensory | 0.60 |
| Tasting Notes | 0.40 |
A sample of the raw data for the first five wines illustrates the three variable groups:

| Wine | Chemical (pH) | Sensory (Aroma Score) | Tasting Notes |
|---|---|---|---|
| 1 | 3.45 | 7.2 | Fruity |
| 2 | 3.60 | 6.8 | Oaky |
| 3 | 3.30 | 8.1 | Spicy |
| 4 | 3.50 | 7.5 | Fruity |
| 5 | 3.40 | 6.9 | Oaky |
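The structure of this worked example can be mimicked in R; the data below are invented for illustration, so the resulting eigenvalues will not match the tables above:

```r
library(FactoMineR)
set.seed(42)
wines <- data.frame(
  pH      = runif(20, 3.2, 3.7),  # chemical group
  alcohol = runif(20, 11, 14),
  sugar   = runif(20, 1, 8),
  aroma   = runif(20, 1, 10),     # sensory group
  body    = runif(20, 1, 10),
  after   = runif(20, 1, 10),
  flavor  = factor(sample(c("fruity", "oaky", "spicy"), 20, TRUE))  # notes
)
res <- MFA(wines, group = c(3, 3, 1), type = c("s", "s", "n"),
           name.group = c("chemical", "sensory", "notes"), graph = FALSE)
res$eig         # global eigenvalues (cf. the first table)
res$group$cos2  # representation quality per group (cf. the second table)
```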