Multiple factor analysis
from Wikipedia

Multiple factor analysis (MFA) is a factorial method[1] devoted to the study of tables in which a group of individuals is described by a set of variables (quantitative and/or qualitative) structured in groups. It is a multivariate method from the field of ordination used to simplify multidimensional data structures. MFA treats all involved tables in the same way (symmetrical analysis). It may be seen as an extension of principal component analysis (PCA) when variables are quantitative, of multiple correspondence analysis (MCA) when variables are qualitative, and of factor analysis of mixed data (FAMD) when the active variables belong to the two types.

Introductory example


Why introduce several active groups of variables in the same factorial analysis?


Consider the case of quantitative variables, that is to say, within the framework of the PCA. An example of data from ecological research provides a useful illustration. There are, for 72 stations, two types of measurements:

  1. The abundance-dominance coefficient of 50 plant species (coefficient ranging from 0 = the plant is absent, to 9 = the species covers more than three-quarters of the surface). The whole set of the 50 coefficients defines the floristic profile of a station.
  2. Eleven pedological measurements (pedology = soil science): particle size, physical and chemical properties, etc. The set of these eleven measurements defines the pedological profile of a station.

Three analyses are possible:

  1. PCA of flora (pedology as supplementary): this analysis focuses on the variability of the floristic profiles. Two stations are close to one another if they have similar floristic profiles. In a second step, the main dimensions of this variability (i.e. the principal components) are related to the pedological variables introduced as supplementary.
  2. PCA of pedology (flora as supplementary): this analysis focuses on the variability of soil profiles. Two stations are close if they have the same soil profile. The main dimensions of this variability (i.e. the principal components) are then related to the abundance of plants.
  3. PCA of the two groups of variables as active: one may want to study the variability of stations from both the point of view of flora and soil. In this approach, two stations should be close if they have both similar flora 'and' similar soils.

Balance between groups of variables


Methodology


The third analysis of the introductory example implicitly assumes a balance between flora and soil. However, in this example, the mere fact that the flora is represented by 50 variables and the soil by 11 variables implies that the PCA with 61 active variables will be influenced mainly by the flora (at least on the first axis). This is not desirable: there is no reason to wish that one group play a more important role in the analysis than the other.

The core of MFA is based on a factorial analysis (PCA in the case of quantitative variables, MCA in the case of qualitative variables) in which the variables are weighted. These weights are identical for the variables of the same group (and vary from one group to another). They are such that the maximum axial inertia of a group is equal to 1: in other words, by applying the PCA (or, where applicable, the MCA) to one group with this weighting, we obtain a first eigenvalue equal to 1. To get this property, MFA assigns to each variable of group j a weight equal to the inverse of the first eigenvalue of the analysis (PCA or MCA according to the type of variable) of group j.

Formally, if λ1(j) denotes the first eigenvalue of the separate factorial analysis of group j, the MFA assigns the weight 1/λ1(j) to each variable of group j.
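This weighting step can be sketched as follows (a minimal illustration with our own helper name, assuming quantitative groups whose columns are already standardized, and numpy as the only dependency):

```python
import numpy as np

def mfa_weights(X, groups):
    """Return one weight per column: the inverse of the first eigenvalue
    of the separate PCA of the column's group.

    X      : data matrix whose columns are standardized (mean 0, variance 1)
    groups : list of column-index lists, one list per group
    """
    n = X.shape[0]
    weights = np.empty(X.shape[1])
    for g in groups:
        Xg = X[:, g]
        # First eigenvalue of the group's correlation matrix = maximum
        # axial inertia of the group in its separate PCA.
        lam1 = np.linalg.eigvalsh(Xg.T @ Xg / n)[-1]
        weights[g] = 1.0 / lam1
    return weights
```

Running the weighted PCA then amounts to analysing the table in which each column has been multiplied by the square root of its weight: the first eigenvalue of every group becomes 1, as described above.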

Balancing maximum axial inertia rather than the total inertia (= the number of variables in standard PCA) gives the MFA several important properties for the user. More directly, its interest appears in the following example.

Example


Consider two groups of variables defined on the same set of individuals.

  1. Group 1 is composed of two uncorrelated variables A and B.
  2. Group 2 is composed of two variables {C1, C2}, both equal to a same variable C that is uncorrelated with the first two.

This example is not completely unrealistic. It is often necessary to simultaneously analyse multi-dimensional and (quite) one-dimensional groups.

As each group has the same number of variables, the two groups have the same total inertia.

In this example the first axis of the PCA is almost coincident with C. Indeed, in the space of variables, there are two variables in the direction of C: group 2, with all its inertia concentrated in one direction, influences predominantly the first axis. For its part, group 1, consisting of two orthogonal variables (= uncorrelated), has its inertia uniformly distributed in a plane (the plane generated by the two variables) and hardly weighs on the first axis.

Numerical Example

Table 1. MFA. Test data. A and B (group 1) are uncorrelated. C1 and C2 (group 2) are identical.

Individual   A   B   C1   C2
1            1   1   1    1
2            2   3   4    4
3            3   5   2    2
4            4   5   2    2
5            5   3   4    4
6            6   1   2    2
Table 2. Test data. Decomposition of the inertia in the PCA and in the MFA applied to the data in Table 1.

           PCA                     MFA
           Axis 1        Axis 2    Axis 1        Axis 2
Inertia    2.14 (100%)   1         1.28 (100%)   1
Group 1    0.24 (11%)    1         0.64 (50%)    1
Group 2    1.91 (89%)    0         0.64 (50%)    0

Table 2 summarizes the inertia of the first two axes of the PCA and of the MFA applied to Table 1.

Group 2 variables contribute 88.95% of the inertia of axis 1 of the PCA. The first axis (F1) is almost coincident with C: the correlation between C and F1 is .976.

The first axis of the MFA (on Table 1 data) shows the balance between the two groups of variables: the contribution of each group to the inertia of this axis is strictly equal to 50%.

The second axis, meanwhile, depends only on group 1. This is natural since this group is two-dimensional while the second group, being one-dimensional, can be highly related to only one axis (here the first axis).
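These balance properties can be checked numerically. The sketch below (numpy only; the helper names are ours) rebuilds the Table 1 data, runs a plain PCA and then the MFA-weighted PCA on the standardized columns, and recovers the inertias and group contributions of Table 2:

```python
import numpy as np

# Columns: A, B (group 1) and C1, C2 (group 2), as in Table 1.
X = np.array([
    [1, 1, 1, 1],
    [2, 3, 4, 4],
    [3, 5, 2, 2],
    [4, 5, 2, 2],
    [5, 3, 4, 4],
    [6, 1, 2, 2],
], dtype=float)
groups = [[0, 1], [2, 3]]

Z = (X - X.mean(0)) / X.std(0)   # standardized columns
n = Z.shape[0]

def first_axis(M):
    # Largest eigenvalue and its eigenvector for a symmetric matrix.
    vals, vecs = np.linalg.eigh(M)
    return vals[-1], vecs[:, -1]

def group_contributions(Z, groups):
    # Inertia of the first axis, and the share of it carried by each group.
    lam, v = first_axis(Z.T @ Z / n)
    return lam, [float(np.sum(v[g] ** 2)) for g in groups]

# Plain PCA: group 2 dominates the first axis.
lam_pca, contrib_pca = group_contributions(Z, groups)

# MFA: divide each group by the square root of its separate first
# eigenvalue, then redo the PCA; the two groups now weigh 50% each.
Zw = Z.copy()
for g in groups:
    Zw[:, g] /= np.sqrt(first_axis(Z[:, g].T @ Z[:, g] / n)[0])
lam_mfa, contrib_mfa = group_contributions(Zw, groups)
```

With these data, `lam_pca` is about 2.14 with group 2 contributing about 89%, while `lam_mfa` is about 1.28 with each group contributing exactly 50%, matching Table 2.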

Conclusion about the balance between groups


Introducing several active groups of variables in a factorial analysis implicitly assumes a balance between these groups.

This balance must take into account that a multidimensional group naturally influences more axes than a one-dimensional group does (a one-dimensional group can be closely related to only one axis).

The weighting of the MFA, which makes the maximum axial inertia of each group equal to 1, plays this role.

Application examples


Surveys. Questionnaires are always structured according to different themes. Each theme is a group of variables, for example, questions about opinions and questions about behaviour. Thus, in this example, we may want to perform a factorial analysis in which two individuals are close if they have expressed both the same opinions and the same behaviour.

Sensory analysis. The same set of products has been evaluated by a panel of experts and a panel of consumers. For its evaluation, each jury uses a list of descriptors (sour, bitter, etc.). Each judge scores each descriptor for each product on an intensity scale ranging, for example, from 0 = null or very low to 10 = very strong. In the table associated with a jury, at the intersection of row i and column k lies the average score assigned to product i for descriptor k.

Individuals are the products. Each jury is a group of variables. We want to achieve a factorial analysis in which two products are similar if they were evaluated in the same way by both juries.

Multidimensional time series. J variables are measured on I individuals at T dates. There are many ways to analyse such a data set. One way suggested by MFA is to consider each date as a group of variables in the analysis of the T tables (each table corresponds to one date) juxtaposed row-wise (the table analysed thus has I rows and J × T columns).
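This juxtaposition of date-specific tables can be illustrated as follows (hypothetical random numbers; numpy only):

```python
import numpy as np

# Hypothetical example: J = 2 variables measured on I = 4 individuals
# at T = 3 dates, stored as one I x J table per date.
I, J, T = 4, 2, 3
rng = np.random.default_rng(0)
tables = [rng.normal(size=(I, J)) for _ in range(T)]

# MFA's view of the series: juxtapose the T tables side by side, so the
# analysed table has I rows and J * T columns; each date is one group.
X = np.hstack(tables)
groups = [list(range(t * J, (t + 1) * J)) for t in range(T)]
```

Each element of `groups` lists the columns belonging to one date, ready to be passed to the group-weighting step of MFA.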

Conclusion: These examples show that in practice, variables are very often organized into groups.

Graphics from MFA


Beyond the weighting of variables, interest in MFA lies in a series of graphics and indicators valuable in the analysis of a table whose columns are organized into groups.

Graphics common to all the simple factorial analyses (PCA, MCA)


The core of MFA is a weighted factorial analysis: MFA firstly provides the classical results of the factorial analyses.

1. Representations of individuals in which two individuals are close to each other if they exhibit similar values for many variables in the different variable groups; in practice the user particularly studies the first factorial plane.

2. Representations of quantitative variables as in PCA (correlation circle).

Figure 1. MFA. Test data. Representation of individuals on the first plane.
Figure 2. MFA. Test data. Representation of variables on the first plane.

In the example:

  • The first axis mainly opposes individuals 1 and 5 (Figure 1).
  • The four variables have a positive coordinate (Figure 2): the first axis is a size effect. Thus, individual 1 has low values for all the variables and individual 5 has high values for all the variables.

3. Indicators aiding interpretation: projected inertia, contributions and quality of representation. In the example, the contribution of individuals 1 and 5 to the inertia of the first axis is 45.7% + 31.5% = 77.2% which justifies the interpretation focussed on these two points.

4. Representations of categories of qualitative variables as in MCA (a category lies at the centroid of the individuals who possess it). No qualitative variables in the example.

Graphics specific to this kind of multiple table


5. Superimposed representations of individuals "seen" by each group. An individual considered from the point of view of a single group is called a partial individual (in parallel, an individual considered from the point of view of all variables is called a mean individual, because it lies at the center of gravity of its partial points). The partial cloud N_j gathers the individuals as seen by the single group j: it is the cloud analysed in the separate factorial analysis (PCA or MCA) of group j. The superimposed representation of the N_j provided by the MFA is similar in its purpose to that provided by Procrustes analysis.

Figure 3. MFA. Test data. Superimposed representation of mean and partial clouds.

In the example (Figure 3), individual 1 is characterized by a small size (i.e. small values) both in terms of group 1 and group 2 (the partial points of individual 1 have a negative coordinate and are close to one another). On the contrary, individual 5 is more characterized by high values for the variables of group 2 than for those of group 1 (for individual 5, the group 2 partial point lies further from the origin than the group 1 partial point). This reading of the graph can be checked directly in the data.

6. Representations of groups of variables as such. In these graphs, each group of variables is represented by a single point. Two groups of variables are close to one another when they define the same structure on individuals. An extreme case: two groups of variables that define homothetic clouds of individuals coincide. The coordinate of group j along axis s is equal to the contribution of group j to the inertia of the MFA dimension of rank s. This contribution can be interpreted as an indicator of the relationship between group j and axis s, hence the name relationship square given to this type of representation. This representation also exists in other factorial methods (MCA and FAMD in particular), in which case the groups of variables are each reduced to a single variable.

Figure 4. MFA. Test data. Representation of groups of variables.

In the example (Figure 4), this representation shows that the first axis is related to the two groups of variables, while the second axis is related to the first group. This agrees with the representation of the variables (figure 2). In practice, this representation is especially precious when the groups are numerous and include many variables.

Another reading grid: the two groups of variables have the size effect (first axis) in common and differ along axis 2, since this axis is specific to group 1 (it opposes the variables A and B).

7. Representations of factors of separate analyses of the different groups. These factors are represented as supplementary quantitative variables (correlation circle).

Figure 5. MFA. Test data. Representation of the principal components of separate PCA of each group.

In the example (Figure 5), the first axis of the MFA is relatively strongly correlated (r = .80) with the first component of group 2. This group, consisting of two identical variables, possesses only one principal component (coinciding with that variable). Group 1 consists of two orthogonal variables: any direction of the subspace generated by these two variables has the same inertia (equal to 1). So there is uncertainty in the choice of principal components, and there is no reason to be interested in one of them in particular. However, the two components provided by the program are well represented: the plane of the MFA is close to the plane spanned by the two variables of group 1.

Conclusion


The numerical example illustrates the output of the MFA. Besides balancing groups of variables, and besides the usual graphics of PCA (or MCA in the case of qualitative variables), the MFA provides results specific to the group structure of the set of variables, in particular:

  • A superimposed representation of partial individuals for a detailed analysis of the data;
  • A representation of groups of variables providing a synthetic image that becomes more valuable as the data include more groups;
  • A representation of factors from separate analyses.

The small size and simplicity of the example allow a simple validation of the rules of interpretation. But the method will be more valuable for large and complex data sets. Other methods suitable for this type of data are available; Procrustes analysis is compared to MFA in [2].

History


MFA was developed by Brigitte Escofier and Jérôme Pagès in the 1980s. It is at the heart of two books written by these authors.[3][4] The MFA and its extensions (hierarchical MFA, MFA on contingency tables, etc.) are a research topic of the applied mathematics laboratory of Agrocampus (LMA), which published a book presenting the basic methods of exploratory multivariate analysis.[5]

Software


MFA is available in two R packages (FactoMineR and ADE4) and in many software packages, including SPAD, Uniwin, XLSTAT, etc. There is also a SAS function (link now permanently dead). The graphs in this article come from the R package FactoMineR.

References

from Grokipedia
Multiple factor analysis (MFA) is a multivariate statistical technique that extends principal component analysis (PCA) to handle datasets consisting of multiple blocks or groups of variables (typically a mix of quantitative and qualitative types) measured on the same set of observations. By normalizing each variable block individually (often by scaling it to unit first eigenvalue) and then concatenating the blocks for a global PCA, MFA balances the influence of disparate groups, enabling the identification of common structures across blocks while assessing their individual contributions and relationships. This method provides factor scores, loadings, and visualizations that summarize complex, multi-table data in a unified framework, making it particularly suited for exploratory analysis where traditional PCA might be biased by imbalanced variable sets.

Developed by statisticians Brigitte Escofier and Jérôme Pagès in the late 1980s and early 1990s, MFA emerged as a synthesis of earlier multivariate approaches, including canonical analysis, rotation methods, and individual differences scaling (INDSCAL), to address the challenges of integrating heterogeneous data tables. Their seminal work, detailed in a 1994 publication, formalized MFA as a weighted tool capable of processing both numerical and categorical variables on shared individuals, with implementations available in software packages such as AFMULT. Subsequent refinements, such as extensions for incomplete data and hierarchical structures, have built on this foundation, enhancing its applicability in modern computational environments.

At its core, MFA operates in two main stages: first, performing separate PCAs (or multiple correspondence analyses for categorical blocks) on each normalized data table to derive partial factor scores; second, aggregating these scores into a composite table for an unweighted global PCA, which yields overall dimensions representing a consensus across blocks. This process not only reveals how observations cluster but also quantifies the similarity between blocks through metrics such as RV coefficients, allowing researchers to evaluate whether certain variable groups align or diverge in explaining variance. Dual formulations of MFA exist for cases where the same variables are observed across different samples, further broadening its utility.

MFA finds extensive use in fields requiring the integration of diverse data sources, such as sensory science, where it analyzes panels of descriptors alongside instrumental measurements for products like wines or foods, and studies combining consumer surveys with demographic profiles. It has also been applied to diagnose species relationships from morphological and genetic blocks, to analyse multiblock ecological data, and in the social sciences to explore multifaceted survey responses. These applications highlight MFA's strength in providing interpretable, balanced insights into complex systems without requiring prior assumptions about variable importance.

Introduction

Definition and Objectives

Multiple factor analysis (MFA) is a principal component method designed for the simultaneous analysis of multiple groups of variables, which can be numerical and/or categorical, measured on the same set of observations. It aims to identify common underlying structures across these groups while evaluating the balance or relative contributions of each group to the overall analysis. The primary objectives of MFA include summarizing complex multi-table data into a lower-dimensional representation, detecting redundancies or complementarities between variable groups, and achieving a unified dimensionality reduction that accounts for the individual inertias of each group. By normalizing and weighting the groups appropriately, MFA facilitates the exploration of shared patterns without one group dominating the results due to scale differences. This approach builds on principal component analysis (PCA) for continuous variables and multiple correspondence analysis (MCA) for categorical ones, adapting their principles to multi-group settings.

Key benefits of MFA lie in its ability to handle mixed data types without requiring homogeneity across groups, enabling direct comparisons of the importance of different variable sets, and supporting exploratory analyses in diverse fields such as sensory evaluation and multi-omics studies. In sensory evaluation, for instance, it allows integration of assessor ratings and physicochemical measurements to assess product perceptions holistically. Similarly, in multi-omics research, MFA integrates several omics datasets to uncover coordinated biological variations.

The main outputs of MFA consist of a global factor map representing the compromise across all groups, partial factor maps illustrating each group's specific structure projected onto the global axes, and balance indicators that quantify the contributions and relationships of individual groups. These visualizations and metrics provide insights into both the consensus and discrepancies among the data tables.

Relation to Other Factorial Methods

Multiple factor analysis (MFA) extends principal component analysis (PCA) to the analysis of multiple data tables describing the same set of observations, addressing the limitations of standard PCA when applied to concatenated data by incorporating a normalization step for each group to ensure balanced contributions. In PCA, the focus is on maximizing variance within a single table through eigenvalue decomposition, whereas MFA first performs a PCA on each individual group (or block) of variables, scales the data by dividing by the square root of the first eigenvalue of that group's PCA, and then concatenates the normalized tables for a global PCA. This adaptation prevents any single group with larger variance from dominating the analysis, allowing for a joint representation that respects the structure of heterogeneous data sets.

For groups involving categorical variables, MFA integrates principles from multiple correspondence analysis (MCA) by treating such data as disjunctive tables and adjusting for category frequencies to align the scaling with continuous variables, effectively performing MCA within each qualitative group before normalization. Unlike standalone MCA, which analyzes categorical data using chi-squared distances to handle the redundancy inherent in multiple categories, MFA embeds this within the multi-group framework, representing categories by their centers of gravity rather than by disjunctive coding alone, which facilitates integration with quantitative groups without distorting the overall factor space. This hybrid approach ensures that categorical and continuous variables contribute comparably to the global factors after normalization by the first eigenvalue of the group's MCA.

MFA distinguishes itself from other multi-table methods such as STATIS, which seeks a compromise between tables by optimizing weights to maximize the similarity of observation factor scores across groups via RV coefficients, whereas MFA employs a fixed normalization scheme to promote balance without iterative weighting. In contrast to multi-block PCA variants like SUM-PCA, which concatenate blocks after simple variance scaling and may allow dominant blocks to overshadow others, MFA's inertia-based normalization and its emphasis on eigenvalue ratios (comparing each group's first eigenvalue to those of the global principal components) explicitly assess and enforce equilibrium across groups, making it particularly suited for mixed data types. These differences position MFA as a balanced extension of factorial methods for multi-block settings, assuming prior familiarity with PCA's variance maximization and MCA's chi-squared metrics.

Data Structure and Preparation

Organization of Multiple Variable Groups

Multiple factor analysis (MFA) requires a multi-table structure in which a set of I observations is described by K distinct groups of variables, with the k-th group comprising J_k variables organized as an I × J_k matrix. These matrices are conceptually concatenated horizontally to form a global I × ΣJ_k data set, though each group is analyzed separately in the initial stages to account for its internal structure. For instance, in a study of food products, one group might include physical attributes such as density, while another covers sensory attributes such as sweetness and bitterness.

A fundamental requirement is that the same I observations must appear across all K groups, ensuring comparability and alignment in the analysis. Missing values can be handled in standard MFA: numerical variables are often imputed with column means, while for categorical variables a missing value is treated as an additional category or coded as absent in the disjunctive table. Complete data is ideal for accuracy, and advanced imputation methods are available for complex cases. Groups must also be conceptually distinct, representing different aspects or domains of the observations (e.g., quantitative measurements versus qualitative descriptors), to facilitate the identification of shared and unique patterns.

Preprocessing begins with centering all numerical variables within each group by subtracting the group-specific means, which removes level effects and focuses the analysis on variance. Categorical variables are transformed into disjunctive tables or indicator matrices, where each category becomes a binary column (1 if present, 0 otherwise), enabling factorial treatment akin to multiple correspondence analysis (MCA). If variables within a group exhibit differing scales, they may be scaled to unit variance prior to analysis to ensure equitable contribution during group factorization.

When defining groups, practitioners should aim for a relatively balanced number of variables (J_k) across the K groups to prevent any single group from disproportionately influencing the global structure, although the subsequent normalization steps in MFA mitigate imbalances. Each group is typically analyzed using PCA for quantitative variables or MCA for categorical ones, providing the foundation for integration.
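The disjunctive (indicator) coding mentioned above can be sketched as follows for a single categorical variable (a minimal illustration; the function name is ours):

```python
import numpy as np

def disjunctive_table(values):
    """Code one categorical variable as a 0/1 indicator matrix,
    one column per category (categories sorted alphabetically)."""
    categories = sorted(set(values))
    table = np.zeros((len(values), len(categories)))
    for i, v in enumerate(values):
        table[i, categories.index(v)] = 1.0
    return categories, table

cats, D = disjunctive_table(["red", "blue", "red", "green"])
# Each row of D sums to 1, since every individual takes exactly one category.
```

Concatenating such matrices over all the categorical variables of a group yields the complete disjunctive table that MCA (and hence MFA) operates on.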

Handling Different Variable Types

In multiple factor analysis (MFA), data are organized into groups of variables, and the treatment of variable types within each group is essential for equitable contribution to the overall analysis. Numerical variables are standardized by centering them to a mean of zero and scaling them to unit variance, enabling the application of principal component analysis (PCA), which relies on Euclidean distances to summarize the group's variability. This ensures that all numerical variables have comparable scales, preventing any single variable from dominating the group's principal components.

Categorical variables require transformation into a disjunctive table, consisting of indicator columns for each category. Each indicator column is weighted by the proportion of individuals who do not possess that category (1 − f_i, where f_i is the frequency of the category), and multiple correspondence analysis (MCA) is then applied using chi-squared distances, which account for the relative frequencies and capture associations among categories. This approach balances the influence of categories with varying prevalences, aligning the categorical group's inertia with that of numerical groups.

Groups with mixed variable types are rare in standard MFA, as the method assumes homogeneity within groups in order to apply appropriate metric spaces consistently. In such cases, variables are often separated into homogeneous sub-groups for separate PCA or MCA before integration, or extensions incorporate hybrid distances that combine Euclidean metrics for numerical components with chi-squared metrics for categorical ones; this is particularly noted in applications to multi-omics data, where diverse modalities such as continuous expression levels and discrete mutations necessitate adaptive handling.

Ordinal variables pose type-specific challenges, as their ordered categories can be treated either as categorical (via disjunctive coding and MCA, to respect the discrete levels) or as numerical if the scale is sufficiently granular to approximate a continuous measurement, allowing standardization and PCA. The decision hinges on the number of levels and the meaningfulness of the intervals, ensuring that the treatment aligns with the variable's measurement properties for compatibility in the global MFA framework.

To validate the setup, groups should contain an adequate number of variables, such as more than five per group, to promote stability in the extracted factors and reliable estimation of group-specific inertias. Smaller groups risk unstable principal components and inflated variability in balance metrics.

Core Methodology

Group Normalization and Weighting

In multiple factor analysis (MFA), the process of group normalization and weighting begins with the separate analysis of each group of variables to ensure equitable contributions across diverse data sets. For numerical variable groups, principal component analysis (PCA) is performed on the centered and scaled data matrix X_k, while for categorical groups, multiple correspondence analysis (MCA) is applied, yielding the first eigenvalue λ1k for each group k. This initial step captures the internal structure of each group independently, with λ1k representing the maximum variance (or inertia) explained by the first principal component.

Normalization follows to balance the influence of groups that may differ in size or variability. Specifically, the data matrix X_k for group k is divided by the square root of its first eigenvalue λ1k, producing the normalized matrix Z_k = X_k / sqrt(λ1k).