DATA & STATISTICS

Exploratory factor analysis (EFA)

Multivariate data reduction technique that identifies latent factors underlying a set of observed variables, without an a priori hypothesis about their structure. Typically precedes CFA (confirmatory factor analysis) in measurement instrument validation.

Extended definition

Exploratory factor analysis (EFA) is a multivariate data reduction technique that identifies latent factors underlying a set of observed variables, without imposing an a priori structure. The central model is:

$$X = \Lambda F + \epsilon$$

where $X$ is the vector of observed variables (one respondent's item scores), $F$ the vector of latent common factors, $\Lambda$ the factor loading matrix, and $\epsilon$ the unique (item-specific) error. In CFA the researcher specifies in advance how many factors exist and which items load on which factors. EFA instead estimates this structure from the data. Three methodological decisions are critical. The first is the extraction method: maximum likelihood vs. principal axis factoring. Fabrigar et al. (1999) recommend maximum likelihood when the data are approximately multivariate normal. The second is the number of factors to retain: Kaiser's eigenvalue-greater-than-1 rule, the scree plot, or parallel analysis, with parallel analysis now the recommended standard. The third is the rotation method. Varimax suits factors expected to be orthogonal. Oblimin or promax suit factors where inter-factor correlations are plausible, as is common in the social sciences. Costello and Osborne (2005) synthesized best practices and documented that most published studies use suboptimal configurations: PCA instead of factor analysis, varimax without justification, and the Kaiser criterion without parallel analysis.
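
The model above implies a structure for the covariance matrix of the observed variables. Assume zero-mean factors with covariance matrix $\Phi$, and unique errors uncorrelated with the factors and with each other, with diagonal covariance $\Psi$. These symbols are introduced here for illustration. Under those assumptions the standard result is:

$$\Sigma = \operatorname{Cov}(X) = \Lambda \Phi \Lambda^{\top} + \Psi$$

Take standardized items and orthogonal factors ($\Phi = I$). The diagonal then splits each item's unit variance into a common part, the communality $h_i^2$, and a unique part $\psi_i$:

$$1 = h_i^2 + \psi_i, \qquad h_i^2 = \sum_{j=1}^{m} \lambda_{ij}^2$$

Here $m$ is the number of retained factors. This decomposition is what "common variance" refers to. EFA models the $h_i^2$, while PCA works with the full unit diagonal.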

When it applies

EFA is appropriate at the exploratory stage of instrument validation. A typical case is a researcher who develops a new questionnaire and wants to discover the latent structure that emerges from the responses. It suits datasets of hundreds to thousands of respondents. A common rule of thumb is at least 5 respondents per item ($n \geq 5p$ for $p$ items), ideally at least 10 ($n \geq 10p$). It also applies to dimension reduction in research projects that collect many correlated variables, reducing them to a smaller number of interpretable factors. In rigorous validation pipelines, EFA is run on one random half of the sample and CFA on the other half (a split-sample design). This ensures the discovered structure is confirmed in independent data.
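
The split-sample design can be sketched in a few lines of Python with numpy. This is a minimal illustration, not a prescribed procedure. The function names `split_sample` and `respondents_per_item` are hypothetical, and the Likert responses are random placeholders rather than real data. The sketch checks the respondents-per-item ratio on the EFA half, since that half is what the exploratory analysis actually sees.

```python
import numpy as np

def split_sample(data, seed=0):
    """Randomly split respondents (rows) into two halves: one for EFA, one for CFA."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(data.shape[0])  # shuffle row indices
    half = data.shape[0] // 2
    return data[idx[:half]], data[idx[half:]]

def respondents_per_item(data):
    """Ratio n/p of respondents to items (rule of thumb: >= 5, ideally >= 10)."""
    n, p = data.shape
    return n / p

# Hypothetical example: 400 respondents answering a 20-item questionnaire.
rng = np.random.default_rng(42)
responses = rng.integers(1, 6, size=(400, 20))  # simulated 5-point Likert answers

efa_half, cfa_half = split_sample(responses, seed=1)
print(efa_half.shape, cfa_half.shape)   # (200, 20) (200, 20)
print(respondents_per_item(efa_half))   # 10.0
```

Each half has only $n/2$ respondents. A sample that just meets the 10-per-item guideline overall therefore drops to 5 per item in each half. The ratio is worth checking per half before splitting.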

When it does not apply

It does not apply when the researcher already has solid theory about the factor structure. CFA is the appropriate technique then. It should not be run on Pearson correlations of dichotomous variables, which require tetrachoric correlations or a categorical factor analysis. It does not apply to small samples ($n < 100$) with many items: the structure found is unstable and does not replicate. It is not a substitute for PCA when the goal is pure dimension reduction to feed other models. PCA accounts for the total variance of the items, while EFA models only the common (shared) variance. Nor does it apply, without adjustment, to ordinal variables with few categories (3 to 4). These call for polychoric correlations, for example with WLSMV estimation in SEM software.
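
The total-versus-common-variance distinction can be made concrete with a small numpy sketch. It assumes a hypothetical one-factor population: four standardized items, each loading 0.7. PCA decomposes the correlation matrix with 1s on the diagonal. Principal axis factoring instead starts from a reduced matrix whose diagonal holds communality estimates. Squared multiple correlations (SMCs) are one common choice, used here as an assumption.

```python
import numpy as np

# Hypothetical population: one factor, four standardized items, each loading 0.7.
loadings = np.array([0.7, 0.7, 0.7, 0.7])
uniqueness = 1.0 - loadings**2                          # unique (item-specific) variance
R = np.outer(loadings, loadings) + np.diag(uniqueness)  # implied correlation matrix

# PCA: eigenvalues of R with 1s on the diagonal, i.e. total variance.
pca_eig = np.sort(np.linalg.eigvalsh(R))[::-1]

# Principal axis factoring, first step: put communality estimates on the
# diagonal. SMC_i = 1 - 1 / (R^-1)_ii is one common starting value.
smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
R_reduced = R.copy()
np.fill_diagonal(R_reduced, smc)
paf_eig = np.sort(np.linalg.eigvalsh(R_reduced))[::-1]

print("PCA eigenvalues:", pca_eig.round(3), "sum:", pca_eig.sum().round(3))
print("Reduced-matrix eigenvalues:", paf_eig.round(3), "sum:", paf_eig.sum().round(3))
```

The PCA eigenvalues sum to 4, the number of items (total variance). The reduced matrix's eigenvalues sum only to the summed SMCs, about 1.455, which estimates common variance alone. Some reduced-matrix eigenvalues come out negative, which is normal for a reduced correlation matrix. SMCs are a lower bound on the true communalities (0.49 each here), which is why iterative principal axis factoring re-estimates them.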

Applications by field

Psychology and psychometrics: EFA is a core technique in constructing new scales. Rigorous validation practice calls for EFA followed by CFA in an independent sample.

Organizational research: validation of questionnaires on culture, climate, and engagement. The factor structure serves as input for diagnostics.

Education: validation of assessment instruments. The factor structure of the items informs test quality.

Marketing and consumer science: identification of latent dimensions in attitude and preference scales.

Common pitfalls

The first pitfall is running PCA (principal component analysis) and calling it EFA. The two are mathematically different. PCA components, taken together, reproduce all of the variance in the items. EFA models only the common variance and separates out item-specific error. The second is applying the Kaiser criterion (eigenvalue > 1) without complementing it with a scree plot and parallel analysis. Kaiser's rule tends to overestimate the number of factors. The third is choosing varimax rotation by default even when there is theoretical reason to expect correlated factors (e.g., personality dimensions). An oblique rotation such as oblimin is more appropriate in that case. The fourth is treating weak loadings as substantive evidence. Loadings below about 0.40 deserve caution, and Costello and Osborne (2005) cite 0.32 as the absolute minimum. Items that load at 0.32 or higher on two or more factors (cross-loadings) are candidates for removal. The fifth is confusing EFA with confirmation. Discovering a structure and testing it on the same sample is not validation; it is circular.
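
The second pitfall can be illustrated with a small simulation. The Python/numpy sketch below compares Kaiser's rule with Horn's parallel analysis on hypothetical data from two uncorrelated factors (12 items each, loadings 0.6, $n = 100$). Several assumptions are worth stating. It uses eigenvalues of the full correlation matrix, the PCA-based variant of parallel analysis; variants based on the reduced matrix also exist. It draws normally distributed random comparison data and uses the 95th percentile as the retention threshold. The function names are illustrative, not a library API.

```python
import numpy as np

def eigenvalues(data):
    """Eigenvalues of the item correlation matrix, largest first."""
    R = np.corrcoef(data, rowvar=False)
    return np.sort(np.linalg.eigvalsh(R))[::-1]

def kaiser_count(data):
    """Kaiser criterion: number of eigenvalues greater than 1."""
    return int(np.sum(eigenvalues(data) > 1.0))

def parallel_analysis_count(data, n_iter=200, percentile=95, seed=0):
    """Horn's parallel analysis: retain leading factors whose observed eigenvalue
    exceeds the chosen percentile of eigenvalues from random normal data
    of the same shape (n respondents x p items)."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    random_eigs = np.array([eigenvalues(rng.standard_normal((n, p)))
                            for _ in range(n_iter)])
    threshold = np.percentile(random_eigs, percentile, axis=0)  # one per position
    count = 0
    for obs, thr in zip(eigenvalues(data), threshold):
        if obs <= thr:  # stop at the first eigenvalue not beating random data
            break
        count += 1
    return count

# Hypothetical data: two uncorrelated factors, 12 items each, loadings 0.6.
rng = np.random.default_rng(123)
n, p_per = 100, 12
loadings = np.zeros((2 * p_per, 2))
loadings[:p_per, 0] = 0.6
loadings[p_per:, 1] = 0.6
factors = rng.standard_normal((n, 2))
unique_sd = np.sqrt(1.0 - 0.6**2)  # keeps each item's variance at 1
X = factors @ loadings.T + unique_sd * rng.standard_normal((n, 2 * p_per))

print("Kaiser:", kaiser_count(X))
print("Parallel analysis:", parallel_analysis_count(X))
```

With many items and a modest sample, sampling noise can push several eigenvalues of the unique-variance part above 1. Kaiser's rule therefore tends to retain spurious factors in this setup. Parallel analysis compares each eigenvalue against what random data of the same size produce, and should recover the two simulated factors. Real analyses would typically rely on an established implementation, such as `fa.parallel` in the R package psych.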
