Mixed-effects models (GLMM) — Glossary Aria Research

Extended definition

Generalized linear mixed models (GLMM; also mixed-effects models, multilevel models, hierarchical linear models) combine two types of parameters: fixed effects, which estimate population-level relationships that are constant across groups (analogous to classical regression coefficients), and random effects, which model systematic variation across groups, subjects, or clusters as draws from a distribution (typically normal). The simplest case is the linear mixed model with a random intercept. For longitudinal data with subjects $i = 1, \dots, n$ and measurements $j = 1, \dots, n_i$ on subject $i$, the specification is:

$$y_{ij} = \beta_0 + \beta_1 X_{ij} + u_{0i} + \varepsilon_{ij}, \quad u_{0i} \sim N(0, \sigma_u^2), \quad \varepsilon_{ij} \sim N(0, \sigma^2)$$

where $u_{0i}$ is the random intercept for subject $i$ and $\varepsilon_{ij}$ is the residual error of measurement $j$ on that subject. More complex specifications include random slopes, nested structures (students within schools within districts), and cross-classified structures (students belonging to both a school and a neighborhood, with neither nested in the other). Bates et al. (2015, Journal of Statistical Software) describe the R package lme4, currently the dominant implementation. Pinheiro and Bates (2000, Mixed-Effects Models in S and S-PLUS, Springer) remains the classical theoretical reference. Estimation is typically by REML (restricted maximum likelihood) or ML. Fixed effects are tested with Wald tests, likelihood-ratio tests (which require ML rather than REML fits), or t/F tests using Satterthwaite or Kenward-Roger approximate degrees of freedom.
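To make the random-intercept specification concrete, the following is a minimal sketch in plain Python that simulates data from the model and recovers the two variance components. It rests on several assumptions that are not in the text above: the covariate term $\beta_1 X_{ij}$ is dropped for brevity, the design is balanced (every subject has the same number of measurements), and the estimator is the one-way ANOVA method of moments rather than the REML or ML fit that lme4 performs. The function names and parameter values are illustrative only.

```python
import random
import statistics


def simulate_random_intercept(n_subjects, n_measures, beta0, sigma_u, sigma_e, seed=0):
    """Simulate y_ij = beta0 + u_0i + e_ij (covariate term omitted for brevity)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_subjects):
        # One random intercept per subject, shared by all of that subject's measurements.
        u = rng.gauss(0.0, sigma_u)
        data.append([beta0 + u + rng.gauss(0.0, sigma_e) for _ in range(n_measures)])
    return data


def anova_variance_components(data):
    """Method-of-moments estimates for a balanced one-way random-effects design.

    Uses E[MS_within] = sigma^2 and E[MS_between] = sigma^2 + m * sigma_u^2,
    where m is the number of measurements per subject.
    """
    k = len(data)        # number of subjects (clusters)
    m = len(data[0])     # measurements per subject (assumed equal for all)
    means = [statistics.fmean(row) for row in data]
    grand = statistics.fmean(means)
    ms_between = m * sum((mu - grand) ** 2 for mu in means) / (k - 1)
    ms_within = sum(
        (y - mu) ** 2 for row, mu in zip(data, means) for y in row
    ) / (k * (m - 1))
    sigma2_e = ms_within
    # Truncate at zero: the moment estimator can go negative in small samples.
    sigma2_u = max(0.0, (ms_between - ms_within) / m)
    return sigma2_u, sigma2_e


# Illustrative values: true sigma_u^2 = 4, true sigma^2 = 1, so the true ICC is 0.8.
data = simulate_random_intercept(200, 10, beta0=5.0, sigma_u=2.0, sigma_e=1.0, seed=42)
sigma2_u_hat, sigma2_e_hat = anova_variance_components(data)
icc_hat = sigma2_u_hat / (sigma2_u_hat + sigma2_e_hat)
print(f"sigma_u^2 ~ {sigma2_u_hat:.2f}, sigma^2 ~ {sigma2_e_hat:.2f}, ICC ~ {icc_hat:.2f}")
```

The ratio $\sigma_u^2 / (\sigma_u^2 + \sigma^2)$ computed at the end is the intraclass correlation, that is, the correlation the model implies between two measurements on the same subject. With unbalanced data or covariates, a dedicated mixed-model routine is the appropriate tool; this sketch only illustrates what the variance components mean.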

When it applies

GLMM applies to any data structure in which observations are not independent: repeated measures on the same subject (longitudinal, panel), nested data (students in schools, patients in hospitals), and grouped data (measures within families, litters, or geographic clusters). Typical settings are clinical trials with pre/post and follow-up measures, educational research with school or teacher effects, ecology with site or year effects, and psychometrics with items crossed with subjects. For non-normal outcomes (binary, count, proportion), a GLMM with an appropriate link function (logit, log, probit) is the generalization. It applies when the interest lies both in average population estimates and in the magnitude of between-cluster variability, information that fixed-effects-only models discard.
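The generalization to non-normal outcomes can be written out explicitly. The block below is a sketch in the notation of the random-intercept model from the extended definition; the symbols $g$, $p_{ij}$, and $\mu_{ij}$ are introduced here for illustration and do not appear in the original text.

```latex
% General form: a link function g relates the conditional mean to the linear predictor
g\big(\mathbb{E}[y_{ij} \mid u_{0i}]\big) = \beta_0 + \beta_1 X_{ij} + u_{0i},
\qquad u_{0i} \sim N(0, \sigma_u^2)

% Binary outcome, logit link
y_{ij} \mid u_{0i} \sim \mathrm{Bernoulli}(p_{ij}), \qquad
\log\frac{p_{ij}}{1 - p_{ij}} = \beta_0 + \beta_1 X_{ij} + u_{0i}

% Count outcome, log link
y_{ij} \mid u_{0i} \sim \mathrm{Poisson}(\mu_{ij}), \qquad
\log \mu_{ij} = \beta_0 + \beta_1 X_{ij} + u_{0i}
```

With a nonlinear link, $\beta_1$ is a cluster-specific (conditional) effect: it describes the change in the linear predictor for a given cluster, holding $u_{0i}$ fixed, and in general differs from the population-averaged effect.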

When it does not apply

It does not apply to simple independent data, for which classical regression (linear, logistic, Poisson) is simpler and sufficient. It does not apply directly when the number of clusters $K$ is very small ( $K < 5$ ): the between-cluster variance is poorly estimated, and cluster fixed effects (dummy variables) are an alternative. It does not apply to a single time series ( $K = 1$ ) dominated by autocorrelation: GLMM handles intra-cluster correlation, but with one cluster there is no between-cluster variation to model, and an ARIMA structure may be more appropriate. It does not replace causal inference: GLMM accounts for between-cluster heterogeneity but not for unmeasured within-cluster confounding. With extremely unbalanced data (some clusters with 1 observation, others with 100), convergence may fail and estimates may become unstable.

Applications by field

- Health and clinical trials: repeated-measures models in longitudinal studies; center effects in multicenter trials.
- Education: multilevel analysis with students nested in schools; value-added models for teacher effects.
- Ecology: random effects of site, year, individual; spatial models with spatial residual correlation.
- Psychometrics: item analysis with subjects as random effects; multilevel SEM in organizational research.

Common pitfalls

The first pitfall is analyzing nested data with classical regression and ignoring the structure: this violates the independence assumption, underestimates standard errors, and inflates the false-positive rate. The second is specifying only a random intercept when slopes also vary across clusters: the underspecified model can produce biased estimates, in particular standard errors that are too small for the affected slopes. The third is failing to check for a singular random-effects structure: lme4 reports a “singular fit” when an estimated variance is effectively zero (or an estimated correlation sits at ±1), which may indicate an over-parameterized model. The fourth is interpreting fixed-effect coefficients without considering between-cluster variation: an average population effect of $\beta = 0.3$ can mask large variation ( $\sigma_{\beta} = 0.5$ across clusters), so the effect may be near zero or reversed in a substantial share of clusters. The fifth is relying on naive p-values computed as in lm/glm: the degrees of freedom for fixed-effect tests in mixed models are not well defined, so use lmerTest (Satterthwaite) or a bootstrap, and report confidence intervals based on the profile likelihood or a bootstrap.
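Two of these pitfalls can be quantified with back-of-the-envelope arithmetic. The sketch below is illustrative only and makes assumptions beyond the text: for the first pitfall it uses the standard design-effect approximation $1 + (m - 1)\rho$, which holds for equal cluster sizes $m$, intraclass correlation $\rho$, and a quantity that varies only between clusters (such as a mean or a cluster-level predictor). For the fourth pitfall it assumes cluster-specific slopes are normally distributed around $\beta$. The function names and the example values for cluster size, ICC, and naive standard error are hypothetical.

```python
import math
from statistics import NormalDist


def design_effect(cluster_size, icc):
    """Variance inflation from clustering: 1 + (m - 1) * rho (equal cluster sizes)."""
    return 1.0 + (cluster_size - 1) * icc


def corrected_standard_error(naive_se, cluster_size, icc):
    """Scale a naive (independence-based) standard error by sqrt(design effect)."""
    return naive_se * math.sqrt(design_effect(cluster_size, icc))


def effective_sample_size(n_total, cluster_size, icc):
    """Number of independent observations carrying the same information."""
    return n_total / design_effect(cluster_size, icc)


def share_of_clusters_with_reversed_effect(beta, sigma_beta):
    """P(cluster slope < 0) when cluster slopes ~ N(beta, sigma_beta^2)."""
    return NormalDist(mu=beta, sigma=sigma_beta).cdf(0.0)


# Pitfall 1: 50 clusters of 20 observations with a modest ICC of 0.1 (hypothetical values).
deff = design_effect(20, 0.1)
n_eff = effective_sample_size(1000, 20, 0.1)
se = corrected_standard_error(0.05, 20, 0.1)
print(f"design effect = {deff:.2f}, effective n = {n_eff:.0f}, corrected SE = {se:.3f}")

# Pitfall 4: the values from the text, beta = 0.3 and sigma_beta = 0.5.
reversed_share = share_of_clusters_with_reversed_effect(0.3, 0.5)
print(f"share of clusters with a negative slope = {reversed_share:.3f}")
```

Under these assumptions, even a small intraclass correlation shrinks the effective sample size considerably, and a positive average effect remains compatible with a negative effect in a sizeable minority of clusters.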