DATA AND STATISTICS · 23 entries
Data and Statistics.
Entries on classical statistical methods and multivariate techniques used in empirical research: hypothesis testing, regression, factor analysis, structural equation modeling, and associated metrics.
Analysis of variance (ANOVA) Classical statistical technique for comparing means across groups, typically three or more. Established by Fisher in 1925, it forms the foundation of experimental design in biomedical, agricultural, and behavioral sciences.
Statistics Bibliometric analysis Quantitative mapping of a field's scientific output through article metadata: coauthorship networks, co-citation, temporal evolution, emerging fronts. Today relies on Scopus, Web of Science, and tools like VOSviewer and Bibliometrix.
Statistics Bootstrap Family of resampling-with-replacement methods that estimates the sampling distribution of an estimator from a single sample. Proposed by Efron (1979). Enables CIs and hypothesis tests without parametric normality assumptions.
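As an illustration of the resampling idea, here is a minimal sketch of a percentile bootstrap confidence interval for the mean, using only the Python standard library. The function name `bootstrap_ci`, its defaults, and the sample values are assumptions made for this example; the percentile method is one of several bootstrap interval constructions.

```python
import random
import statistics

def bootstrap_ci(sample, stat=statistics.mean, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for `stat` (illustrative sketch)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    # Draw n_resamples resamples of size n, with replacement, and
    # evaluate the statistic on each one.
    estimates = sorted(stat(rng.choices(sample, k=n)) for _ in range(n_resamples))
    alpha = 1.0 - level
    lo_idx = max(0, round((alpha / 2) * n_resamples))
    hi_idx = min(n_resamples - 1, round((1 - alpha / 2) * n_resamples) - 1)
    return estimates[lo_idx], estimates[hi_idx]

# Hypothetical data, for illustration only.
data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9, 3.1, 3.6]
low, high = bootstrap_ci(data)
```

The interval endpoints are simply empirical quantiles of the resampled statistics, so no normality assumption enters the calculation.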
Statistics Cluster analysis Family of unsupervised methods that groups observations by similarity. Classical algorithms: k-means (MacQueen, 1967), hierarchical clustering, DBSCAN. Validation via silhouette (Rousseeuw, 1987), stability, and interpretability.
Statistics Confidence interval Range of values computed from sample data by a procedure that, over repeated samples, covers the true population parameter in a proportion of cases equal to the nominal confidence level (typically 95%). Formalized by Neyman in 1937.
Statistics Confirmatory factor analysis (CFA) Modeling technique that tests whether a hypothesized *a priori* factor structure fits observed data. Psychometric standard for validating measurement instruments with scales and items; established by Jöreskog in 1969 and implemented today in lavaan, Mplus, and AMOS.
Statistics Convergent and discriminant validity Instrument validity criteria: convergent (items of the same construct correlate strongly) and discriminant (items of distinct constructs correlate weakly). Classical operationalization via AVE by Fornell and Larcker (1981) and HTMT by Henseler et al. (2015).
Statistics Cronbach's alpha Classical coefficient of internal consistency for scales and instruments, proposed by Cronbach in 1951. Despite its widespread use in psychometrics, it is now widely criticized for restrictive assumptions (notably tau-equivalence, i.e., equal loadings across items), and alternatives such as McDonald's omega are often preferred.
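For reference, the coefficient can be computed as $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i s_i^2}{s_T^2}\right)$, where $k$ is the number of items, $s_i^2$ the variance of item $i$, and $s_T^2$ the variance of the total score. A minimal sketch follows; the function name `cronbach_alpha` and the item scores are hypothetical.

```python
import statistics

def cronbach_alpha(items):
    """items: list of k lists, each holding one item's scores across respondents."""
    k = len(items)
    # Sample variance of each item.
    item_vars = [statistics.variance(scores) for scores in items]
    # Total score per respondent (sum across items).
    totals = [sum(resp) for resp in zip(*items)]
    total_var = statistics.variance(totals)
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical 3-item scale answered by 5 respondents.
scores = [
    [4, 3, 5, 2, 4],
    [5, 3, 4, 2, 4],
    [4, 2, 5, 3, 5],
]
alpha = cronbach_alpha(scores)
```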
Statistics Effect size Quantitative measure of the magnitude of an observed effect or difference, independent of sample size. Includes the d (Cohen), r (correlation), and odds ratio families. Reporting effect sizes is required or recommended by modern reporting standards (ASA, APA, AMA).
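As one concrete member of the d family, Cohen's d divides the difference between two group means by the pooled standard deviation. The sketch below assumes two independent groups; the name `cohens_d` and the data are illustrative.

```python
import math
import statistics

def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

# Hypothetical groups: means 4 and 3, both with sample variance 4.
d = cohens_d([2, 4, 6], [1, 3, 5])
```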
Statistics Exploratory factor analysis (EFA) Multivariate data reduction technique that identifies latent factors underlying a set of observed variables, without an a priori hypothesis about structure. Typically precedes CFA in measurement instrument validation.
Statistics Linear regression Statistical model estimating the linear relationship between a dependent variable and one or more independent variables. Methodological foundation of much of applied statistics and pedagogical entry point for more complex predictive models.
Statistics Logistic regression Statistical model for a categorical dependent variable that estimates the probability of belonging to a category as a logistic function of predictors. Variants: binary, multinomial, and ordinal. Cox (1958) formalized it for binary response.
Statistics MANOVA Multivariate analysis of variance: extension of ANOVA to multiple dependent variables simultaneously. Tests whether group means differ while accounting for the correlation structure across outcomes. Test statistics: Wilks' lambda, Pillai's trace, Hotelling-Lawley trace, Roy's largest root.
Statistics Mediation and moderation Mediation: variable M explains HOW X affects Y (causal mechanism). Moderation: variable W modifies WHEN or FOR WHOM the effect of X on Y occurs (interaction). Distinction formalized by Baron and Kenny (1986); modern approach via Hayes (2018).
Statistics Missing data and multiple imputation Treatment of missing values in research data. Mechanisms: MCAR, MAR, MNAR. Multiple imputation (Rubin, 1987) generates m complete datasets via posterior sampling, combining estimates via Rubin's rules for valid inference.
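The pooling step of Rubin's rules can be sketched as follows: the pooled estimate is the mean of the m per-dataset estimates, and the total variance is the mean within-imputation variance plus $(1 + 1/m)$ times the between-imputation variance. The function name `pool_rubin` and the numbers are assumptions for illustration; the imputation step itself is not shown.

```python
import statistics

def pool_rubin(estimates, variances):
    """Combine m per-dataset estimates and their squared standard errors."""
    m = len(estimates)
    pooled = statistics.mean(estimates)          # pooled point estimate
    within = statistics.mean(variances)          # average within-imputation variance
    between = statistics.variance(estimates)     # variance across imputations
    total = within + (1 + 1 / m) * between       # total variance of the pooled estimate
    return pooled, total

# Hypothetical results from m = 3 imputed datasets.
pooled, total_var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```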
Statistics Mixed-effects models (GLMM) Generalized linear mixed models, combining fixed effects (population parameters) and random effects (variation across groups/subjects). Appropriate for nested, longitudinal, or grouped data. Canonical R implementation via lme4 (Bates et al., 2015).
Statistics Network analysis Family of methods to study relations among entities represented as nodes and edges. Central metrics: centrality (degree, betweenness, eigenvector), density, modularity, community detection. Wasserman and Faust (1994) is the classical reference.
Statistics P-value Probability of obtaining, under the null hypothesis, a test statistic at least as extreme as the observed value. Central metric in frequentist hypothesis testing. The ASA issued a formal statement in 2016 warning against common misinterpretations.
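As a small worked case of the definition, the sketch below computes a two-sided p-value for a z statistic, assuming the statistic is standard normal under the null hypothesis. The function name `two_sided_p_value` is hypothetical.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """P(|Z| >= |z|) for Z ~ N(0, 1), i.e. a result at least as extreme as z."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

p = two_sided_p_value(1.96)  # close to 0.05
```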
Statistics Propensity score matching Causal inference method for observational studies that matches treated and control units on the propensity score, the estimated probability of receiving treatment given covariates. Formalized by Rosenbaum and Rubin (1983). Reduces confounding bias from observed covariates.
Statistics Statistical power Probability that a statistical test correctly rejects the null hypothesis when it is false, i.e., $1 - \beta$. Recommended minimum standard: 0.80. Cohen (1988) formalized sample size calculation based on power. Preregistration today commonly requires an a priori power analysis.
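An a priori calculation can be sketched with the normal approximation for a two-sided, two-sample comparison of means: $n \approx 2\left(\frac{z_{1-\alpha/2} + z_{1-\beta}}{d}\right)^2$ per group, where $d$ is the standardized effect size. The function name `n_per_group` and its defaults are assumptions; exact t-based calculations give slightly larger values.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Approximate per-group sample size (normal approximation, two-sided test)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z.inv_cdf(power)           # quantile corresponding to 1 - beta
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size_d) ** 2)

n = n_per_group(0.5)  # medium effect, alpha = 0.05, power = 0.80
```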
Statistics Structural equation modeling (SEM) Family of multivariate techniques combining factor analysis and multiple regression to test networks of relationships between latent and observed variables. Standard in social, behavioral, and health sciences for validating complex theoretical models.
Statistics Survival analysis Family of methods for time-to-event data (death, recurrence, failure) with explicit handling of censored observations. Kaplan-Meier estimator (1958) for the survival function; Cox model (1972) for hazard ratio regression.
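The Kaplan-Meier estimator multiplies, at each observed event time, the current survival estimate by $1 - d_i / n_i$, where $d_i$ is the number of events and $n_i$ the number still at risk. A minimal sketch, with a hypothetical function name and toy data (1 = event observed, 0 = censored):

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time where at least one event occurred."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Both events and censored subjects leave the risk set after t.
        n_at_risk -= removed
    return curve

# Toy data: the subject at time 2 is censored.
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

Censored subjects contribute to the risk set up to their censoring time but never trigger a drop in the curve, which is how the estimator uses partial information.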
Statistics Time series Family of statistical methods for time-ordered data, modeling trend, seasonality, autocorrelation, and noise. Classical additive decomposition: $X_t = T_t + S_t + R_t$ (trend, seasonal, and remainder components). The canonical parametric models are ARIMA (Box and Jenkins, 1976). Forecasting is the central objective.
Statistics