
Cross-Validation in Psychology (Part 2): Model Selection and Predictive Metrics

Yulia Kuzmina

In the previous post, I discussed the historical roots of cross-validation. In this post, I turn to how the method is currently discussed in psychological research.

Two recent papers approach the problem of cross-validation from different angles.

The first is “Cross-validation: A Method Every Psychologist Should Know” (De Rooij & Weeda, 2020). This article provides a practical framework for implementing cross-validation in psychological research, especially in the context of model comparison and predictor selection.

The second is “Cross-validation and predictive metrics in psychological research: Do not leave out the leave-one-out” (Iglesias, Sorrel, & Olmos, 2025). This paper takes a more technical perspective. It evaluates how different cross-validation procedures behave under common conditions in psychological research and questions some widely accepted rules of thumb.

Together, these two articles illustrate that the challenges of applying cross-validation in psychology are both practical and conceptual.

Cross-validation as a framework for model comparison

De Rooij and Weeda (2020) emphasize that cross-validation becomes especially valuable when researchers face competing predictive models. The key issue is not simply whether a single regression model fits the data, but which model among several alternatives provides better out-of-sample performance.

In traditional psychological practice, cross-validation is often applied to a final model derived from significance testing. In contrast, in statistical learning, cross-validation is embedded within the model selection process itself: each candidate model is evaluated, and the one with the lowest estimated prediction error is selected.

This shift introduces the classical bias–variance trade-off. More complex models typically reduce in-sample error but increase variance, meaning that predictions may fluctuate across samples. Cross-validation provides a way to estimate predictive error while balancing these two components.
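To make this concrete, here is a minimal sketch in Python/scikit-learn (the papers themselves work in R, and the data and polynomial-degree comparison below are my own hypothetical illustration): a more flexible model fits the sample better, but cross-validation can show that its estimated out-of-sample error is worse.

```python
# Sketch: in-sample fit vs. cross-validated prediction error for a simple
# and a more flexible regression model on the same hypothetical data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, cross_val_score
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(42)
n = 80
x = rng.uniform(-2, 2, size=(n, 1))
y = 0.8 * x[:, 0] + rng.normal(scale=1.0, size=n)   # truly linear relation

kf = KFold(n_splits=10, shuffle=True, random_state=1)
for degree in (1, 6):                                # simple vs. flexible model
    X = PolynomialFeatures(degree, include_bias=False).fit_transform(x)
    in_sample = mean_squared_error(y, LinearRegression().fit(X, y).predict(X))
    cv_mse = -cross_val_score(LinearRegression(), X, y,
                              cv=kf, scoring="neg_mean_squared_error").mean()
    print(f"degree {degree}: in-sample MSE = {in_sample:.3f}, CV MSE = {cv_mse:.3f}")
# The flexible model wins in-sample but typically loses out of sample.
```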

Four decisions in practical implementation

De Rooij and Weeda (2020) outline several decisions researchers must make when implementing cross-validation.


1. Number of folds (K)

The choice of K affects the bias–variance trade-off of the estimated prediction error.

  • When K equals the sample size (leave-one-out cross-validation, LOO), the training set in each fold is nearly as large as the full dataset. This leads to almost unbiased error estimates but potentially higher variability.

  • When K is small (e.g., K = 2), variance decreases but bias increases.


The commonly cited recommendation of 5- or 10-fold cross-validation originates largely from machine learning contexts involving large datasets and computationally expensive models.

Iglesias et al. (2025) reexamine this assumption. Within the general linear modeling framework they study, increasing K generally improves the accuracy of prediction error estimates, and LOO often shows the best accuracy and stability properties. Their results suggest that the “5–10 fold is standard” rule may not always be optimal for psychological data.

However, they also emphasize that the performance of LOO depends on the modeling framework and computational feasibility.
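As a rough illustration of how this decision appears in code, the sketch below fits the same linear model under 2-fold, 10-fold, and leave-one-out schemes; this is a hypothetical Python/scikit-learn example, not the simulation design used by Iglesias et al. (2025).

```python
# Sketch: how the choice of K (including leave-one-out) changes the
# estimated prediction error for the same model and data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 2))
y = X @ np.array([0.4, 0.2]) + rng.normal(scale=1.0, size=n)

schemes = {
    "2-fold":  KFold(n_splits=2, shuffle=True, random_state=1),
    "10-fold": KFold(n_splits=10, shuffle=True, random_state=1),
    "LOO":     LeaveOneOut(),   # K = n; no random partitioning involved
}
for label, cv in schemes.items():
    mse = -cross_val_score(LinearRegression(), X, y,
                           cv=cv, scoring="neg_mean_squared_error")
    print(f"{label}: estimated MSE = {mse.mean():.3f}")
```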


2. Number of repetitions

K-fold cross-validation relies on a random partition of the data, so the estimated prediction error can vary from one split to the next. Repeating the procedure with different random splits and averaging across repetitions reduces this instability.

De Rooij and Weeda (2020), following Harrell (2015), recommend repeated cross-validation and discuss using diagnostics such as the proportion of wins across repetitions to assess whether the number of repetitions is sufficient.

Repetitions are unnecessary for LOO, since its splits are fully determined by the data: each observation is left out exactly once.
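A minimal sketch of repeated cross-validation with a wins-based diagnostic might look as follows; the "proportion of wins" computed here is a simple illustration of the idea rather than the exact diagnostic described by De Rooij and Weeda (2020).

```python
# Sketch: repeated 10-fold CV for two candidate models, plus the proportion
# of repetitions in which the larger model has the lower estimated error.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(7)
n = 150
X = rng.normal(size=(n, 3))
y = 0.5 * X[:, 0] + rng.normal(scale=1.0, size=n)

n_repeats = 50
wins_full = 0
for rep in range(n_repeats):
    kf = KFold(n_splits=10, shuffle=True, random_state=rep)   # new split each time
    mse_small = -cross_val_score(LinearRegression(), X[:, [0]], y,
                                 cv=kf, scoring="neg_mean_squared_error").mean()
    mse_full = -cross_val_score(LinearRegression(), X, y,
                                cv=kf, scoring="neg_mean_squared_error").mean()
    wins_full += mse_full < mse_small

print(f"Full model wins in {wins_full / n_repeats:.0%} of repetitions")
# If this proportion still fluctuates when repetitions are added,
# more repetitions are probably needed.
```

scikit-learn also provides RepeatedKFold, which automates the repetition itself; the explicit loop above simply makes the per-repetition comparison visible.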


3. Choice of predictive metric

Cross-validation estimates prediction error relative to a chosen loss function.

Root mean squared error (RMSE) is often used by default, but alternative loss functions may be more appropriate depending on the research goal.

Iglesias et al. (2025) show that aggregating results across folds is not trivial, especially for R². Computing R² within each fold and then averaging can produce unstable estimates, because the total sum of squares is recomputed within each (often small) fold. They propose alternative aggregation strategies that pool the sums of squares across folds instead of averaging fold-wise R² values.

This highlights that cross-validation involves not only splitting the data, but also careful consideration of how predictive performance is computed and summarized.
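The sketch below contrasts the two aggregation strategies. The pooled variant shown (computing R² once over all out-of-fold predictions) is one plausible reading of the pooled-sums-of-squares idea, not necessarily the exact estimator that Iglesias et al. (2025) propose.

```python
# Sketch: two ways of summarizing cross-validated R^2 on the same data.
# (1) compute R^2 within each fold and average the fold values;
# (2) pool all out-of-fold predictions and compute R^2 once.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold, cross_val_predict, cross_val_score

rng = np.random.default_rng(3)
n = 60                                    # small folds make the contrast visible
X = rng.normal(size=(n, 2))
y = 0.4 * X[:, 0] + rng.normal(scale=1.0, size=n)

kf = KFold(n_splits=10, shuffle=True, random_state=1)

# (1) per-fold R^2: the total sum of squares is recomputed inside each small fold
per_fold_r2 = cross_val_score(LinearRegression(), X, y, cv=kf, scoring="r2")
print(f"Mean of per-fold R^2: {per_fold_r2.mean():.3f}")

# (2) pooled: collect out-of-fold predictions, then compute R^2 once over all of them
y_pred = cross_val_predict(LinearRegression(), X, y, cv=kf)
print(f"Pooled R^2:           {r2_score(y, y_pred):.3f}")
```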


4. Defining the model set

Cross-validation compares models, but it does not determine which models should be included.

De Rooij and Weeda (2020) distinguish between data-driven and theory-driven approaches. Data-driven model selection can produce optimistically biased estimates of prediction error, since the same data are used for model search and evaluation. A theory-driven approach specifies a finite set of theoretically meaningful models and uses cross-validation to compare them.

In many psychological studies, the number of predictors is modest, making it feasible to evaluate all theoretically plausible models within a cross-validation framework.
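As a sketch of what a theory-driven model set could look like in code, the example below specifies three candidate predictor sets in advance and compares them with 10-fold cross-validation; the predictor names and data are hypothetical.

```python
# Sketch: a small, theory-driven candidate set specified before model fitting,
# compared by 10-fold cross-validated prediction error.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(11)
n = 120
data = pd.DataFrame({
    "anxiety":  rng.normal(size=n),
    "sleep":    rng.normal(size=n),
    "workload": rng.normal(size=n),
})
data["performance"] = -0.4 * data["anxiety"] + 0.3 * data["sleep"] \
                      + rng.normal(scale=1.0, size=n)

# Each entry is a theoretically motivated model, not the product of a search.
candidate_sets = {
    "affective only":    ["anxiety"],
    "affective + sleep": ["anxiety", "sleep"],
    "full model":        ["anxiety", "sleep", "workload"],
}

kf = KFold(n_splits=10, shuffle=True, random_state=1)
for name, predictors in candidate_sets.items():
    mse = -cross_val_score(LinearRegression(), data[predictors], data["performance"],
                           cv=kf, scoring="neg_mean_squared_error").mean()
    print(f"{name}: CV MSE = {mse:.3f}")
```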

Explanation and prediction

Recent discussions in psychological science stress the importance of integrating explanation and prediction rather than treating them as competing goals.

Resampling-based cross-validation helps identify more stable and better-specified models within a dataset. Independent-sample validation then tests whether these models truly generalize.

The two perspectives discussed above reflect different aspects of the same problem. De Rooij and Weeda (2020) focus on how cross-validation can be practically implemented for model comparison in psychology. Iglesias et al. (2025) examine how different cross-validation procedures and predictive metrics behave under realistic conditions and caution against uncritical reliance on simplified rules.

In the next post, I will turn to concrete empirical examples and examine how cross-validation is implemented in actual psychological studies.

 
 
 
