Tests and Experiments: A Century-Old Divide in Psychology
- Yulia Kuzmina
- Jun 27
In contemporary psychology, it's almost a cliché to speak of two major research traditions: the experimental and the correlational. But behind this oversimplification lies a fundamental difference in how researchers understand people and psychological processes.
Briefly, in experimental psychology, the primary goal is to uncover generalizable laws or effects. Individual variation is treated as “noise” that should be minimized to isolate the signal. The more homogeneous the sample, the clearer the effect. In this paradigm, reliability is defined by reproducibility—if many individuals show the same response under controlled conditions, the finding is considered robust. People are treated as interchangeable, assuming cognitive or emotional processes function uniformly across all “normative” individuals.
In contrast, correlational and individual-differences research embraces variation. Here, diversity among individuals isn’t a nuisance; it’s the very phenomenon being studied. Reliability is about consistency in ranking individuals or measuring traits with minimal error. The goal is to detect and explain patterns of variability, whether in cognitive ability, personality, or educational outcomes.
My own academic path began in this correlational tradition. I worked with large-scale educational datasets, psychometrics, and studies on individual differences. But as I became increasingly drawn to cognitive science, I encountered a different world. In experimental cognitive psychology, I was surprised to see many tasks designed to be extremely easy, yielding near-ceiling performance with almost no variation in accuracy. This stood in stark contrast to educational and psychological testing, where varying item difficulty is a core design principle. The difference, I realized, stemmed from the differing goals of these traditions.
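To make the contrast concrete, here is a minimal sketch of why a uniformly easy task leaves almost no between-person variation while a test with graded item difficulty spreads people out. Everything in it is an assumption made for illustration: the logistic (Rasch-style) response model, the evenly spaced ability values, and the two hypothetical item sets. It is not taken from any particular study.

```python
import math
from statistics import mean, pstdev


def p_correct(ability, difficulty):
    """Logistic (Rasch-style) probability of a correct response (assumed model)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def expected_accuracy(ability, difficulties):
    """Expected proportion correct for one person over a set of items."""
    return mean(p_correct(ability, d) for d in difficulties)


# Hypothetical sample: 21 people with abilities evenly spaced from -2 to 2.
abilities = [-2 + 0.2 * i for i in range(21)]

# Hypothetical item sets; the difficulty values are illustrative only.
easy_task = [-5.0] * 10                             # every trial is very easy
graded_test = [-2.0 + 0.4 * i for i in range(11)]   # difficulties from -2 to 2

easy_scores = [expected_accuracy(a, easy_task) for a in abilities]
graded_scores = [expected_accuracy(a, graded_test) for a in abilities]

# Compare the average level and the spread across people for each design.
print(f"easy task:   mean={mean(easy_scores):.3f}, sd={pstdev(easy_scores):.3f}")
print(f"graded test: mean={mean(graded_scores):.3f}, sd={pstdev(graded_scores):.3f}")
```

Under these assumptions the easy task puts everyone near the ceiling, which suits an experimenter looking for a clean condition effect but leaves little variance for ranking individuals. The graded test trades that uniformity for spread, which is what an individual-differences researcher needs.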
A Historical Divide
This divide isn't new. It was vividly articulated by Lee Cronbach in his influential 1957 article The Two Disciplines of Scientific Psychology, where he wrote:
“While the experimenter is interested only in variation he himself creates, the correlator finds his interest in the already existing variation between individuals, social groups, and species… Experimental method brings situational variables under tight control. It thus permits rigorous tests of hypothesis and confident statements about causation. The correlational method can study what man has not learned to control or can never hope to control… The correlator’s mission is to observe and organize the data from Nature’s experiments.”
He then described how far the two approaches had drifted apart:
“The personality, social, and child psychologists went one way; the perception and learning psychologists went the other; and the country between turned into desert.”

Yet the tension goes back even further. In 1924, Lewis Terman—famed for introducing the Binet intelligence test to the U.S.—sent a questionnaire to 22 leading psychologists, asking:
1. What distinguishes mental tests from psychological experiments?
2. What is the contribution of tests to psychology?
3. How valuable are tests?
Respondents included past APA presidents (e.g., Thorndike, Watson, Yerkes) and other famous psychologists and psychometricians (e.g., Thurstone, Cattell). Their answers coalesced around three themes:
· Tests assess individual differences; experiments uncover general laws.
· Tests are simpler, more pragmatic, and often shorter.
· Tests are applied tools, not necessarily vehicles for theory-building.
Some psychologists offered more nuanced takes. Watson (the "father" of behaviorism) observed that a test, when first administered to a large group, resembles an experiment; it only becomes a "test" when standardized.
Woodworth argued that the key difference lies not in procedure or purpose, but in how data are used—either as general reference or individual diagnosis.
Boring, a pioneer of the history of psychology, saw tests as “shortened experiments” repurposed for practical ends.
Terman himself warned against drawing rigid boundaries between tests and experiments. He argued that ignoring individual variation risks distorting our understanding of what is “normal”.
The Conversation Today: Levels of Analysis
Today, the debate has moved beyond methods. We now recognize that one of psychology’s most important distinctions lies in levels of analysis: are we studying patterns across individuals (population-level), processes within individuals (person-level), or underlying mechanisms that may operate across both?
Lundh (2023) offers a useful framework, suggesting that psychology consists of three distinct branches:
· Population psychology, which asks how common certain traits or behaviors are, or how variables relate across individuals.
· Mechanism-oriented psychology, which seeks general causal laws explaining cognitive or emotional processes.
· Person-oriented psychology, which focuses on the individual as a dynamic, integrated system developing over time.
Lundh argues that much confusion in psychological research arises from conflating these levels. Researchers may ask person-level questions (e.g., How does anxiety affect my learning?) but use population-level methods (e.g., group-level correlations), leading to mismatches between research questions, methodologies, and conclusions.
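The mismatch between levels can be illustrated with a small constructed example. The sketch below uses toy data invented for this purpose (five hypothetical people, five occasions each, with made-up anxiety and score values). In it, anxiety and performance are positively related across people but negatively related within every single person. The `pearson` helper is a plain implementation of the Pearson correlation, written out so the example needs only the standard library.

```python
import math
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Toy data (entirely hypothetical): occasion-to-occasion deviations in anxiety.
offsets = [-1.0, -0.5, 0.0, 0.5, 1.0]

people = []
for i in range(5):
    base_anxiety = 2.0 * i        # people with higher typical anxiety...
    base_score = 50.0 + 5.0 * i   # ...also have higher typical scores here
    anxiety = [base_anxiety + d for d in offsets]
    score = [base_score - 3.0 * d for d in offsets]  # more anxious than usual -> lower score
    people.append((anxiety, score))

# Person-level: correlation computed separately within each individual.
within_rs = [pearson(a, s) for a, s in people]

# Population-level: correlation of person means, and of all observations pooled.
between_r = pearson([mean(a) for a, _ in people], [mean(s) for _, s in people])
pooled_r = pearson(
    [x for a, _ in people for x in a],
    [y for _, s in people for y in s],
)

print("within-person r:", [round(r, 2) for r in within_rs])
print("between-person r:", round(between_r, 2))
print("pooled r:", round(pooled_r, 2))
```

In this constructed case, a group-level correlation would suggest that anxiety goes with better performance, while every individual performs worse on the occasions when they are more anxious than usual. Real data are rarely this clean, but the sketch shows why a population-level coefficient cannot by itself answer a person-level question.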
He also notes that person-level psychology is often marginalized—even though it addresses core aspects of individuality, motivation, and human development. Each branch has its own legitimate goals and methods, but failing to differentiate between them leads to conceptual confusion and poor research design.
This framework has helped me make sense of my own trajectory. I began with population-level research—working with educational assessments and large datasets. Later, I became fascinated by cognitive mechanisms and shifted toward experimental work. But I soon became uneasy with some of the reductionist assumptions in that space: that all meaningful phenomena can be traced back to universal processes—or even reduced to the molecular level.
Now, I find myself returning to population-level questions, not because I reject the search for mechanisms, but because I’ve come to value the complexity and richness of individual and group variability. At the same time, I’m more aware than ever of the limits of aggregation. Averaging across individuals may obscure important patterns—whether in experiments or surveys.
In the end, it’s not about whether you run experiments or use tests—it’s about the level at which you're asking questions, and whether your methods truly match that level of inquiry.


