“The Future of Longitudinal Studies:
What we know; What we don’t know; What we need to
know”
Inferring Causality from Longitudinal Studies
Chaired By: Elizabeth Owens, University Of California, Berkeley
Friday, March 21, 2003
Diana Baumrind
"When are Causal Inferences Justified in the Debate about Physical
Discipline “Effects”?"
University of California, Berkeley
Causal inferences referring to “effects”
of physical discipline are not justified from the primary studies
synthesized in Gershoff’s (2002) meta-analytic review because
the great majority, even when longitudinal, were correlational and
failed to meet the elementary requisite conditions for making causal
inferences from correlational data. In the opinion of Overton, Editor
of the SRCD Monographs, data from most social science research,
including experimental research, do not justify causal terms such
as “influence”, “affect”, or “the
risk factor caused the behavior.” However, public policy application
of such research too often presumes it can support causal inferences.
For example, if divorce or physical punishment is not at least a
potential causal risk factor for the detrimental child attributes
with which it is correlated, then social policy recommendations
based on such findings, even from longitudinal research, would be
rash.
Although correlational evidence can never establish cause, such
evidence can support a principled argument that one characteristic
such as “spanking” is a plausible causal component in
producing another characteristic such as aggression, with which
it is correlated, provided that at least four conditions are all
met: 1) the phenomenon corresponding to “spanking” must
be bracketed to exclude very severe levels of physical punishment,
since otherwise the correlation will merely reflect abusive levels; 2)
temporality must be established by controlling for the baseline
“outcome” variable in a prospective longitudinal design;
3) plausible rival methodological and substantive hypotheses must
be rigorously tested by statistically controlling for plausible
“third” variables; and 4) reliable and valid measures of
putative dependent, independent, and third variables must be employed.
In addition, ancillary desirable conditions include an effect size
sufficient to rule out the alternative hypothesis that the association
is due to a weak unmeasured confounder, and consistency in most
contexts and populations.
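Conditions 2 and 3 can be illustrated with a minimal sketch in Python. It is not the analysis used in any study discussed here: the data are simulated, and the variable names (`baseline`, `third`, `discipline`, `outcome`) are hypothetical labels chosen for the illustration. The simulation assumes that the baseline and a third variable drive both discipline and the later outcome, and that discipline has no direct effect on the outcome, so any zero-order correlation between the two is spurious.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr(x, y, controls):
    """Correlation of x and y with every series in `controls` partialed out,
    computed with the standard recursive partial-correlation formula."""
    if not controls:
        return pearson(x, y)
    z, rest = controls[0], controls[1:]
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Simulated (not real) data; all names below are hypothetical.
random.seed(0)
n = 5000
baseline = [random.gauss(0, 1) for _ in range(n)]  # initial level of the outcome
third = [random.gauss(0, 1) for _ in range(n)]     # e.g. another parenting variable

# Both series depend on baseline and third, but not on each other.
discipline = [b + t + random.gauss(0, 1) for b, t in zip(baseline, third)]
outcome = [b + t + random.gauss(0, 1) for b, t in zip(baseline, third)]

zero_order = partial_corr(discipline, outcome, [])
adjusted = partial_corr(discipline, outcome, [baseline, third])

print(f"zero-order correlation: {zero_order:.3f}")
print(f"after partialing out baseline and third variable: {adjusted:.3f}")
```

Under these assumptions the zero-order correlation is substantial, while the correlation after partialing out the baseline and the third variable is close to zero. The sketch shows only how the controls operate; it says nothing about what real data would show.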
Unfortunately, most correlational studies, even when longitudinal,
do not meet these criteria. For example, Gershoff’s meta-analysis,
which inferred a causal relation between physical discipline and
various negative outcomes, did not meet these criteria well; yet
social policy implications were quickly and widely drawn.
To test more adequately whether there is evidence to
support a principled argument that physical discipline causes the
child attributes with which it is linked, Baumrind and Owens, using
archival data from Baumrind’s Family Socialization and Developmental
Competence longitudinal program of research, conducted a study of
children at ages 4, 9, and 14 years and their parents (based on 50
hours of observation and interviews) that met all the requisite
and desirable criteria previously summarized. For example, parents
who used physical punishment abusively (“red zone” families)
were separated from those whose use was normative in frequency and
intensity; a measure of initial child misbehavior was partialed
out; a plausible third variable, namely a reliable measure of consistent,
responsive discipline, was covaried out; and reliable and valid
parent and child measures obtained from independent sources were
employed. Before the “red zone” families were removed,
the results looked similar to those of Gershoff. However, once these
very high-risk families were removed and plausible third variables
covaried out, correlations were close to zero. Furthermore, the link
between a measure of verbal discipline and child outcomes was at least
as large as that for physical discipline, and in addition physical
discipline added no significant variance to that explained by the
parent typology.
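The idea that one predictor adds no variance beyond another can be sketched as an incremental R-squared computation. This is an illustration on simulated data, not the analysis reported here. The names are hypothetical, and `style` is assumed to be a continuous score standing in for the parent typology, which in the actual study may well be categorical. The outcome is simulated to depend on `style` only, while `physical` is merely correlated with `style`.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def r2_two_predictors(r_y1, r_y2, r_12):
    """R-squared for y regressed on two predictors, from the three
    pairwise correlations (standard two-predictor formula)."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


# Simulated (not real) data; all names below are hypothetical.
random.seed(1)
n = 5000
style = [random.gauss(0, 1) for _ in range(n)]      # stand-in for parent typology
physical = [s + random.gauss(0, 1) for s in style]  # correlated with style
outcome = [s + random.gauss(0, 1) for s in style]   # depends on style only

r_ys = pearson(outcome, style)
r_yp = pearson(outcome, physical)
r_sp = pearson(style, physical)

r2_style = r_ys ** 2                              # step 1: style entered first
r2_both = r2_two_predictors(r_ys, r_yp, r_sp)     # step 2: physical added
delta_r2 = r2_both - r2_style                     # variance added by physical

print(f"physical alone, R^2: {r_yp ** 2:.3f}")
print(f"style alone, R^2: {r2_style:.3f}")
print(f"increment from adding physical: {delta_r2:.4f}")
```

In this simulation `physical` correlates with the outcome when considered alone, yet its increment is near zero once `style` has been entered first. A real analysis would also need a significance test for the increment, which the sketch omits.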
In sum, we found no evidence to support the inference, drawn from the
correlational studies in Gershoff’s meta-analysis, that mild to
moderate spanking is associated with negative outcomes. Our research
illustrates the importance of testing plausible rival hypotheses
and of maintaining a high-quality longitudinal data set, so that potentially
relevant third variables pertaining to alternative hypotheses can
be entered into statistical models first and intervention selection
biases can be ruled out (e.g., spanking may be a “marker”
for other negative relational factors rather than the “cause”
of negative outcomes). Our research also highlights the importance
of having reliable and valid measures of dependent, independent,
and third variables, so that it is possible to statistically
control for them in prospective designs.