‘Correlation’ does not equal ‘causation’

The frequent occurrence of a certain factor together with a problem is not evidence that it is the cause of the problem – much less that elimination of the factor would cause the problem to disappear.


Medical and Social Science & Practice


Not all correlations are causal. Factors that occur together with a health condition or problem, and that are statistically linked (associated or correlated) with it, are often referred to as either risk factors or protective factors.

Sometimes research findings show that individuals who have a particular risk factor are also at higher risk of developing a certain condition or problem. Therefore, the presence of the risk factor in an individual predicts with a certain probability that they also have the health condition, or that they will develop it. An association is present.

Such associations, however, are often misinterpreted. Indeed, it may not be at all clear that the particular factor causes the condition. The demonstrated association could reflect a causal relationship, but it need not do so.

To determine whether causality is involved, it is helpful to devise studies in which the suspected cause can be manipulated and its impact on the condition investigated. However, many ethical aspects must be taken into account in such studies. Sometimes human trials are clearly inappropriate. One example of an unethical study design would be to subject individuals who have never smoked to an intervention that is thought to induce a higher rate of smoking in the future; in other words, an intervention that is suspected of being harmful.

In such cases, researchers are instead limited to studies in which the suspected harmful exposure occurs naturally in a group. They can study the health of the participants before and after exposure, or explore whether the suspected negative health outcome arises more frequently among participants who have been exposed than among those who have not. However, when considering whether an association demonstrated in such studies may be causal, several other circumstances must be taken into account.[1,2]

One key issue is the time sequence: if the condition arose before exposure to the risk factor, it is impossible for this factor to be the cause. The problem is that in many research studies it is difficult to determine which actually came first. This applies, for example, to cross-sectional studies that investigate whether people with a particular disease were also exposed to a suspected risk factor, compared with a healthy control group. In such studies, it is extremely difficult to determine whether exposure to the suspected factor actually preceded the disease. Instead, studies are needed that record exposure before the condition arises and then follow participants long enough for the condition to develop.

Moreover, it is necessary to rule out the presence of other causes that are common to both exposure and outcome; in other words, to rule out systematic error due to confounding factors (also called confounders or ‘lurking variables’), which can produce spurious correlations.
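How a confounder can produce a spurious correlation may be easier to see in a small simulation. The sketch below is illustrative only: the probabilities, the sample size and the names `simulate` and `pearson` are assumptions made for this example, not values from any study. A hidden factor raises the probability of both the exposure and the outcome, while the exposure itself has no effect on the outcome at all.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def simulate(n=20000, seed=1):
    """Generate (confounder, exposure, outcome) rows (hypothetical data)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.random() < 0.5  # hidden confounder present?
        # Exposure and outcome each depend on z only, never on each other.
        exposure = rng.random() < (0.7 if z else 0.2)
        outcome = rng.random() < (0.6 if z else 0.1)
        rows.append((z, int(exposure), int(outcome)))
    return rows

rows = simulate()

# Crude association, ignoring the confounder.
crude = pearson([e for _, e, _ in rows], [o for _, _, o in rows])

# Association within each level of the confounder (stratification).
within = {
    z: pearson([e for zz, e, _ in rows if zz == z],
               [o for zz, _, o in rows if zz == z])
    for z in (True, False)
}

print(f"crude correlation: {crude:.2f}")
for z, r in within.items():
    print(f"within confounder={z}: {r:.2f}")
```

Under these assumptions the crude correlation is clearly positive, whereas the correlation within each level of the hidden factor is close to zero. Stratifying in this way is only possible when the confounder has been identified and measured, which real studies cannot always guarantee.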

Yet another issue that is often taken into account is the strength of the association (e.g. how often the risk factor and the problem occur together). The underlying reasoning is that the stronger the association, the more likely it is to reflect a causal relationship.

But this is far from certain. British epidemiologist Sir Austin Bradford Hill, widely recognised for his early ideas concerning causality (Hill 1965), pointed out that even weak associations may occur between cause and effect. He argued that it is unlikely that a strong association arises solely as a result of unknown underlying factors, measurement errors and selection errors. For that to happen, the impact of the errors would have to be at least as strong as the association itself, and this is not usually the case, according to Bradford Hill.

However, others have pointed out that strong associations can also arise when statistical analyses are based on erroneous assumptions.

A further aspect to be considered is whether variation in the intensity of exposure (dose) and the magnitude of the problem (response) seem to correspond. If greater exposure to a potential ‘cause’ is always followed by a greater ‘effect’, the causal inference is strengthened.
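A dose-response check can be sketched in a few lines: group participants by exposure level, compute the share with the outcome in each group, and see whether that share rises with the dose. The records below are entirely hypothetical and exist only to make the example runnable; `rate_by_dose` and `is_monotone_increasing` are illustrative names, not functions from any library.

```python
from collections import defaultdict

def rate_by_dose(records):
    """Return {dose_level: share of participants with the outcome}."""
    totals = defaultdict(int)
    cases = defaultdict(int)
    for dose, has_outcome in records:
        totals[dose] += 1
        cases[dose] += int(has_outcome)
    return {dose: cases[dose] / totals[dose] for dose in sorted(totals)}

def is_monotone_increasing(rates):
    """True if the outcome rate rises strictly with each dose level."""
    values = [rates[dose] for dose in sorted(rates)]
    return all(a < b for a, b in zip(values, values[1:]))

# Hypothetical records: (dose level 0-3, outcome present?),
# 100 participants per dose level.
records = (
    [(0, True)] * 5 + [(0, False)] * 95
    + [(1, True)] * 10 + [(1, False)] * 90
    + [(2, True)] * 20 + [(2, False)] * 80
    + [(3, True)] * 35 + [(3, False)] * 65
)

rates = rate_by_dose(records)
print(rates)
print(is_monotone_increasing(rates))  # True for these made-up numbers
```

A rising gradient of this kind supports a causal reading but does not establish it: a confounder that itself increases with the dose would produce the same pattern.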

The same reasoning applies when a conceivable mechanism is found that could explain how the risk factor gives rise to the problem. Moreover, there may be experimental data to support causality – such as animal studies or studies that elucidate a mechanism of action.

In the case of human behaviour, making causal inferences is particularly difficult. This may be due in part to insufficient knowledge about the chain of events, emotions and thoughts that preceded a particular behaviour. For example, why do people start drinking alcohol and smoking cigarettes? Could it be that smoking triggers alcohol consumption, or vice versa? It is easy to envisage that there may be many conceivable causes that are related through complex interactions.

In the current SBU assessment concerning associations involving e-cigarettes, ‘snus’ (moist tobacco) and smoking tobacco, the question becomes even more complicated, since both the exposure (use of e-cigarettes and snus) and the outcome (tobacco smoking) are self-assessed and therefore imprecisely measured.

In correlation studies, a particularly important challenge involves identifying and managing underlying factors that ‘confound’ the association that researchers actually want to investigate. While researchers must always take such confounding factors into account, it is not always obvious which ones are meaningful, nor is it certain that researchers have any information about them. The challenge is to avoid over- or under-estimating the impact of the confounding factors. Under-estimation may lead to spurious associations, while over-estimation could conceal associations that are actually present.

When two occurrences often coincide, but in a different sequence on different occasions, it could be an indication that a confounding factor underlies the association. Should a study on a group of participants show that behaviour A precedes behaviour B among many participants, while many other participants demonstrate behaviour B prior to behaviour A, the association between A and B could be due to a common confounding factor.

For example, tobacco researchers have noted that snus users are more likely to eventually begin smoking cigarettes than are non-users. Similarly, cigarette smokers are more likely to begin using snus than are non-smokers. [3,4] In such cases, it may be important to consider the possibility of a third factor underlying both behaviours, such as the propensity to experiment with substances or to develop dependence.

Researchers usually have more confidence in a finding when it is replicated in different studies using one and the same design. However, it is important to remember that replication in itself is not proof of causality. The association may still be the result of similar studies repeatedly overlooking the same systematic error.

Meanwhile, causal inferences are strengthened when results from quite different types of studies overall point in the same direction – animal studies and mechanistic studies, as well as epidemiological and clinical studies of different designs. The investigation of a single question using several different approaches is referred to as triangulation. [5]

Finally, when discussing causality, it is important to avoid confusing a necessary cause (a factor that is required to trigger a certain effect) with a sufficient cause (a factor that is enough by itself to trigger the effect). Even when one factor is necessary as a cause, other concurrent circumstances may be required to trigger an effect. As an example: consider two siblings who both inherit a trait for a hereditary disease, but only the one who is exposed to a particular environmental factor actually develops the disease. The hereditary factor was a necessary but not sufficient cause.
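The distinction can be made concrete with a toy model of the sibling example. The rule in `develops_disease` is an assumption chosen to match that example (the disease appears only when both the inherited trait and the environmental exposure are present), and the helper names are illustrative.

```python
from itertools import product

def develops_disease(has_trait, exposed):
    # Assumed toy rule: both the trait and the exposure are required.
    return has_trait and exposed

def is_necessary(factor_index, rule):
    """The outcome never occurs when the factor is absent."""
    return all(
        not rule(*combo)
        for combo in product([False, True], repeat=2)
        if not combo[factor_index]
    )

def is_sufficient(factor_index, rule):
    """The outcome always occurs when the factor is present."""
    return all(
        rule(*combo)
        for combo in product([False, True], repeat=2)
        if combo[factor_index]
    )

# Factor 0 is the inherited trait, factor 1 the environmental exposure.
print(is_necessary(0, develops_disease))   # True
print(is_sufficient(0, develops_disease))  # False
```

In this toy rule the trait is necessary, because no one without it falls ill, but not sufficient, because a sibling with the trait and no exposure stays healthy.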

Such complexity is common. Not all factors that appear to be causes actually are, and many of the problems encountered in health care and social services have an array of interacting causes. Such problems are multifactorial.

A better understanding of causal relationships is crucial, especially when devising interventions for scientific testing, so that it can be determined whether the effects are truly those that were intended.

Lotta Ryk, Project Manager SBU
lotta.ryk@sbu.se
Ragnar Levi, Editor-in-Chief SBU

References

  1. Hill AB. The environment and disease: Association or causation? Proc R Soc Med 1965;58:295-300.
  2. Howick J, et al. The evolution of evidence hierarchies... J R Soc Med 2009;102:186-94.
  3. Haukkala A, et al. Progression of oral snuff use ... Addiction 2006;101:581-9.
  4. Galanti MR, et al. Between harm and dangers. Oral snuff use ... Eur J Public Health 2001;11:340-5.
  5. Munafò MR, et al. Robust research needs many lines of evidence. Nature 2018;553:399-401.

CORRELATIONS ARE SEEN WHEN …
>… two events are linked because one actually causes the other

>… the two events are caused by a third underlying factor – known as a confounding factor (confounder)

>… measurement or selection errors skew the results – information concerning the events is erroneous or mistakes were made when selecting study participants

>… by chance, two events accidentally happen to covary.

 

A CAUSAL LINK MAY BE PRESENT WHEN …
>… the suspected causal factor always precedes the effect, which also occurs within a reasonable time interval. However, a delay between cause and effect may be difficult to establish through retrospective studies.

>… the magnitude of dose and response correspond – the stronger the ‘cause’, the greater the ‘effect’. However, a common underlying confounding factor must be ruled out as an explanation for the link.

>… the mechanism of action is theoretically feasible and consistent with known facts. However, many important mechanisms of action are not yet known. What appears to be unreasonable today may be commonly accepted tomorrow.

>… experimental data provide support for a causal relationship. Studies in which the ‘effect’ increases or decreases when the suspected ‘cause’ is added or removed in laboratory studies may strengthen the likelihood.

>… the association is strong. However, some strong associations are due to incorrect statistical analyses, and a weak association does not rule out causality.

>… the finding is consistent, i.e., replicated by different researchers in different contexts at different points in time and in studies with different designs. However, replication may also be due to systematic errors in study design or conduct.
