What predictions lead to better decisions?

Accurate predictions can help guide healthcare interventions. But some predictive modelling is misleading or otherwise inadequate. Risk and benefit must be considered before using models and algorithms for important decisions.

Reading time: approx. 10 minutes

Publication type: Medical and Social Science & Practice

The SBU newsletter presents and disseminates the results of the SBU reports, describes ongoing projects at the agency, informs about assessment projects at sister organisations, and promotes interest in scientific assessments and critical reviews of methods in health care and social services.

Many decisions in health care and social services are based on assumptions about the future – how a disease or social problem will unfold. What happens if nothing is done – what is the risk for the individual? What assistance and interventions does the individual need?

A prognosis is an assessment of the likely course of a condition in an individual who has certain characteristics or who lives under certain circumstances. Many predictions are based on systematic observations of groups of individuals in similar situations, frequently obtained from large registry studies in which many people were followed over time and some went on to develop the condition in question.

The purpose of a prediction can be purely informative – to gain knowledge concerning the risk that an individual will become afflicted by the given condition – or to provide a better foundation for decision-making in order to have an effect on the situation. In either case, the prediction should be as accurate as possible in order to provide a correct image of the future.

The anticipated course of various conditions can be calculated using sets of mathematical instructions – algorithms – in which the various circumstances are combined and weighted in an effort to predict a certain condition in individuals, either short term or long term.
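As a toy illustration of such an algorithm (the factors, weights and intercept below are invented for this sketch, not taken from any real model), circumstances can be weighted and combined into a score that is then mapped to a probability:

```python
import math

# Hypothetical coefficients, for illustration only -- not a validated model.
WEIGHTS = {
    "age_per_decade": 0.5,          # weight per decade of age
    "smoker": 0.9,                  # weight if the individual smokes
    "systolic_bp_per_10mmHg": 0.3,  # weight per 10 mmHg systolic pressure
}
INTERCEPT = -7.0


def predicted_risk(age, smoker, systolic_bp):
    """Combine weighted factors into a score and map it to a
    probability (0..1) with the logistic function."""
    score = (
        INTERCEPT
        + WEIGHTS["age_per_decade"] * (age / 10)
        + WEIGHTS["smoker"] * (1 if smoker else 0)
        + WEIGHTS["systolic_bp_per_10mmHg"] * (systolic_bp / 10)
    )
    return 1 / (1 + math.exp(-score))
```

With these invented weights, a 70-year-old smoker with high blood pressure receives a higher predicted risk than a 40-year-old non-smoker with normal pressure, which is all the sketch is meant to show.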

In the field of medicine, a number of prediction models have been developed – for example, to assess the risk that an individual will develop cardiovascular disease. New models are constantly being published – but many are plagued by methodological problems and severe unreliability. Their accuracy may never have been compared with that of earlier models, and patient benefit may never have been demonstrated – let alone the associated risks.

The more accurate and detailed the prediction, the more knowledge it provides and the better the decisions that should follow. But when predictions are wrong, the consequences can be devastating – for individuals as well as for groups. This is why it is so important to ask the critical questions that must be answered before relying on predictions.

One aspect that may seem surprising is that prediction models can be accurate, even when the underlying causes of a condition to be predicted are unknown. In other words, although the aetiology of the condition remains unclear, it is still entirely possible to develop a model that provides accurate predictions – provided that the model is based on a sufficient number of correct observations and has been analysed using correct statistical methodology.

However, a model constructed in this way cannot provide information as to what interventions are helpful. To do so would require efficacy studies that are able to distinguish causal factors from background and confounding factors.

Researchers who design a prediction model based on observational studies must avoid including irrelevant factors that merely happen to occur alongside the condition by chance. Including such factors introduces ‘noise’ that weakens the predictions as soon as the model is applied outside the framework of the original studies.

Advanced mathematical prediction models are often extremely sensitive. If the algorithm was developed and tested to make forecasts for a certain category of people, in a particular environment, it is far from certain that this model’s predictions will be correct in another, similar context. Furthermore, similar algorithms may provide divergent predictions in the exact same setting. One common example is weather forecasts. Such predictions may be disparate and more or less reliable, depending on the models used by the different providers.

The main question is whether patients and users truly benefit from a certain algorithm when used as a basis for decision-making in health care and social services. The only way to approach this issue is to first test the model to ensure that it makes accurate predictions, and then study the model in practice to investigate the effects, both beneficial and harmful.

Various scientific requirements must be met. First, the mathematical model must be derived from accurate and complete data covering a large number of observations. Many diseases and conditions are multifactorial – the course for the individual is impacted by many factors and their context. In such cases, the algorithm must take many factors into account in order to yield correct and adequately detailed predictions.

In order to avoid systematic errors, all factors that influence the prediction must be entered completely independent of the outcome. Data collection should be the same for all individuals regardless of their future prospects. Should data collection be influenced by the outcome, it could lead to erroneous predictions. For example, such a situation may arise when patients who appear to be in worse health are examined more thoroughly than others.

It is also important to test the predictive accuracy of the model in a context similar to the setting for which it is intended, such as within the same category of patients or service users. The model should also be tested in various sub-groups. In this way it can be calibrated and adjusted to not just accurately forecast the group average, but to also provide correct predictions for as many individuals as possible.
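A minimal sketch of such a subgroup check, assuming records of the form (group, predicted risk, observed outcome): compare the mean predicted risk with the observed event rate within each subgroup, not only in the material as a whole.

```python
from collections import defaultdict


def calibration_by_group(records):
    """Compare mean predicted risk with the observed event rate per subgroup.

    records: iterable of (group, predicted_risk, outcome), outcome 0 or 1.
    A well-calibrated model has the two numbers close together in every
    subgroup, not only on average across the whole dataset.
    """
    totals = defaultdict(lambda: [0.0, 0, 0])  # [sum of predictions, events, n]
    for group, predicted, outcome in records:
        t = totals[group]
        t[0] += predicted
        t[1] += outcome
        t[2] += 1
    return {g: (t[0] / t[2], t[1] / t[2]) for g, t in totals.items()}
```

A model can look well calibrated on average while being badly mis-calibrated in one subgroup; tabulating the two numbers per group makes this visible.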

Before it can be concluded that the model meets at least basic requirements, it must be subjected to various statistical tests (such as cross-validation and bootstrap), a process known as internal validation. An assessment should also be made as to whether the predictions are equally accurate for other categories of patients and users, and in settings other than the one in which the algorithm was developed. This process is known as external validation. Many models published in scientific journals have not been sufficiently validated to be considered reliable.
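Internal validation by cross-validation can be sketched as follows (a generic outline, not any particular published procedure): the data are split into k folds, each fold is held out in turn, and accuracy is always measured on observations the model was not fitted on.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(data, fit, score, k=5):
    """Internal validation by k-fold cross-validation: each fold is held
    out in turn, the model is fitted on the remaining data, and its
    accuracy is measured on the held-out fold only."""
    folds = k_fold_indices(len(data), k)
    results = []
    for held_out in folds:
        held = set(held_out)
        train = [data[j] for j in range(len(data)) if j not in held]
        test = [data[j] for j in held_out]
        model = fit(train)
        results.append(score(model, test))
    return sum(results) / k
```

The key point is that performance is never measured on the same observations used for fitting; a model that is only tested on its own training data will look deceptively accurate.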

The studies on which the model is based, and that are used to test it, must apply uniform definitions and limitations of the conditions that the model is intended to predict. For example, should diagnostic criteria have varied over time or between countries, the results may be erroneous. The risk of non-uniform application of diagnostic criteria may be particularly high when these rely solely on a single practitioner’s subjective assessment, without the support of objective measures.

Continual monitoring of the prediction model’s accuracy is necessary. For example, every case of predictive failure should be assessed and used as input data to adjust the model accordingly. When such feedback and adjustment is automatic, it is referred to as machine learning or self-learning systems.
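One minimal way to sketch such monitoring (the window size and error threshold below are arbitrary illustrations): record each prediction against the later observed outcome, and raise a flag when the failure rate over a recent sliding window grows too large.

```python
from collections import deque


class PredictionMonitor:
    """Track recent prediction failures and flag when the error rate in a
    sliding window exceeds a threshold, signalling that the model may
    need re-calibration."""

    def __init__(self, window=100, max_error_rate=0.2):
        self.recent = deque(maxlen=window)  # True = prediction failed
        self.max_error_rate = max_error_rate

    def record(self, predicted, observed):
        """Log one prediction together with its observed outcome."""
        self.recent.append(predicted != observed)

    def needs_recalibration(self):
        """True when recent failures exceed the accepted error rate."""
        if not self.recent:
            return False
        return sum(self.recent) / len(self.recent) > self.max_error_rate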

Whenever there is a risk that an inaccurate prediction could lead to serious consequences such as severe injury or death, the algorithm must be designed to warn even at the slightest indication.

Yet another requirement is that the prediction model must describe the certainty of each prediction. An algorithm that provides precise though often erroneous predictions may be considerably less useful than a model that provides broader but more reliable predictions. The manner in which uncertainty is conveyed to users may hold great importance. Decision-makers must understand the degree of reliability of the calculation and take this into account.
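As one simple illustration of quantified uncertainty, a risk estimated from observed events can be reported with a confidence interval rather than as a bare number; the Wilson score interval below is a standard formula for a proportion.

```python
import math


def wilson_interval(events, n, z=1.96):
    """95% Wilson score interval for an observed event proportion --
    one simple way to attach uncertainty to a risk estimate.
    events: number of observed events; n: number of individuals."""
    p = events / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))
```

Reporting “about 10%, plausibly between 6% and 17%” conveys far more to a decision-maker than a bare “10%”, and the interval narrows as the underlying observations grow.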

Last but not least, the model must be useful in practice, provide more benefit than harm and be worth its price. For example, should the model require too much input of information that is not readily available, it may become useless in practice.

A prediction model must be neither overly optimistic (failing to predict the condition), nor too pessimistic (giving false alarms). Alarmist predictions risk generating anxiety and may lead to unnecessary measures. In contrast, overly optimistic predictions may create a false sense of security, which may also have serious consequences.
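The trade-off can be sketched with a toy example (the risk scores and outcomes below are invented for the illustration): lowering the alarm threshold of a risk model reduces missed cases but increases false alarms, and vice versa.

```python
def confusion_counts(pairs, threshold):
    """pairs: iterable of (predicted_risk, outcome), outcome 0 or 1.
    Raise an 'alarm' when risk >= threshold; count misses (events the
    model failed to flag) and false alarms (flags without an event)."""
    misses = false_alarms = 0
    for risk, outcome in pairs:
        alarm = risk >= threshold
        if outcome and not alarm:
            misses += 1
        if alarm and not outcome:
            false_alarms += 1
    return misses, false_alarms
```

Choosing the threshold is therefore not a purely statistical question: it weighs the harm of a missed case against the anxiety and unnecessary measures caused by a false alarm.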

It is well to remember that even when a condition can be accurately predicted using an algorithm based on solid observational evidence, such an algorithm does not tell us what interventions are effective, safe and cost-effective for this condition. This would require a different type of study. • RL

Further reading

  1. Challen R, et al. Artificial intelligence, bias and clinical safety. BMJ Qual Saf 2019;28:231-7.
  2. Vollmer S, et al. Machine learning and artificial intelligence ... BMJ 2020;368:l6927.
  3. Fall K, et al. Bra prognosstudier kan ge bättre kliniska beslut. Läkartidningen 2013;110:279-83.
  4. Foroutan F, et al. Use of GRADE for the assessment of evidence about prognostic factors ... J Clin Epidemiol 2020;121:62-70.
  5. Damen JAAG, et al. Prediction models for cardiovascular disease risk in the general population: systematic review. BMJ 2016;353:i2416.
  6. Riley RD, et al. A guide to systematic review and meta-analysis of prognostic ... BMJ 2019;364:k4597.
  7. Collins GS, et al. The TRIPOD Statement. BMJ 2014;350:g7594.

Algorithms

Instructions to carry out calculations or to answer questions have been used in health care for purposes such as risk assessment for osteoporotic fractures, death in the ICU, and death due to coronary heart disease among individuals with hypertension or high cholesterol.

Vigorous research is currently underway to assess the benefit of algorithms for diagnostics and treatment. Researchers are exploring the role of machine learning with automatic feedback in order to ‘train’ the model to make more reliable predictions. When such models trigger automated actions within a closed system, they are referred to as ‘intelligent’ robots. Examples in medicine include insulin pumps and completely self-regulating ventilators which, over time, become better adapted to the individual.

However, the efficacy, safety, costs and ethical consequences must undergo scientific scrutiny.

WAS THE ALGORITHM DESIGNED WITH ATTENTION TO …

… appropriate statistical analysis methodology?
… sufficient number of observed cases?
… correct handling of continuous and discrete variables?
… analysis of all individuals in the study/register?
… appropriate handling of attrition?
… consideration of complexity (e.g. competing risks)?
… testing predictions of who in the dataset is affected and who is not?
… testing such predictions in various sub-groups of the dataset?
… avoidance of overfitting the model?
… appropriate methods for analysing several variables simultaneously?

 

HOW WELL DOES THE PREDICTION MODEL WORK IN PRACTICE?

How clear and transparent is the prediction model?

  • Is the model’s approach to making predictions comprehensible?
  • Does the model explain how reliable the predictions are?
  • How is uncertainty, if any, communicated to users so as to avoid over-confidence in the results?

Is the prediction model used correctly?

  • Was the model tested in different settings, using different data sources and in different patient or user groups?
  • How do we know that the test environment for the model corresponds to the actual setting where the model is used and that it does not convey an erroneous impression?
    – Do the categories objectively reflect defined outcomes, or are they dependent on subjective assessments?
    – How often have the predictions of the model been accurate in the developmental environment?
    – Have the developers allowed leeway for the model to provide more or less correct predictions in various sub-groups of patients/users?
    – Will the model be used for the same type of assessments in the same type of contexts in which it was developed?
  • How will the quality of the predictions be checked and how will the model be adapted based on these findings?

Does the prediction model entail risk to the individual?

  • Does the model give sufficient consideration to the risk of serious consequences – is the precautionary principle applied in the predictions?
  • Can the model recognise atypical, deviating data and handle them reliably in regard to the individual?

Does the prediction model contribute to better decisions?

  • Does the model lead to better outcomes for patients and users? And to improved resource management? Or is there a risk that the model will lead to unnecessary or ineffective measures?
  • What ethical consequences will the predictions have for the affected groups? Is use of the methodology consistent with generally accepted ethical principles? 
  • What type of decisions are reinforced by the model? 
  • Is there a risk that erroneous predictions influence decision-making in ways that make the predictions self-fulfilling?

 
