Flaws distort review findings

Systematic reviews can provide more reliable answers than individual studies, offering the possibility of combining findings through meta-analysis. The number of such analyses has increased tenfold in two decades, and confidence in the results is high. However, many of the analyses are not carried out correctly and fail to meet basic quality standards.

Reading time approx. 7 minutes. Published: January 11, 2022. Publication type: Medical and Social Science & Practice


https://www.sbu.se/vopeng1_2022

Syntheses of research findings occur in many disciplines [1] and have even become a separate field of research. Systematic reviews can provide valuable knowledge, such as in cases where individual studies are too small to provide reliable results. An overall picture is often more accurate.
An important statistical tool for conducting such work is meta-analysis (see sidebar). One advantage of combining several observations through meta-analysis is greater statistical power, which makes it possible to demonstrate even minor differences in effect with acceptable statistical confidence – for example, a small but important difference in treatment efficacy between two methods.

But the purpose of meta-analysis is not always to mathematically synthesise the results. Sometimes the purpose is to investigate how the results of different studies vary. [2] In cases where this is the main reason, or when researchers focus on broad generalisations involving many different groups, the analysis may intentionally include studies from completely different categories of participants. [2]
In any event, meta-analysis is a tool that must be correctly and knowledgeably applied. And along with its rapid rise in popularity, a growing number of researchers are sounding the alarm regarding its careless misuse. [3,4] The overall picture will be misleading if aggregation and analysis of the findings of the studies are incorrectly handled. Moreover, because the methodology is so complex, there is also a risk of intentional manipulation. [3,4]
Consequently, systematic reviews using meta-analysis must be subjected to at least as careful scrutiny as other types of research – possibly even more, given that claims of validity are often greater.
For starters, not all compilations that are called meta-analyses truly meet the necessary criteria. Simply counting the number of studies that are “for” or “against” an intervention is not a meta-analytical method and may be directly misleading. Nevertheless, this type of “vote counting” is found in reviews. [3] Some authors, for example, try to substantiate their assumptions by counting the number of studies with statistically significant and non-significant results. But the finding that significant results outweigh non-significant results hardly constitutes evidentiary support.
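Why vote counting can mislead is easiest to see with numbers. The following is a minimal sketch in Python, not taken from any cited study: the five effect estimates and standard errors are hypothetical, and the pooling uses plain inverse-variance weighting as an assumed method. None of the studies is statistically significant on its own, yet the pooled estimate is.

```python
import math

# Hypothetical data: (effect estimate, standard error) for five small studies.
studies = [(0.30, 0.20), (0.25, 0.18), (0.35, 0.22), (0.20, 0.19), (0.28, 0.21)]

Z_CRIT = 1.96  # two-sided 5% significance threshold


def is_significant(estimate, se):
    """True if the study's own z statistic exceeds the threshold."""
    return abs(estimate / se) > Z_CRIT


# Vote counting: how many studies are individually significant?
votes_for = sum(is_significant(est, se) for est, se in studies)

# Inverse-variance pooling: precise studies get more weight.
weights = [1 / se**2 for _, se in studies]
pooled = sum(w * est for w, (est, _) in zip(weights, studies)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))
pooled_z = pooled / pooled_se

print(f"significant studies: {votes_for} of {len(studies)}")
print(f"pooled estimate: {pooled:.3f}, z = {pooled_z:.2f}")
```

A vote count would report "no support" here, while the pooled analysis points to an effect. The reverse can also happen, which is why the tally of significant results says little about the evidence.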
One challenge in meta-analysis is to select a suitable model – fixed or random effects. The choice depends on the purpose of the analysis and how similar the participants in the various studies are deemed to be. If the participants are sufficiently similar, each study's group of subjects can be thought of as a random sample of the larger population under investigation. In such cases, a synthesis of results contributes to achieving a clearer picture of the population at large, and the fixed effect model is used. However, should the studies differ to the point that participants can be considered to represent different populations, a random effects model should be used instead. In the latter case, the analysis results correspond to an average effect across all populations, which of course may deviate from the actual effect in a single population.
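The difference between the two models can be sketched in a few lines of Python. This is an illustration under stated assumptions: the between-study variance is estimated with the DerSimonian-Laird method, which is one common choice and not one named in this article, and the estimates and variances are hypothetical.

```python
import math


def fixed_effect(estimates, variances):
    """Inverse-variance pooled estimate and its standard error."""
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    return pooled, math.sqrt(1 / sum(weights))


def random_effects_dl(estimates, variances):
    """Random effects pooling with a DerSimonian-Laird estimate of tau^2."""
    weights = [1 / v for v in variances]
    pooled_fe, _ = fixed_effect(estimates, variances)
    # Cochran's Q: weighted squared deviations from the fixed effect estimate.
    q = sum(w * (y - pooled_fe) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    c = sum(weights) - sum(w**2 for w in weights) / sum(weights)
    tau2 = max(0.0, (q - df) / c)  # between-study variance, floored at zero
    re_weights = [1 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, estimates)) / sum(re_weights)
    return pooled, math.sqrt(1 / sum(re_weights)), tau2


# Hypothetical studies whose results differ more than chance alone would suggest.
estimates = [0.10, 0.50, 0.90, 0.30]
variances = [0.01, 0.04, 0.09, 0.02]

fe_estimate, fe_se = fixed_effect(estimates, variances)
re_estimate, re_se, tau2 = random_effects_dl(estimates, variances)
print(f"fixed effect:   {fe_estimate:.3f} (SE {fe_se:.3f})")
print(f"random effects: {re_estimate:.3f} (SE {re_se:.3f}), tau^2 = {tau2:.3f}")
```

When the studies disagree, the random effects model spreads the weight more evenly across studies and reports a larger standard error, reflecting that it estimates an average over different populations.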
Meta-analysis also requires a review and ranking of data before they are synthesised. Well-established statistical methodology must be used when calculating effect size, weighting results from different studies and addressing any heterogeneity in the data.
Results are often weighted based on the width of the confidence intervals. The purpose is to be able to distinguish the uncertainty in individual studies from the uncertainty associated with the collective results. [2] Without weighting, it becomes difficult to assess how “robust” the aggregate results of the meta-analysis are as a whole, and how dependent they may be on certain included studies. Weighting also prevents small studies from having too much influence on the collective results (in the fixed effect model), which can otherwise be a problem – for two reasons.

One is that small studies are inherently more sensitive to random error: the fewer observations a study makes, the more its results are shaped by chance, so they vary more than those of larger studies. [2]
Secondly, it is known that small clinical trials with negative outcomes tend to be published late or, in the worst-case scenario, not at all, in which case the findings remain unknown. This skews the overall picture of treatment efficacy, resulting in publication bias. [2] In fields of research dominated by small treatment studies, the overall picture of the beneficial effects of treatment therefore tends to be exaggerated.
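Both points can be illustrated with a small simulation. The sketch below rests on deliberately crude assumptions that are not from the article: every trial is small (15 participants per arm), the true effect is 0.1 standard deviations, and only trials with a statistically significant positive result get "published". All function names and parameter values are hypothetical.

```python
import math
import random
import statistics


def simulate_trial(true_effect, n_per_arm, rng):
    """Simulate one two-arm trial; return (mean difference, standard error)."""
    treated = [rng.gauss(true_effect, 1.0) for _ in range(n_per_arm)]
    control = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
    diff = statistics.fmean(treated) - statistics.fmean(control)
    se = math.sqrt(
        statistics.variance(treated) / n_per_arm
        + statistics.variance(control) / n_per_arm
    )
    return diff, se


def run(seed=1, n_trials=500, n_per_arm=15, true_effect=0.1):
    """Compare the mean effect across all trials with the 'published' ones."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    trials = [simulate_trial(true_effect, n_per_arm, rng) for _ in range(n_trials)]
    all_mean = statistics.fmean([d for d, _ in trials])
    # Crude selection rule: only significant positive results are published.
    published = [d for d, se in trials if d / se > 1.96]
    return all_mean, statistics.fmean(published), len(published)


all_mean, published_mean, n_published = run()
print(f"mean effect, all trials:       {all_mean:.2f}")
print(f"mean effect, published trials: {published_mean:.2f} ({n_published} trials)")
```

Real publication patterns are less extreme than this selection rule, but the direction of the distortion is the same: a literature built from the surviving small trials overstates the effect.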
Scarce in the research literature of the 1990s, results from meta-analyses now veritably flood scientific journals, and many of them have been criticised as redundant, erroneous, or both. [4,5] The tendency for researchers to be opinionated regarding substantive issues may bias results, but this is hardly unique to meta-analysis. As with other approaches, researchers must make choices which may affect results. [4] Researchers must decide what types of studies to cover, how old they may be and what languages to include. The quality criteria used to cull studies may also vary in regard to both stringency and application.

For this reason, the scientific community must remain vigilant that researchers disclose their choices and explain their process. For a meta-analysis to be considered reliable, its authors must openly and clearly explain and justify their decisions (transparency in reporting).
Technological developments in the field, such as machine learning and artificial intelligence, pose both opportunities and challenges. Broad access to advanced statistical analytical tools allows an ever-growing number of researchers to carry out increasingly complex calculations – without necessarily themselves possessing the knowledge or statistical expertise to do so. The more convoluted the analyses, the more difficult it becomes for researchers, reviewers and others to discover errors and detect bias.
One example is network meta-analysis – an advanced analytical method that is becoming increasingly common and which can easily yield erroneous findings. [6] This type of meta-analysis compares three or more treatments by combining both direct and indirect comparison results from various trials. While traditional meta-analysis only makes direct comparisons between interventions, network meta-analysis also makes indirect comparisons – including interventions that were never tested side by side within one and the same trial. In order to also compare interventions that were never tested directly head to head, effect estimates from trials that share a common comparator are used. For example, when A vs B is the comparison of interest, randomised trials on A vs C and on B vs C are used as indirect evidence. A large network meta-analysis may include more than 20 comparisons.
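The simplest building block of such a network, a single indirect comparison through a common comparator, can be sketched as follows. This is an assumption-laden illustration rather than the method of any cited review: it uses the standard adjusted indirect comparison (often attributed to Bucher), assumes effects on an additive scale such as mean differences or log odds ratios, and uses hypothetical numbers.

```python
import math


def indirect_comparison(d_ac, se_ac, d_bc, se_bc, z=1.96):
    """Estimate A vs B from A vs C and B vs C trials sharing comparator C.

    Returns the indirect estimate, its standard error and a 95% interval.
    Only meaningful if the two sets of trials are similar enough to compare.
    """
    d_ab = d_ac - d_bc
    # Uncertainty from both direct comparisons adds up.
    se_ab = math.sqrt(se_ac**2 + se_bc**2)
    return d_ab, se_ab, (d_ab - z * se_ab, d_ab + z * se_ab)


# Hypothetical pooled results from direct trials (negative = fewer symptoms).
d_ab, se_ab, ci = indirect_comparison(d_ac=-0.50, se_ac=0.15, d_bc=-0.20, se_bc=0.20)
print(f"A vs B (indirect): {d_ab:.2f}, SE {se_ab:.2f}, "
      f"95% CI {ci[0]:.2f} to {ci[1]:.2f}")
```

The indirect estimate is less precise than either direct comparison, because the variances are summed. In this example both treatments look better than C, yet the interval for A vs B includes zero.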

Whether network meta-analysis can be considered appropriate at all once again depends on how similar the studies are. Such an assessment requires knowledge of the subject and affects the choice of statistical methodology, where the options are many. Various draft review templates for network meta-analysis have been published. [7-10]
An array of pitfalls must be avoided when conducting and interpreting meta-analyses, ranging from simple to highly complex. While meta-analysis has proven valuable as a statistical tool, it is often used incorrectly. A large proportion of published analyses have been deemed substandard. [4]
It is paramount to remember that meta-analytic tools in themselves are by no means a guarantee of quality. RL

References

1. Gough D, et al. Syst Rev 2020;9:155.
2. Gurevitch J, et al. Nature 2018;555:175-82.
3. de Vrieze J. Science 2018;361:1184-8.
4. Ioannidis JPA. Milbank Q 2016;94:485-514.
5. Leclercq V, et al. BMJ Open 2020;10:e036349.
6. Anttila S. SBU, Vetenskap & praxis 2018;(1-2):12-3.
7. Nikolakopoulou A, et al. PLoS Med 2020;17:e1003082.
8. Puhan MA, et al. BMJ 2014;349:g5630.
9. Jansen J, et al. Value Health 2014;17:157-73.
10. Brignardello-Petersen R, et al. BMJ 2020;371:m3907.

META-ANALYSIS

Statistical analysis method to quantitatively synthesise findings from primary studies of the same diagnostic method or intervention. The method is frequently used in the context of systematic reviews and follows a previously determined process.

An exhaustive literature search is used to obtain all available research concerning the questions to be answered. The material is sorted and culled, after which it is reviewed according to previously determined criteria and then synthesised to produce an aggregate result with associated confidence intervals.

Larger studies with greater numbers of participants and clinical events are given higher weight in the final analysis.
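One common way to assign such weights, sketched here in Python under the assumption that each study reports a symmetric 95% confidence interval, is to convert the interval width to a standard error and weight by the inverse of its square. The two intervals are hypothetical.

```python
def se_from_ci(lower, upper, z=1.96):
    """Standard error implied by a symmetric confidence interval."""
    return (upper - lower) / (2 * z)


def relative_weights(intervals):
    """Share of the total inverse-variance weight carried by each study."""
    raw = [1 / se_from_ci(lower, upper) ** 2 for lower, upper in intervals]
    total = sum(raw)
    return [w / total for w in raw]


# Hypothetical 95% intervals: one large, precise study and one small study.
intervals = [(0.10, 0.30), (-0.40, 0.80)]
shares = relative_weights(intervals)
for (lower, upper), share in zip(intervals, shares):
    print(f"CI {lower:+.2f} to {upper:+.2f}: {share:.1%} of the weight")
```

Because the weight depends on the squared width, an interval six times as wide carries only one thirty-sixth of the weight.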

The analysis provides an overview of the available results and how consistent they are. Historically, the first meta-analysis was carried out in 1904, but the method did not become established until the 1990s.

Meta-analysis itself does not inherently assess the risk of bias; instead this is estimated later in a separate evidence grading process.

INTERNATIONAL STANDARDS

AMSTAR – checklist for assessing the methodological quality of systematic reviews at the overarching level (not for individual outcomes) https://amstar.ca
ROBIS – tool for assessing the risk of bias in systematic reviews https://www.bristol.ac.uk/population-health-sciences/projects/robis/robis-tool/
MECIR and MECCIR – Standards for the conduct and reporting of systematic reviews from Cochrane and Campbell Collaboration https://community.cochrane.org/mecir-manual
PRISMA – basic requirements of scientific journals and publishers on how to report systematic reviews and meta-analyses http://www.prisma-statement.org
RAMESES – UK project to produce standards and tools in a qualitative approach to assess the reporting of systematic reviews https://www.ramesesproject.org

WELL-CONDUCTED SYSTEMATIC REVIEWS – IDENTIFYING CHARACTERISTICS

Study choice matches the aim

The aim of the review was determined in advance, as were criteria for inclusion of studies.
Selection of studies is commensurate with the question to be answered by the review.
The selection criteria are clear and take into account the currency, size and quality of the studies, as well as relevance of outcomes.
The selection takes into account the source, e.g. type of publication, language and availability of raw data.
A list of the studies that were not included in the compilation is provided.

Thoroughness of literature search

The search covers suitable databases and other important sources.
Search terms and phrases are formulated to identify the greatest possible number of relevant studies.
Constraints regarding year, type and language of publication are clearly and appropriately disclosed.
Special measures were taken to minimise the risk of biased study selection. Experts in the field were consulted.

Critical review of studies

Special measures were taken to avoid errors when collecting data from the studies. Participants, interventions and treatments are described in detail.
Review authors have sufficient information and knowledge to interpret the data.
All relevant outcomes are included and reported in the compilation.
A structured approach with appropriate criteria was used to assess the risk of bias in the results, and the conclusions are clearly supported.
Special measures were taken to avoid erroneous assessment of the risk of bias and to resolve disagreements regarding the assessment.

Accuracy in compilation

The review includes all studies that meet the predetermined criteria and describes the relevance of all studies in relation to the question that the review aims to answer.
All predetermined analyses are presented and deviations, if any, are explained.
Choice of analytical model is justified. Studies from which the findings are compiled are deemed to be sufficiently similar concerning question to be answered, design and outcome measures. Disparities among studies, if any, are appropriately managed.
Aggregate results are sufficiently robust to stand up to sensitivity analysis, and the risk of biased publication of studies has been taken into account and assessed using different methods.
The weaknesses identified in the studies are taken into account in the conclusions of the review. The risk of bias in these conclusions and in the authors' interpretation of findings was appropriately described and addressed. The authors do not present only statistically significant findings, but report all outcomes. Sources of funding for the review are disclosed.

Sources: Whiting P, et al. ROBIS: A new tool ... J Clin Epidemiol. 2016;69:225-34 and SBU’s Handbook
