GRADE approach
The GRADE approach (Grading of Recommendations Assessment, Development and Evaluation) is a method of assessing the certainty in evidence (also known as quality of evidence or confidence in effect estimates) and the strength of recommendations in health care.[1] It provides a structured and transparent evaluation of the importance of outcomes of alternative management strategies, acknowledgment of patients' and the public's values and preferences, and comprehensive criteria for downgrading and upgrading certainty in evidence. It has important implications for those summarizing evidence for systematic reviews, health technology assessments, and clinical practice guidelines as well as other decision makers.[2]
Background and history
The GRADE working group began in 2000 as a collaboration of methodologists, guideline developers, biostatisticians, clinicians, public health scientists and other interested members. The group developed and implemented a common, transparent and sensible approach to grading the quality of evidence (also known as certainty in evidence or confidence in effect estimates) and strength of recommendations in healthcare.[3][4]
GRADE components
The GRADE approach classifies recommendations that follow from an evaluation of the evidence as strong or weak. A recommendation to use, or not use, an option (e.g. an intervention) should be based on the trade-offs between the desirable consequences of following a recommendation on the one hand, and the undesirable consequences on the other. If desirable consequences outweigh undesirable consequences, decision makers will recommend an option, and vice versa. The uncertainty associated with the trade-off between the desirable and undesirable consequences will determine the strength of recommendations.[5] The criteria that determine this balance of consequences are listed in Table 2. Furthermore, the approach provides decision-makers (e.g. clinicians, other health care providers, patients and policy makers) with a guide to using those recommendations in clinical practice, public health and policy. To achieve simplicity, the GRADE approach classifies the quality of evidence into one of four levels: high, moderate, low, and very low.
Quality of evidence
GRADE rates quality of evidence as follows:[6][7]
| Quality level | Definition |
|---|---|
| High | There is a lot of confidence that the true effect lies close to that of the estimated effect. |
| Moderate | There is moderate confidence in the estimated effect: The true effect is likely to be close to the estimated effect, but there is a possibility that it is substantially different. |
| Low | There is limited confidence in the estimated effect: The true effect might be substantially different from the estimated effect. |
| Very low | There is very little confidence in the estimated effect: The true effect is likely to be substantially different from the estimated effect. |
The GRADE working group has developed a software application that facilitates the use of the approach, allows the development of summary tables and contains the GRADE handbook. The software is free for non-profit organizations and is available online.[8] The GRADE approach to assess the certainty in evidence is widely applicable, including to questions about diagnosis,[9][10] prognosis,[11][12] network meta-analysis[13] and public health.[14]
Strength of recommendation
Factors and criteria that determine the direction and strength of a recommendation:
| Factor and criteria* | How the factor influences the direction and strength of a recommendation |
|---|---|
| Problem (this factor can be integrated with the balance of the benefits and harms and burden) | The problem is determined by the importance and frequency of the health care issue that is addressed (burden of disease, prevalence or baseline risk). If the problem is of great importance, a strong recommendation is more likely. |
| Values and preferences | This describes how important health outcomes are to those affected, how variable they are and if there is uncertainty about this. The less variability or uncertainty there is about values and preferences for the critical or important outcomes, the more likely is a strong recommendation. |
| Quality of the evidence | The confidence in any estimate of the criteria determining the direction and strength of the recommendation will determine if a strong or conditional recommendation is offered. However, the overall quality that is assigned to the recommendation is that of the evidence about effects on population-important outcomes. The higher the quality of evidence the more likely is a strong recommendation. |
| Benefits and harms and burden | This requires an evaluation of the absolute effects of both the benefits and harms and their importance. The greater the net benefit or net harm the more likely is a strong recommendation for or against the option. |
| Resource implications | This describes how resource intense an option is, if it is cost-effective and if there is incremental benefit. The more advantageous or clearly disadvantageous these resource implications are the more likely is a strong recommendation. |
| Equity (this factor is often addressed under values and preferences, and frequently also includes resource considerations) | The greater the likelihood to reduce inequities or increase equity and the more accessible an option is, the more likely is a strong recommendation. |
| Acceptability (this factor can be integrated with the balance of the benefits and harms and burden) | The greater the acceptability of an option to all or most stakeholders, the more likely is a strong recommendation. |
| Feasibility (this factor includes considerations about values and preferences, and resource implications) | The greater the acceptability of an option to all or most stakeholders, the more likely is a strong recommendation. |
* Factors for which overlap is described are often not shown separately in a decision table.
Usage
Over 100 organizations (including the World Health Organization,[15] the UK National Institute for Health and Care Excellence (NICE), the Canadian Task Force on Preventive Health Care, the Colombian Ministry of Health and Social Protection,[citation needed] and the Saudi Arabian Ministry of Health[16]) have endorsed or are using GRADE to evaluate the quality of evidence and strength of health care recommendations.[citation needed]
Criticism
The GRADE approach has been criticized when used to summarize evidence from nutritional science and from studies of dietary, lifestyle, and environmental exposures. In the GRADE system only randomized controlled trials (RCTs) start with a high quality rating, while observational studies start as low-quality evidence because of their potential for confounding. Critics argue that this dismisses the strength of observational studies in assessing the long-term effects of dietary and lifestyle factors and does not reflect the key limitations that RCTs have in studying such effects.[17][18] One example of a slowly progressing disease that is better studied with observational studies than with RCTs is atherosclerosis.[19]
References
- Schünemann, HJ; Best, D; Vist, G; Oxman, AD (2003). "Letters, numbers, symbols, and words: How best to communicate grades of evidence and recommendations?". Canadian Medical Association Journal. 169 (7): 677–80.
- Guyatt, GH; Oxman, AD; Vist, GE; Kunz, R; Falck-Ytter, Y; Alonso-Coello, P; Schünemann, HJ (2008). "GRADE: an emerging consensus on rating quality of evidence and strength of recommendation". BMJ. 336 (7650): 924–26. doi:10.1136/bmj.39489.470347.ad. PMC 2335261. PMID 18436948.
- Guyatt, GH; Oxman, AD; Schünemann, HJ; Tugwell, P; Knottnerus, A (2011). "GRADE guidelines: A new series of articles in the Journal of Clinical Epidemiology". Journal of Clinical Epidemiology. 64 (4): 380–382. doi:10.1016/j.jclinepi.2010.09.011. PMID 21185693.
- "GRADE home". Gradeworkinggroup.org. Retrieved 16 August 2019.
- Andrews, J; Guyatt, GH; Oxman, AD; Alderson, P; Dahm, P; Falck-Ytter, Y; Nasser, M; Meerpohl, J; Post, PN; Kunz, R; Brozek, J; Vist, G; Rind, D; Akl, EA; Schünemann, HJ (2013). "GRADE guidelines: 15. Going from evidence to recommendations: the significance and presentation of recommendations". Journal of Clinical Epidemiology. 66 (7): 719–725. doi:10.1016/j.jclinepi.2012.03.013. PMID 23312392.
- Balshem, H; Helfand, M; Schünemann, HJ; Oxman, AD; Kunz, R; Brozek, J; Vist, GE; Falck-Ytter, Y; Meerpohl, J; Norris, S; Guyatt, GH (April 2011). "GRADE guidelines 3: rating the quality of evidence - introduction". Journal of Clinical Epidemiology. 64 (4): 401–406. doi:10.1016/j.jclinepi.2010.07.015. PMID 21208779.
- Siemieniuk, Reed; Guyatt, Gordon. "What is GRADE?". BMJ Best Practice. Retrieved 2020-07-02.
- "GRADEpro". Gradepro.org. Retrieved 16 August 2019.
- Schünemann, HJ; Oxman, AD; Brozek, J; Glasziou, P; Jaeschke, R; Vist, G; Williams, J; Kunz, R; Craig, J; Montori, V; Bossuyt, P; Guyatt, GH (2008). "GRADEing the quality of evidence and strength of recommendations for diagnostic tests and strategies". BMJ. 336 (7653): 1106–1110. doi:10.1136/bmj.39500.677199.ae. PMC 2386626. PMID 18483053.
- Brozek, JL; Akl, EA; Jaeschke, R; Lang, DM; Bossuyt, P; Glasziou, P; Helfand, M; Ueffing, E; Alonso-Coello, P; Meerpohl, J; Phillips, B; Horvath, AR; Bousquet, J; Guyatt, GH; Schünemann, HJ (2009). "Grading quality of evidence and strength of recommendations in clinical practice guidelines: part 2 of 3. The GRADE approach to grading quality of evidence about diagnostic tests and strategies". Allergy. 64 (8): 1109–16. doi:10.1111/j.1398-9995.2009.02083.x. PMID 19489757. S2CID 8865010.
- Iorio, A; Spencer, FA; Falavigna, M; Alba, C; Lang, E; Burnand, B; McGinn, T; Hayden, J; Williams, K; Shea, B; Wolff, R; Kuijpers, T; Perel, P; Vandvik, PO; Glasziou, P; Schünemann, H; Guyatt, G (2015). "Use of GRADE for assessment of evidence about prognosis: rating confidence in estimates of event rates in broad categories of patients". BMJ. 350: h870. doi:10.1136/bmj.h870. PMID 25775931.
- Spencer, FA; Iorio, A; You, J; Murad, MH; Schünemann, HJ; Vandvik, PO; Crowther, MA; Pottie, K; Lang, ES; Meerpohl, JJ; Falck-Ytter, Y; Alonso-Coello, P; Guyatt, GH (2012). "Uncertainties in baseline risk estimates and confidence in treatment effects". BMJ. 14: 345. doi:10.1136/bmj.e7401. PMID 23152569.
- Puhan, MA; Schünemann, HJ; Murad, MH; Li, T; Brignardello-Petersen, R; Singh, JA; Kessels, AG; Guyatt, GH (2014). "A GRADE Working Group approach for rating the quality of treatment effect estimates from network meta-analysis". BMJ. 24: 349. doi:10.1136/bmj.g5630. PMID 25252733.
- Burford, BJ; Rehfuess, E; Schünemann, HJ; Akl, EA; Waters, E; Armstrong, R; Thomson, H; Doyle, J; Pettman, T (2012). "Assessing evidence in public health: the added value of GRADE". J Public Health. 34 (4): 631–5. doi:10.1093/pubmed/fds092. PMID 23175858.
- "GRADEpro". Gradepro.org. Retrieved 16 August 2019.
- "The Saudi Center for Evidence Based Healthcare (EBHC) - Clinical Practice Guidelines". 2016-02-25. Archived from the original on 2016-02-25. Retrieved 2021-02-19.
- Qian, Frank; Riddle, Matthew C.; Wylie-Rosett, Judith; Hu, Frank B. (2020-01-13). "Red and Processed Meats and Health Risks: How Strong Is the Evidence?". Diabetes Care. 43 (2): 265–271. doi:10.2337/dci19-0063. ISSN 0149-5992. PMC 6971786. S2CID 210841441.
- Parker-Pope, Tara; O’Connor, Anahad (2019-10-04). "Scientist Who Discredited Meat Guidelines Didn't Report Past Food Industry Ties". The New York Times. ISSN 0362-4331. Retrieved 2022-01-05.
- Gan, Zuo Hua; Cheong, Huey Chiat; Tu, Yu-Kang; Kuo, Po-Hsiu (2021-11-05). "Association between Plant-Based Dietary Patterns and Risk of Cardiovascular Disease: A Systematic Review and Meta-Analysis of Prospective Cohort Studies". Nutrients. 13 (11): 3952. doi:10.3390/nu13113952. ISSN 2072-6643. PMC 8624676. PMID 34836208.
GRADE approach
Historical Development
Origins and Motivations
The emergence of evidence-based medicine (EBM) in the early 1990s spurred the development of multiple systems for grading evidence quality and recommendation strength, yet these approaches exhibited significant inconsistencies that hindered guideline comparability. Organizations such as the US Preventive Services Task Force (USPSTF), which employed a letter-based grading system emphasizing study design levels from randomized trials to expert opinion since the 1980s, and the Oxford Centre for Evidence-Based Medicine (OCEBM), which introduced hierarchical levels around 1998 prioritizing randomized controlled trials, often produced divergent ratings for similar evidence bases.[6]
Key shortcomings included the conflation of evidence quality (reflecting methodological rigor and precision) with recommendation strength, which should additionally account for benefits, harms, values, and resource use; an over-reliance on rigid study design hierarchies that inadequately incorporated nuances like risk of bias or inconsistency; and opaque processes for adjusting ratings, resulting in low reproducibility of judgments across systems, as demonstrated in appraisals of over 50 guideline-producing organizations.[7][6][8]
These limitations motivated the formation of the GRADE Working Group in 2000 by Gordon Guyatt and colleagues, comprising clinical epidemiologists, methodologists, and guideline developers, to create a uniform, transparent framework addressing prior deficiencies, with initial focus on standardizing evaluations for World Health Organization (WHO) guidelines and other international efforts.[1][9][10]
Key Milestones and Contributors
The Grading of Recommendations Assessment, Development and Evaluation (GRADE) Working Group originated in 2000 as an informal collaboration among researchers and clinicians seeking to address inconsistencies in existing systems for evaluating evidence quality and recommendation strength.[1] Led by Gordon Guyatt of McMaster University, the group included early contributors such as David Atkins, Drummond Rennie, and Roman Jaeschke, who focused on developing a transparent, explicit framework applicable across guideline contexts.
A pivotal milestone occurred in 2004 with the publication of the GRADE Working Group's initial proposal in the BMJ, outlining core principles for grading evidence quality starting from randomized trials as high and observational studies as low, with provisions for upgrading or downgrading based on specified criteria. This was followed by a series of articles in the BMJ in 2008, which refined the approach through consensus among international collaborators, including Holger Schünemann and Phil Alderson, and established GRADE as an emerging standard for systematic evidence assessment.[11]
By 2011, GRADE had been formally integrated into the Cochrane Collaboration's methodology for producing summary-of-findings tables in systematic reviews, enhancing its adoption in evidence synthesis.[13] The GRADE Handbook followed in 2013, compiling detailed guidance refined through iterative Working Group meetings involving over 500 members from diverse organizations.[12]
Recent advancements encompass 2022 updates to imprecision rating protocols, incorporating minimally contextualized approaches and confidence interval thresholds, alongside guidelines for evaluating modeled evidence certainty, driven by Schünemann and collaborators to address decision-making in complex health scenarios.[14][15]
Methodological Framework
Assessing Certainty of Evidence
The GRADE approach evaluates the certainty of evidence for specific outcomes by starting with an initial rating based on study design and then adjusting it through consideration of five domains that may lower the rating, along with criteria for potential upgrading primarily applicable to non-randomized studies.[16] Randomized controlled trials (RCTs) are initially rated as high certainty, reflecting their design's strength in minimizing systematic error, while observational studies begin at low certainty due to inherent risks of confounding and bias.[3] Certainty is rated on a four-level scale: high (further research unlikely to change confidence in effect estimates), moderate (further research may change confidence), low (further research likely to change confidence), or very low (very uncertain effect estimates).[17]
Downgrading occurs when evidence demonstrates limitations in one or more of the following domains, each assessed as none, serious (downgrade by 1 level), or very serious (downgrade by 2 levels), with cumulative effects possible but rarely exceeding 3 levels total:
- Risk of bias: Evaluated using tools like the Cochrane Risk of Bias instrument for RCTs or ROBINS-I for non-randomized studies, focusing on flaws in design, conduct, or analysis that could overestimate or underestimate effects, such as inadequate randomization, blinding failures, or attrition.[16][18]
- Inconsistency: Assessed by unexplained heterogeneity in effect estimates across studies, indicated by statistical measures like I² >50% or visual inspection of forest plots, suggesting variability in true effects or methodological differences.[19]
- Indirectness: Determined by mismatches between the evidence's population, intervention, comparator, or outcomes (PICO elements) and those of interest, such as extrapolating from surrogate endpoints or different patient groups.[20]
- Imprecision: Identified when confidence intervals cross thresholds for meaningful effects (e.g., minimal important difference) or include both benefit and harm, often using optimal information size criteria to gauge sample adequacy.[16]
- Publication bias: Suspected through funnel plot asymmetry, Egger's test, or comprehensive searches revealing selective reporting, particularly when smaller studies show larger effects.
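The downgrading arithmetic described above can be illustrated with a minimal Python sketch. It is not an official GRADE tool: the names `Certainty`, `CONCERN_PENALTY`, `DOMAINS`, and `rate_certainty` are hypothetical, the upgrading criteria are deliberately left out, and the sketch assumes that the result is floored at "very low". In practice each domain judgement is made by reviewers, not computed.

```python
from enum import IntEnum


class Certainty(IntEnum):
    """Four-level certainty scale (hypothetical encoding for this sketch)."""
    VERY_LOW = 1
    LOW = 2
    MODERATE = 3
    HIGH = 4


# Levels subtracted for each reviewer judgement about a domain.
CONCERN_PENALTY = {"none": 0, "serious": 1, "very serious": 2}

# The five downgrading domains listed in the text.
DOMAINS = (
    "risk of bias",
    "inconsistency",
    "indirectness",
    "imprecision",
    "publication bias",
)


def rate_certainty(randomized: bool, concerns: dict[str, str]) -> Certainty:
    """Return the certainty after downgrading (upgrading is not modeled).

    `concerns` maps a domain name to "none", "serious", or "very serious";
    domains that are not mentioned are treated as "none".
    """
    unknown = set(concerns) - set(DOMAINS)
    if unknown:
        raise ValueError(f"unknown domains: {sorted(unknown)}")

    # RCTs start at high certainty, observational studies at low.
    start = Certainty.HIGH if randomized else Certainty.LOW

    # Sum the penalties over all five domains.
    penalty = sum(CONCERN_PENALTY[concerns.get(d, "none")] for d in DOMAINS)

    # Assumption of this sketch: the rating cannot fall below "very low".
    return Certainty(max(Certainty.VERY_LOW, start - penalty))


if __name__ == "__main__":
    rating = rate_certainty(True, {"imprecision": "serious"})
    print(rating.name)
```

For example, a body of RCT evidence with serious imprecision and no other concerns would come out one level below high under these assumptions, while observational evidence with any serious concern would already reach the floor.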