Rating the quality of evidence and the strength of recommendations: the new GRADE system in venous disease

Grégoire LE GAL, MD, PhD
Zarrin ALAVI, MSc
Université Européenne de Bretagne; Université de Brest
INSERM CIC 05-02 IFR148 – CHU de la Cavale Blanche
Département de médecine interne et de pneumologie

by G. Le Gal and Z. Alavi, France

The Grades of Recommendation Assessment, Development and Evaluation (GRADE) system was developed in 2004 as an attempt to provide systematic and explicit methods of building guidelines for clinicians. The system was adopted by the American College of Chest Physicians (ACCP) in the latest edition of the ACCP Evidence-Based Clinical Practice Guidelines on Antithrombotic and Thrombolytic Therapy. The ACCP grades its recommendations both in terms of the strength of recommendation (1 = strong; 2 = weak) and of the quality of evidence (A = high; B = intermediate; and C = low). Although the numbers and letters used in the grading system remain unchanged compared with previous editions, the underlying definitions and criteria leading to these grades have changed significantly over the last few editions of the guidelines. In particular, the methodological quality of available studies is no longer the only determinant of the quality of evidence, while the strength of a recommendation is no longer based only on the quality of evidence, but also on the balance between benefit and harm, on values and preferences, and on cost. Guideline users need to be aware of the way grades of recommendations are obtained in order to fully understand and take advantage of guidelines for their patients’ care.

Medicographia. 2011;33:280-284 (see French abstract on page 284)

Treatment decisions involve finding the balance between benefits on the one hand and risks, burdens, and inherent costs on the other. In order for clinicians to integrate guideline recommendations with their own clinical judgment and fully exploit them in daily clinical practice, they need to understand the foundation for these recommendations and to know how much confidence they can place in them.

Many guidelines are published by medical societies, public health agencies, or journals around the world. Unfortunately, they often use different ways of rating the quality of evidence and of grading the strength of recommendations. As a result, clinicians, patients, managers of health-care systems, and policy makers face challenges in understanding the messages that grading systems are trying to convey when they need to compare alternative strategies and diagnostic tests and weigh up their benefits and downsides. Considerable effort has gone into developing criteria and approaches for an optimal, internationally applicable grading system that reflects greater awareness of the variability in patients’ values and preferences. In addition to minimizing bias and aiding interpretation, following a systematic approach to grading the strength of recommendations enhances the usefulness of clinical guidelines. The Grades of Recommendation Assessment, Development and Evaluation (GRADE) system was developed in 2004 as an attempt to provide systematic and explicit methods of making judgements.1

The American College of Chest Physicians (ACCP) Evidence-Based Practice Guidelines on Antithrombotic and Thrombolytic Therapy is “bedtime reading work” for physicians involved in the management of patients with venous disease. In its latest edition released in 2008, the ACCP committee of methodologists and guideline developers adopted a grading system based on the GRADE approach. The criteria, displayed in Table I, have been placed in an order that approximates their relative significance.2 The ACCP team in charge of the task agreed on these criteria for defining a grading system that would be consistent with the latest developments in the field.

In this paper, we will focus on the GRADE approach to recommendations and on how the GRADE system categorizes the quality of evidence and strength of recommendations, and explore the implications of these grading categories for patients, clinicians, and policy makers.

What makes a good grading system?

For an optimal grading system, decisions regarding quality of evidence should be separate from those regarding strength of recommendations. Not all grading systems succeed in doing this. For instance, early systems of grading methodological quality relied primarily on the basic study design (ie, randomized controlled trials [RCTs] or observational studies). Study design was used by these early grading systems as an essential component for determining our level of confidence in estimates of beneficial and adverse treatment effects.

Over the past few years, there has been increased awareness of a number of other factors that require consideration in order for us to be confident in the estimation of benefits, risks, burden, and costs.

What differentiates GRADE from previous grading systems?

In contrast to earlier grading systems, the GRADE working group wanted a system that used explicit definitions of strength of recommendation and of quality of evidence. Their system takes into account various factors that can affect the quality of evidence, not only the study design and quality, but also study limitations, imprecision, and possible confounding. It assesses the relative importance of outcomes, clarifies the judgement on benefit and harm by providing an explicit definition for trade-offs between benefit and harm, and includes judgement on whether the incremental health benefits are worth the costs. Finally, it provides a clear interpretation of the recommendation.

Table I. Criteria for an optimal grading system, according to the ACCP Task Force.

Abbreviations: ACCP, American College of Chest Physicians.
Modified from reference 2: Guyatt et al. Chest. 2006;129:174-181. © 2006, American College of Chest Physicians.

Quality of evidence in the GRADE system

“Quality of evidence” reflects the extent to which the confidence in an estimate of an effect is adequate in supporting a recommendation. To achieve transparency and simplicity, the GRADE system classifies the quality of evidence at one of four levels: high, moderate, low, and very low.

As with early systems for grading the quality of evidence, GRADE initially focuses on study design. In this way, RCTs without limitations constitute high-quality evidence, observational studies without special strengths or with important limitations constitute low-quality evidence, while any other evidence (eg, case series) constitutes very low–quality evidence.

Negative factors affecting quality of evidence
Several factors can, however, lower the quality of evidence from observational studies as well as from RCTs:

a) Study limitations
If studies have major limitations that may bias their estimates of the treatment effect, confidence in the evidence decreases. Such limitations include a lack of allocation concealment, a lack of blinding, a significant number of patients lost to follow-up, failure to perform an intention-to-treat analysis, failure to report outcomes, and early stopping of a study for benefit.

b) Inconsistency of results
Heterogeneity or variability in results across studies suggests true differences in underlying treatment effect. This variability may come from differences in populations, interventions (eg, larger effects with higher drug doses), or outcomes (eg, decreasing treatment effect with time). The quality of evidence diminishes when results are heterogeneous and investigators fail to identify a credible explanation.

c) Indirectness of evidence
Guideline developers address two types of indirectness of evidence:
– Indirect comparisons between two active drugs. In the absence of a randomized head-to-head comparison, randomized trials may have compared each drug with placebo. The relative merits of the two drugs can then only be inferred by comparing the magnitude of effect each showed against placebo, so the evidence is of lower quality than a direct head-to-head comparison would have provided.
– Discrepancies between the population, intervention, comparator, or outcome of interest and those included in the applicable studies.

d) Imprecision
The quality of evidence is reduced when studies include relatively few patients or have few events, leading to wide confidence intervals.

e) Publication bias
When studies go unpublished, especially those that show no effect, the quality of evidence is downgraded. A prototypical situation would be when published evidence is limited to a small number of trials, all of which are financed by industry.

Positive factors affecting quality of evidence
Conversely, there are also some factors that might increase the quality of evidence.

a) Even though observational studies usually yield low-quality evidence, methodologically strong observational studies can provide large or very large and consistent estimates of the magnitude of a treatment effect. This gives good confidence in the results, in particular when there is no major plausible confounder. The larger the magnitude of effect, the stronger the evidence becomes.

b) If all the plausible confounders tend to reduce the estimated effect, the true effect is, if anything, larger than the one observed, and confidence in the evidence increases.

c) Finally, the existence of a dose-response gradient also increases confidence in the authenticity of the effect.

The GRADE system has four levels of quality of evidence: A = high; B = moderate; C = low; and D = very low. A “high quality of evidence” means that further research is unlikely to change our confidence in the estimate of effect. A “moderate quality of evidence” means that further higher-quality research may have an impact on our confidence in the estimate of effect and may change this estimate. A “low quality of evidence” is used when further higher-quality research is likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate. Finally, the evidence is graded “very low” when any estimate of effect is highly uncertain.
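
As a rough illustration of the rating logic described in this section, the following Python sketch starts from the study design and then moves the rating down or up according to the factors listed above. The function name `rate_quality`, the factor labels, and in particular the rule of shifting exactly one level per factor are assumptions made for this example. The text does not specify how many levels each factor moves the rating, and in practice this remains a matter of judgement for the guideline panel.

```python
# Illustrative sketch only: one level per factor is an ASSUMPTION of this
# example, not a rule stated in the article.

LEVELS = ["very low", "low", "moderate", "high"]  # ordered from worst to best
LETTER = {"high": "A", "moderate": "B", "low": "C", "very low": "D"}

# Starting level by study design, as described in the text.
START = {"rct": "high", "observational": "low", "other": "very low"}

# Factor labels (hypothetical names) taken from the two lists above.
DOWNGRADE = {
    "study limitations",
    "inconsistency",
    "indirectness",
    "imprecision",
    "publication bias",
}
UPGRADE = {
    "large effect",
    "confounders reduce effect",
    "dose-response gradient",
}


def rate_quality(design, downgrades=(), upgrades=()):
    """Return the quality level for a body of evidence.

    design: "rct", "observational", or "other" (eg, case series).
    downgrades / upgrades: iterables of factor labels judged to apply.
    """
    if design not in START:
        raise ValueError(f"unknown study design: {design!r}")
    down, up = set(downgrades), set(upgrades)
    unknown = (down - DOWNGRADE) | (up - UPGRADE)
    if unknown:
        raise ValueError(f"unknown factors: {sorted(unknown)}")

    index = LEVELS.index(START[design])
    index += len(up) - len(down)  # assumed: one level per factor
    index = max(0, min(index, len(LEVELS) - 1))  # stay within the four levels
    return LEVELS[index]


if __name__ == "__main__":
    level = rate_quality("rct", downgrades=["imprecision"])
    print(level, LETTER[level])
```

Under these assumptions, an RCT-based body of evidence with wide confidence intervals would be rated moderate (B), while observational evidence showing a large effect and a dose-response gradient could rise from low to high.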

Strength of a recommendation in the GRADE system

The “strength of recommendation” reflects the extent to which we can be confident that the desirable effects of adhering to an intervention outweigh its undesirable effects. There are two grades of recommendations: strong (1) and weak (2). A strong recommendation means that benefits clearly outweigh risks, while a weak recommendation means that one cannot be sure that benefits outweigh risks.

The strength of a recommendation is no longer exclusively based on the quality of evidence. It is also determined by:2

a) The balance between desirable and undesirable effects
This takes into account the incidence rate of the target event, the importance of the event that treatment prevents, the magnitude of treatment effect, the precision of estimates of treatment effect, and the risks associated with therapy.

b) Burdens of therapy

c) Costs
A judgement may be made on whether the net benefits are worth the incremental cost.

d) Patients’ varying values and preferences

Strong and weak recommendations may be interpreted as follows. If the recommendation is strong, benefits clearly outweigh risks, or vice versa, and the recommendation applies to most patients in most circumstances. A decision aid is not needed, and the patient only needs to be informed. In the case of a weak recommendation, the best action may differ and other alternatives may be equally reasonable. In this case, decision aids may be useful, and the physician needs to make sure that the choice is in accordance with the patient’s values. While almost all patients would make the same choice for strong recommendations, the choice may vary significantly for a weak recommendation.

Rating evidence and recommendations in venous disease

The GRADE system has been implemented in the 8th edition of the ACCP Evidence-Based Clinical Practice Guidelines on Antithrombotic and Thrombolytic Therapy. There are two levels of strength of recommendation (1 = strong, “We recommend”; and 2 = weak, “We suggest”), and three levels of quality of evidence (A = high; B = moderate; and C = low).

Table II. ACCP grades for recommendations.

Abbreviations: ACCP, American College of Chest Physicians; RCT, randomized controlled trial.
Modified from reference 3: Guyatt et al. Chest. 2008;133:123S-131S. © 2008, American College of Chest Physicians.
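
The way the two strength levels and three quality levels combine can be sketched in a few lines of Python. The sketch simply enumerates the combinations and spells out what each label stands for, using the wording given above (“We recommend” and “We suggest”). The names `STRENGTH`, `QUALITY`, and `describe` are hypothetical and chosen for this example only.

```python
from itertools import product

# Strength of recommendation: number -> (label, guideline wording).
STRENGTH = {1: ("strong", "We recommend"), 2: ("weak", "We suggest")}

# Quality of evidence as used in the ACCP grades: letter -> label.
QUALITY = {"A": "high", "B": "moderate", "C": "low"}

# Every strength/quality combination, eg "1A" or "2C".
ALL_GRADES = [f"{number}{letter}" for number, letter in product(STRENGTH, QUALITY)]


def describe(grade):
    """Spell out an ACCP grade such as "1A" or "2C" in words."""
    if len(grade) != 2 or not grade[0].isdigit():
        raise ValueError(f"not a valid grade: {grade!r}")
    number, letter = int(grade[0]), grade[1].upper()
    if number not in STRENGTH or letter not in QUALITY:
        raise ValueError(f"not a valid grade: {grade!r}")
    label, wording = STRENGTH[number]
    return (
        f'Grade {number}{letter}: {label} recommendation ("{wording}"), '
        f"{QUALITY[letter]}-quality evidence"
    )


if __name__ == "__main__":
    for grade in ALL_GRADES:
        print(describe(grade))
```

Keeping strength and quality as two separate lookups mirrors the point made earlier that judgements about the quality of evidence should remain separate from judgements about the strength of a recommendation.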

Therefore, six different grades may be used to grade a recommendation (Table II).3 The reader needs to understand the important changes made in the way the final recommendations are obtained. The most dramatic change is that the strength of recommendation is no longer based, as was the case only a few years ago, solely on the type and quality of available studies. Back in 1989,4 panelists would first rate the level of evidence from “large trials with clear-cut results and low risk of error” to “case series only,” and the grade of recommendation depended on the level of evidence, with no other parameter taken into account. Interestingly, until the 6th edition in 2001, the quality of evidence rating preceded the strength of recommendation rating in the grading system (from A1 to C2), and the assessment of quality of evidence was mainly based on study design, the highest level being limited to RCTs and meta-analyses of RCTs.

In 2001, for the first time,5 the primacy of the judgement on the clarity of the risk-benefit trade-off of an intervention over the methodological quality of a study alone became clear. The grade of recommendation (1 or 2) was therefore placed before the quality of evidence (A, B, or C). Moreover, high-quality studies could lead to weak recommendations because of uncertainty over precise estimates of benefit, harm, or costs and small effect sizes. Conversely, in 2004, it became possible to make a grade 1 recommendation even in the absence of RCTs with no important limitations. If experts judged that an extrapolation made from available RCTs was secure or that data from observational studies were overwhelmingly compelling, the quality of evidence was marked “C+”, which could lead to a grade 1 recommendation.6 The 2004 edition was the first to be named “Evidence-Based Practice Guidelines,” and the four steps of evidence-based medicine were followed for each recommendation: clear identification of the clinical problem; literature retrieval; literature appraisal; and application of the findings acknowledging factors other than evidence.

In 2008, quality of evidence became “only one” of the determinants of the strength of recommendation, along with beneficial health outcomes, decreased burden of treatment, variability in patients’ preferences, and resource use. The recommendation is a true judgement on the overall value of the balance between the benefits and risks incurred by following this recommendation, a judgement based not only on the expected benefits in terms of health, treatment-related risks, and patient values and preferences, but also on economic considerations and the allocation of resources.

Limitations and misunderstandings

The GRADE system certainly represents a major improvement in clinical guideline methodology. It provides the clinician with recommendations based not only on the methodological quality of available studies, but also on other important criteria (see above). However, one could consider that recommendations based on the GRADE system are more demanding for the reader. In fact, it is crucial for guideline users to carefully read and understand the way recommendations are made. Above all, to fully appraise a recommendation, they need to read not only the final summary sentence, but the whole text giving the explicit criteria leading to the recommendation.

For example, the latest edition of the ACCP guidelines is often quoted as strongly recommending long-term treatment in patients who experience a first unprovoked deep vein thrombosis or pulmonary embolism. However, the exact recommendation reads: “For patients with a first unprovoked VTE [venous thromboembolism], and in whom risk factors for bleeding are absent and for whom good anticoagulant monitoring is achievable, we recommend long-term treatment (Grade 1A).” In terms of values and preferences, this recommendation attaches a relatively high value to the prevention of recurrent VTE and a lower value to the burden of long-term anticoagulant therapy. This is obviously very different from the quick summary and reveals the thinking behind how decisions are made.7

Moreover, GRADE authors insist that recommendations apply to specific settings, groups of patients, and economic contexts. There may be significant variations across countries or hospitals that may influence the decision of whether to adhere to a recommendation. Costs, for example, as well as the way costs influence clinical decisions, differ widely between countries. Most of all, no recommendation can take into account all individual clinical circumstances. The ACCP guideline authors warn that any grade other than a grade 1A recommendation indicates that the authors acknowledge that other interpretations of evidence and other clinical policies may be appropriate. Furthermore, they suggest that even grade 1A recommendations may not apply to all patients and circumstances, either because of resource constraints or because of patients’ atypical values and preferences. Finally, physicians must use their judgement and consider local and individual circumstances along with their patients’ values and preferences to achieve the best-tailored decisions.


Clinical decision-making is not simple. Guidelines help clinicians and patients facing complex choices to choose informed options, to improve quality of care, and to make the best use of limited resources. The GRADE system provides a standardized and explicit way of compiling recommendations, of which physicians must be aware in order to fully make the most of guidelines in the care of their patients.


Acknowledgements: the author would like to thank Mrs Alavi for her useful assistance.

1. Atkins D, Best D, Briss PA, et al; GRADE Working Group. Grading quality of evidence and strength of recommendations. BMJ. 2004;328:1490.
2. Guyatt G, Gutterman D, Baumann MH, et al. Grading strength of recommendations and quality of evidence in clinical guidelines: report from an American College of Chest Physicians Task Force. Chest. 2006;129:174-181.
3. Guyatt GH, Cook DJ, Jaeschke R, Pauker SG, Schunemann HJ. Grades of recommendation for antithrombotic agents: American College of Chest Physicians Evidence-Based Clinical Practice Guidelines (8th Edition). Chest. 2008;133:123S-131S.
4. Sackett DL. Rules of evidence and clinical recommendations on the use of antithrombotic agents. Chest. 1989;95:2S-4S.
5. Guyatt G, Schunemann H, Cook D, Jaeschke R, Pauker S, Bucher H. Grades of recommendation for antithrombotic agents. Chest. 2001;119:3S-7S.
6. Guyatt G, Schunemann HJ, Cook D, Jaeschke R, Pauker S. Applying the grades of recommendation for antithrombotic and thrombolytic therapy: the Seventh ACCP Conference on Antithrombotic and Thrombolytic Therapy. Chest. 2004;126:179S-187S.
7. Kearon C, Kahn SR, Agnelli G, Goldhaber SZ, Raskob GE, Comerota AJ. Antithrombotic therapy for venous thromboembolic disease: American College of Chest Physicians Evidence-Based Clinical Practice Guidelines (8th Edition). Chest. 2008;133:454S-545S.

Keywords: evidence-based medicine; review; recommendations