Why should clinicians care about Bayesian methods?
Introduction
Medical researchers are heavy users of inferential methods. Leading medical journals are replete with clinical and epidemiological studies whose conclusions are backed by quantitative inferential concepts such as p-values and confidence intervals. Small wonder, then, that there is growing interest in Bayesian inference, which is said to have a number of key advantages over conventional methods (see, e.g. Lilford and Braunholtz, 1996; Bland and Altman, 1998; Spiegelhalter et al., 1999).
Yet despite the increasing awareness of their existence, the actual application of Bayesian methods remains very much the exception rather than the rule among working clinicians. There are many potential explanations for this, from concern about the much-discussed use of prior belief in Bayesian analysis, to the simple reluctance of old dogs to learn new (statistical) tricks.
I would argue, however, that the fundamental reason why Bayesian methods have yet to become widely adopted is that they have yet to pass the “cost–benefit analysis” any hard-pressed professional performs when hearing of some new technique: will the cost of acquiring the necessary expertise be compensated for by real practical benefit? Many potential users of Bayesian methods, it would seem, have concluded that the effort of switching from the familiar old frequentist methods to Bayesian techniques is unlikely to give sufficient return on (intellectual) investment.
Of the two factors involved in such a cost–benefit analysis, the cost of adopting Bayesian techniques can certainly seem high to non-specialists: the typical introductory text on Bayesian inference has a mathematical content well above that of the equivalent texts for conventional statistics. Yet I would argue that this is not the principal reason why Bayesian techniques have yet to become widely adopted: their mathematical basis is within the reach of anyone — given the motivation. Rather, I suspect the chief reason is the failure of Bayesians to provide a ready answer to a rather obvious question: if the standard methods of inference are so awful, how come the whole scientific enterprise has not collapsed around our ears?
In what follows, I suggest that while the failings of conventional inference are real enough, they typically have either little practical impact on the assessment of evidence, or are easily explained away as the result of other statistical deficiencies. As such, avoiding them hardly provides much motivation for switching to Bayesian methods.
If the “stick” of supposedly dire consequences of using frequentist methods has not succeeded in promoting the widespread use of Bayesian methods, what might? In this paper, I suggest a possible “carrot”, which focuses instead on one of the major practical advantages of Bayesian methods: their ability to take explicit, quantitative account of extant knowledge. Specifically, I outline a Bayesian method of assessing quantitatively the credibility of a new research finding. The technique is easy to apply, and produces a measure of credibility whose interpretation is both straightforward and transparent. As such, it may provide a gentle “entrée” to Bayesian modes of thought, which allows the non-specialist to appreciate their power without losing all contact with familiar frequentist methods.
I begin, however, by reviewing briefly the concerns raised over the years about frequentist inference, and why these concerns have had so little impact among working clinicians.
Five reasons to fret about frequentist methods
Frequentist concepts such as p-values have faced criticism almost since the advent of “significance testing” in the 1920s. The criticisms have ranged from qualms about the interpretation and arbitrariness of the well-known p=0.05 criterion for significance, through concern that p-values exaggerate real “significance”, to claims that frequentist methods conceal the ineluctable presence of subjectivity in inference. Indeed, so sustained has this criticism been that one wonders how frequentist
Confidence intervals: better than p-values?
So far, I have focused primarily on the deficiencies of p-values, used to evaluate the significance of point-null hypotheses. However, criticism of p-values is not a uniquely Bayesian sport: frequentists have themselves argued against the use of p-values, on the grounds that they fail to convey information about effect size or sample size. It is entirely possible for researchers to claim a “statistically significant” effect and yet for the effect to be so small as to have no significance in the
What is required to change old habits?
Thus far I have outlined the principal reasons put forward by advocates of Bayesian methods for worrying about conventional statistical inference. I have also suggested why these failings have had so little impact outside the statistical community: while they are real and of conceptual importance, they have not led to inferential meltdown in routine research.
If Cassandra-like warnings of the dire consequences of using frequentist methods have not convinced working clinicians of the merits of
Credibility: a suitable case for Bayesian treatment
While the outcomes of medical research are now routinely stated quantitatively via such measures as odds ratios, this is rarely the case for their inherent credibility — that is, the level to which the finding is both plausible in the light of current knowledge, and backed by persuasive weight of evidence. Even where a finding is controversial or unexpected, assessments of its credibility almost always consist of broad-brush qualitative arguments based on previous research or experience. Such
Derivation of the critical prior interval (CPI)
To fix ideas, I shall henceforth focus on log-normally distributed ORs as the means of summarising both the new data and the assessment of credibility. In deriving the Critical Prior Interval, I shall also assume this interval to be symmetrical in log-odds about a mean value of zero. The credibility assessment is thus being made on the basis of “cautious reasonable scepticism” (Kass and Greenhouse, 1989), consistent with the judiciously sceptical stance conventionally adopted in scientific
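Under the assumptions just stated (a log-normally distributed OR, and a sceptical prior that is normal on the log-odds scale and centred on ln OR = 0), the Critical Prior Interval has a simple closed form; the sketch below is one algebraic route to it, not a verbatim transcription of the paper's derivation, and the function name is illustrative.

```python
import math

def critical_prior_interval(ci_low, ci_high):
    """Sceptical limit S of the critical prior interval (1/S, S).

    Assumes the 95% CI (ci_low, ci_high) for the odds ratio is
    log-normal (i.e. normal on the log-odds scale) and excludes
    OR = 1, and that the sceptical prior is normal on log-odds
    and centred on zero.

    On the log scale the data give mean m = (lnU + lnL)/2 and
    standard error s = (lnU - lnL)/(2 * 1.96).  A zero-centred
    prior with variance v makes the posterior 95% interval just
    reach OR = 1 when v = (1.96 * s**2)**2 / (m**2 - (1.96*s)**2);
    the 1.96 factors cancel in the resulting prior limit, leaving

        S = exp((lnU - lnL)**2 / (4 * sqrt(lnL * lnU)))
    """
    lnL, lnU = math.log(ci_low), math.log(ci_high)
    if lnL * lnU <= 0:
        raise ValueError("CI must exclude OR = 1 (a 'significant' result)")
    s_limit = math.exp((lnU - lnL) ** 2 / (4.0 * math.sqrt(lnL * lnU)))
    return 1.0 / s_limit, s_limit
```

For instance, `critical_prior_interval(0.4, 0.7)` gives roughly (0.87, 1.15): only a sceptic whose prior 95% interval for the OR is at least this tightly confined around 1 could dismiss such a finding, so a narrow CPI signals a credible result.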
Credibility assessment: a worked example
As an example of credibility assessment in action, consider a recent case-control study of the impact of simple lifestyle modifications on cardiovascular risk (Spencer et al., 1999). This found that among men aged 27–64, taking non-vigorous exercise and avoiding added salt were both associated with statistically significant reductions in the risk of acute myocardial infarction (AMI). For non-vigorous exercise, an OR of 0.5 with a standard 95% CI of (0.4, 0.7) was found, while avoidance of added salt
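To make the arithmetic behind the example concrete, the following sketch applies the standard normal-normal conjugate update on the log-odds scale to the non-vigorous exercise figures (OR 0.5, 95% CI 0.4–0.7), using the critical sceptical prior implied by the derivation in the previous section. Variable names are illustrative, not the paper's notation.

```python
import math

# Non-vigorous exercise: OR = 0.5, 95% CI (0.4, 0.7) (Spencer et al., 1999)
ci_low, ci_high = 0.4, 0.7
lnL, lnU = math.log(ci_low), math.log(ci_high)

# Data summarised on the log-odds scale: mean and standard error
m = 0.5 * (lnL + lnU)
s = (lnU - lnL) / (2 * 1.96)

# Critical sceptical prior: normal, centred on ln OR = 0, with the
# variance at which the posterior 95% interval just reaches OR = 1
prior_var = (1.96 * s**2) ** 2 / (m**2 - (1.96 * s) ** 2)

# Standard normal-normal conjugate update
post_var = prior_var * s**2 / (prior_var + s**2)
post_mean = m * prior_var / (prior_var + s**2)

# Posterior 95% interval for the OR: its upper limit sits at ~1.0,
# confirming this prior is exactly sceptical enough to nullify the
# finding's statistical significance
post_ci = (math.exp(post_mean - 1.96 * math.sqrt(post_var)),
           math.exp(post_mean + 1.96 * math.sqrt(post_var)))
```

Any prior even slightly more open to a real effect (a larger prior variance) leaves the whole posterior interval below 1, so the finding survives a considerable degree of scepticism.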
Conclusion
The relatively minor inroads that Bayesian analysis has made into mainstream medical journals are pretty disheartening for those who (like the author) believe it has much to offer working clinicians. I have argued here that the blame lies principally with the strategy adopted by many Bayesians (including, again, myself) for convincing others to take up Bayesian methods: Cassandra-like warnings of the terrible fate that awaits those who persist in their frequentist habits. In this paper, I have
Acknowledgements
It is a pleasure to thank David Spiegelhalter, Dennis Lindley and Mark Selinger for their constructive comments on an early draft, and Iain Chalmers, Richard Lilford and Catherine Elsworth for valuable discussions.
References (23)
- Berger, J.O., Delampady, M., 1987. Testing precise hypotheses. Statist. Sci.
- Berger, J.O., Sellke, T., 1987. Testing a point null hypothesis: the irreconcilability of P-values and evidence. J. Amer. Statist. Assoc.
- Berlin, J.A., Colditz, G.A., 1990. A meta-analysis of physical activity in the prevention of coronary heart disease. Amer. J. Epidemiol.
- Bland, J.M., Altman, D.G., 1998. Bayesians and frequentists. British Med. J.
- Bourke, G.J., et al., 1985. Interpretation and Uses of Medical Statistics.
- Edwards, W., Lindman, H., Savage, L.J., 1963. Bayesian statistical inference for psychological research. Psychol. Rev.
- Egger, M., Schneider, M., Davey Smith, G., 1998. Meta-analysis: spurious precision? Meta-analysis of observational studies. British Med. J.
- Good, I.J., 1950. Probability and the Weighing of Evidence.
- He, J., et al., 1999. Dietary sodium intake and subsequent risk of cardiovascular disease in overweight adults. J. Amer. Med. Assoc.
- et al., 1993. Starting Statistics in Psychology and Education.
- Hines, T.M., Comprehensive review of biorhythm theory. Psychol. Rep.