Datamethods Discussion Forum
@discourse.datamethods.org.web.brid.gy
This is a place for discussions and Q&A about data-related issues and quantitative methods including study design, data analysis, and […] [bridged from https://discourse.datamethods.org/ on the web: https://fed.brid.gy/web/discourse.datamethods.org ]
Statin-related side effects: the recent Lancet publication is biased toward false-negatives
Dear Frank, I am not familiar with the history of the statin literature, so I will refrain from commenting on that specifically. That said, I consider it highly implausible that a single paper could be regarded as conclusive, starting with the well-known problems of transportability.

The effort I am asking for is simply this: do not canonize the literature, because 1) even under the best of circumstances, it is extraordinarily difficult to guarantee its robustness - the history of medicine makes this clear with cases such as the Women’s Health Initiative - and 2) the current landscape is driven by high-level interests that range - as I argued in my response to Erin - from the pursuit of funding to the endorsement of political and ideological positions.

We are developing increasingly sophisticated methods that rest on extremely strong and difficult-to-handle assumptions (e.g., g-methods), in a context where the scientific community has proven capable of adopting and institutionalizing rituals - such as p < 0.05 - that had far less justification (virtually none, I would argue). More than 100 years after the earliest warnings about the distinction between “statistical significance” and “practical significance,” we still find prestigious journals that fail to differentiate between the two concepts.

For these reasons, I continue to regard my strong concerns about the entire process that generates data and results as well founded, and I believe it is a deontological duty of methodologists - especially those far more skilled than I am - to maintain the highest level of vigilance.
discourse.datamethods.org
February 11, 2026 at 10:38 AM
Statin-related side effects: the recent Lancet publication is biased toward false-negatives
Dear Erin, First of all, I want to make it clear that I, too, am convinced of your genuine intentions in writing your comment. That said, I disagree with almost everything you wrote.

To begin with, if the prior assumption is that plausible AEs are virtually absent based on the existing literature, then there is no need to resort to a series of statistical rituals grounded in nullism, dichotomization, reification, and the significance fallacy to claim as much. If there is an initiative aimed at evaluating adverse effects, then it should be conducted with the highest level of methodological rigor. I find it incorrect - and dangerous - to criticize papers that point out inconsistencies based on the very strong assumption that “any method is acceptable, since the results will ultimately be interpreted in light of the belief that AEs are rare.”

Second, I cannot endorse your general narrative, for three main reasons: i) the accounts of clinicians and methodologists I know - some of whom have worked directly for regulatory agencies; ii) what has already occurred in the history of medicine (including striking examples, like the Vioxx case); and iii) the evidence that has emerged in recent years regarding phenomena such as regulatory capture and the outsized influence of large pharmaceutical companies on decision-making bodies (e.g., [1], [2], [3], [4]). The current system of evidence generation is _not_ neutral and tends to overstate the strength of the evidence produced, according to very clear incentives - ranging from securing funding to the need to take political positions. To give just one good example among many, consider the case of the WHO, which at the beginning of the pandemic stated, “it is very clear right now that we have no sustained human-to-human transmission” with reference to SARS-CoV-2, and published outreach posts containing statements such as “FACT: #COVID19 is NOT airborne.”

There would be further aspects to address (e.g., the establishment of cost-benefit assessments based not on surrogate endpoints but on patient-centered and multivariate endpoints). Nonetheless, I think what I have outlined above is more than sufficient to argue that, for the good faith I believe guides your comment to be fully justified methodologically, a far greater effort is required than simply adhering to priors that are convenient for the most powerful and robust stakeholders. Openly calling out what one considers to be methodological flaws in research - **regardless of the topic under examination** - is, and must remain, one of the indispensable ethical and epistemic principles of scientific conduct, in order to safeguard both public health and the production of knowledge.
discourse.datamethods.org
February 12, 2026 at 5:27 AM
Statin-related side effects: the recent Lancet publication is biased toward false-negatives
As a clinician who worked in pharmacovigilance for many years, I’m very familiar with the methods used to assess drug-related safety signals. My work entailed causality assessment for potential drug-related adverse events and assessment of observational and experimental safety evidence used to develop and revise drug product monographs (PMs).

After reading the Lancet article and your rebuttal to it, I’m - how shall I say this - “_concerned by your concern_.” I doubt that you’d feel a need to publish your letter if you had witnessed, first-hand, the pharmacovigilance work that’s done every day by international drug regulatory agencies. Enormous effort and manpower are dedicated to detecting and investigating potential adverse drug reactions. These assessments are performed both at the time of initial marketing and then regularly in the postmarket setting. Drug sponsors are legally obliged to monitor and assess AE reports submitted to regulatory agencies and to their company, and also to monitor the medical literature for any potentially new safety signals related to their products. Regulators _duplicate_ much of this effort by independently surveilling the medical literature and searching for, then assessing, relevant AE case reports. They open formal assessments into safety issues flagged in-house, by other international regulators, and by drug sponsors.

The amount of attention devoted to even the _most obscure_ adverse events by regulatory agencies was, in my view as a physician, somewhat obscene, given:

1. the very poor quality of most AE case reports;
2. the lack of documented dechallenge/rechallenge for most reports (historical features that are usually needed to permit definitive causality assessment for individual cases); and
3. the differential impact of PM AE lists on prescribers (virtually no impact) and patients (a huge psychological impact for certain subsets).

Many times during my tenure in this field, I felt that we ran a real risk of harming, rather than protecting, patients if we weren’t very careful.

Given my own experience working in pharmacovigilance, I agree completely with the authors of the Lancet article. The precautionary principle has been such a powerful guiding force in the field of pharmacovigilance for so many years, and statins have been on the market for so long, that current statin PMs have likely become grossly polluted with AEs that stemmed from “half-baked” drug safety investigations conducted many years ago using suboptimal lines of evidence. The RCT experience with statins is now SO huge, after all these years, that it seems very reasonable, to me, to revisit very long lists of uncommon AEs, now that more and more of these rare AEs have been accrued in the collective RCT record.

As a physician, the reason I feel so strongly about this topic is that I see the harmful psychological effects that patient information leaflets can have on patients’ health-related decisions. Many pharmacists provide these leaflets when they dispense a new drug to a patient. The leaflets often list a dizzying array of uncommon AEs. Some patients become overwhelmed by these lists and, unable to contextualize them appropriately, decide not to take the medication that they’ve just been prescribed. As a result, their symptoms remain unaddressed or they might forgo a medication that could prevent important disease.
Another, even more common, phenomenon is the patient who develops a physical symptom, looks up the PMs of all his prescribed medications, sees the AE listed in one of the PMs, assumes that his drug must be responsible for his symptoms, and decides, unilaterally, to stop his treatment without first consulting his physician. If he suddenly develops embarrassing flatulence, he might be more apt to blame a medication he’s been taking for years, without incident, than his sudden new fondness for lentils - simply because he saw “flatulence” on the PM AE list. People of all educational levels and backgrounds can be very susceptible to the _post hoc ergo propter hoc_ fallacy. Indeed, the powerful human tendency to assume consequence from precedence is often impossible to dispel, even through patient explanation.

For these reasons, it seems not only plausible, but very likely, that lives could be saved by removing, from statin PMs, mention of AEs that lack solid _current_-day evidence for causality.

I respect your effort to encourage appropriate use of statistics when assessing drug safety. And I believe your motivations are sincere, since your letter acknowledges the clear cardiovascular benefits of statins. But I fear that your motivation will still be questioned by some, since your letter implicitly suggests that the authors of the Lancet study are using opaque statistics to obscure the “true” risks of statins. To this end, I fear that your letter could further fuel the dangerous conspiracy theories that have swirled for decades around this class of drugs.
discourse.datamethods.org
February 11, 2026 at 10:38 AM
Statin-related side effects: the recent Lancet publication is biased toward false-negatives
Dear DataMethods Users, A recent Lancet meta-analysis assessed whether adverse events listed in statin product labels are causally attributable to statin therapy, using double-blind randomized trials and false discovery rate control across multiple outcomes. The authors concluded that the evidence does not support causal associations for most labeled adverse effects, a conclusion that was rapidly echoed in headlines such as _“Statins do not cause the majority of side effects listed in package leaflets.”_

However, in our new letter we argue that this conclusion arises from the combination of multiple, compounding sources of bias that systematically favor false-negative findings in safety assessment. These include stringent multiplicity control, threshold-based overinterpretation of statistically “non-significant” results, dichotomous readings of interval estimates, and reliance on randomized trial settings with limited real-world transportability, together with intention-to-treat dilution under non-adherence. Because “non-significant” does not mean “no causal effect,” we call for an interpretation framework centered on effect sizes and overall uncertainty, along with sensitivity analyses across alternative error trade-offs, risk strata, and dosages. Importantly, the consequences of inferential errors are not symmetric: false-positive signals primarily impose costs on researchers and regulators through additional scrutiny, whereas false-negative conclusions disproportionately affect the most vulnerable stakeholders, namely patients.

We look forward to your thoughts and welcome further discussion on this issue. Rovetta et al., 2026, letter on ‘Assessment of adverse effects attributed to statin therapy in product labels’, v1.0.pdf (144.5 KB)
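As a purely numerical aside on the point that “non-significant” does not mean “no causal effect,” the snippet below uses invented counts (not taken from the meta-analysis or from our letter) to show how a rare adverse event can produce a “non-significant” risk ratio whose compatibility interval still spans clinically important harm:

```r
# Hypothetical counts (illustration only; not from the Lancet meta-analysis)
a <- 30;  n1 <- 10000   # events / participants, statin arm
b <- 22;  n0 <- 10000   # events / participants, placebo arm

rr     <- (a / n1) / (b / n0)                 # risk ratio
se_log <- sqrt(1/a - 1/n1 + 1/b - 1/n0)       # SE of log(RR), Wald approximation
ci     <- exp(log(rr) + c(-1, 1) * qnorm(0.975) * se_log)

round(c(RR = rr, lower = ci[1], upper = ci[2]), 2)
# RR ~ 1.36 with a 95% interval of roughly 0.79 to 2.36: "non-significant",
# yet compatible with anything from a modest reduction to a >2-fold increase.
```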
discourse.datamethods.org
February 11, 2026 at 2:41 AM
Dichotomization
> _“I don’t think you can conclude that… The irony of responder analysis is that it fails at its original goal.”_

Yes - I’ve read Dr. Senn’s articles and know he’s been screaming into the void about all this for many years. The article linked in post #75 above seems, conceptually, horribly muddled to me - yet I fear that its impact might have been substantial… Senn’s response and related publications:

https://link.springer.com/article/10.1177/009286150303700103#citeas
https://pmc.ncbi.nlm.nih.gov/articles/PMC524113/
https://errorstatistics.com/wp-content/uploads/2016/07/senn-2003-pharmaceutical_statistics.pdf

If you were to re-run the hypothetical trial described in post #75, you might obtain the same between-arm difference of 6 points. But this time, drilling down to see what happened to each individual patient’s score, you might see a completely _different_ distribution of point-score changes over the course of the trial. If you were to conclude, as a believer in “responder analysis,” based on analysis of your _first_ trial, that “10% of patients exposed to this drug will respond exceptionally well,” how will you react when you _repeat_ the trial and obtain the same between-arm difference of 6 points, but _this_ time observe that nearly _all_ patients’ scores changed by the _same_ number of points over the course of the trial? If you had run the second trial _before_ the first trial, you would NOT have concluded that the drug “worked exceptionally well” in 10% of patients, but rather that all patients “respond similarly.”

This simple example illustrates the folly of the responder-analysis approach and the importance of acknowledging the stochasticity in patients’ ostensible “response” to treatment from one treatment episode to another. Most importantly, **the fact that a patient’s score changed over the course of the trial does NOT allow us to infer that the treatment he received _caused_ that change, EVEN IF the treatment is one with established _group_-level efficacy.** It’s valid to infer that the new drug “caused” the _between-group/arm_ difference of 6 points (i.e., that the new drug caused one _group’s_ score to wind up 6 points different from the other _group’s_ score; we can say that the drug has meaningful intrinsic efficacy). But it’s NOT valid to “translate” that established _group_-level inference of efficacy to the level of _individual patients_ enrolled in the trial (for the purpose of labelling them as “responders” or “non-responders”).

For diseases with waxing/waning natural histories, _replication_ (otherwise known as “crossover” or “positive dechallenge and/or rechallenge”) at the level of the _individual_ is needed to establish causality at the level of _individual_ patients. And since dechallenge/rechallenge/crossover is NOT a feature of most parallel-group RCT designs, most trials do NOT allow us to make inferences of causality at the level of _individual_ patients.

**Unless this erroneous, highly pernicious, and deeply entrenched conflation of group-level and individual-level causality is acknowledged and loudly criticized by statisticians, “responder analysis” will persist - and so will the practice that serves it: dichotomization of continuous endpoints.**
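To make the thought experiment concrete, here is a small R simulation (all numbers invented): the drug effect is exactly the same for every patient, yet a naive responder cut-off still manufactures apparent “responders” and “non-responders.”

```r
## Minimal simulation of the post-#75 hypothetical (all numbers invented).
## The drug effect is exactly +6 points for EVERY patient; individual change
## scores vary only because of natural fluctuation / measurement noise.
set.seed(1)
n     <- 200                               # patients per arm (hypothetical)
noise <- 6                                 # SD of within-patient fluctuation (hypothetical)

change_drug    <- 6 + rnorm(n, 0, noise)   # constant true effect of +6
change_control <- 0 + rnorm(n, 0, noise)   # no true effect

mean(change_drug) - mean(change_control)   # ~6: the valid group-level inference

## A naive "responder" analysis (change >= 10 points) still splits the drug arm
## into apparent responders and non-responders, and even finds "responders"
## on placebo - none of which reflects individual-level causation.
mean(change_drug >= 10)      # roughly 25% "responders" on drug
mean(change_control >= 10)   # roughly 5% "responders" on placebo
```

Setting `noise` near zero reproduces the second hypothetical trial: every drug-arm patient changes by about 6 points, nobody crosses the 10-point threshold, and yet the group-level result is identical.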
discourse.datamethods.org
February 8, 2026 at 6:24 AM
Dichotomization
<p>Congratulations on the article. The app you developed to show the effects of dichotomization on sample size should help many researchers.</p>
<p>Arguably, the only way to eradicate entrenched bad practices is to trace them back to their origins and then pull them out by the root. The “Context” section of your paper describes how and why responder analysis became a mainstream practice. I looked up some of the papers you cited (e.g., Kieser). It seems like the people who first proposed responder analysis were trying to satisfy regulators’ demands for an analysis that could help them to gauge the <em>clinical relevance</em> of the effects shown by new drugs in RCTs.</p>
<p>Kieser notes (boldface is mine):</p>
<blockquote> <p><em>“A number of regulatory guidelines propose that clinical relevance should be assessed by considering the <strong>rate of responders</strong>, that is, the proportion of patients who are observed to achieve an apparently meaningful benefit…”</em></p> </blockquote>
<p>This paper, found through my own research on the history of responder analysis, represented early advocacy for the technique (which was closely linked to promotion of the concept of “number needed to treat”):</p>
<p><a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC1112685/#B13" rel="noopener nofollow ugc">https://pmc.ncbi.nlm.nih.gov/articles/PMC1112685/#B13</a></p>
<p>Discussion of a fictional scenario might serve to highlight a key issue.</p>
<p>Hypothetical scenario:</p>
<p>An RCT shows a mean between-arm difference of 6 points, where the outcome is a continuous variable measured on a 100-point scale (the higher the score, the better the patient’s clinical state). The mean baseline-adjusted final score for those in the new drug arm was 6 points better than the final score in the other arm. The trial was considered positive because the sponsor and regulator had agreed, prior to conducting the trial, that a 5-point mean between-arm difference would be considered “clinically meaningful.”</p>
<p>The researchers then performed an additional analysis, during which they “drilled down” into each arm of the trial, plotting how the score of each patient had changed from the beginning of the trial to the end. They found that the scores of 50% of patients in the new drug arm had changed by only 1 point, 2 points, 3 points, or 4 points over the course of the trial, while the scores of 10% of patients changed by 10 points or more following exposure to the new drug.</p>
<p><strong>Key conceptual question that lies at the heart of the “responder analysis” controversy:</strong> Does this observation mean that we can infer that the drug “worked exceptionally well” in 10% of patients and “barely at all” for 50% of patients in the trial? Why or why not?</p>
<p>I have my own opinion on the answer to this question and on the paper linked above. I’d be interested to hear the opinions of others on this site (with rationale presented).</p>
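On the sample-size point in the first paragraph, here is a small, self-contained illustration (hypothetical effect size and responder threshold, not taken from the paper or its app) of the power lost by dichotomizing a continuous endpoint into a "responder" rate:

```r
## Power comparison: continuous endpoint vs. dichotomized "responder" endpoint
## (all design numbers are hypothetical, for illustration only)
set.seed(2)
n_per_arm <- 80          # hypothetical sample size per arm
delta     <- 0.5         # true mean difference, in SD units (hypothetical)
cutoff    <- 1.0         # "responder" threshold on the change score (hypothetical)

one_trial <- function() {
  drug    <- rnorm(n_per_arm, mean = delta)
  control <- rnorm(n_per_arm, mean = 0)
  c(continuous  = t.test(drug, control)$p.value < 0.05,
    dichotomized = prop.test(c(sum(drug >= cutoff), sum(control >= cutoff)),
                             c(n_per_arm, n_per_arm))$p.value < 0.05)
}

rowMeans(replicate(5000, one_trial()))
# Power is noticeably lower for the dichotomized "responder" comparison than
# for the analysis of the continuous endpoint at the same sample size.
```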
discourse.datamethods.org
February 7, 2026 at 9:00 PM
ABCLOC: bootstrap method for overfitting-corrected model performance metrics
@f2harrell, I want to apply this method in my project and I want to double-check that I understood your conclusions correctly. The relevant post is “Bootstrap Confidence Limits for Bootstrap Overfitting-Corrected Model...” on Statistical Thinking:

> The Efron-Gong optimism bootstrap has been used for decades to obtain reliable estimates of likely performance of statistical models on new data. It accomplishes this by estimating the bias (optimism) from overfitting and subtracting that bias from...

The “sd2rev wtd4” method was the best. This method can be described as:

* For each bootstrap iteration b = 1, \dots, B, compute a variate V_b:
  * V_b = bootstrap-sample performance (P_{boot}) - 1.25 \times original-sample performance (P_{test})
* Calculate the lower and upper standard deviations of these variates (SE_{lower}, SE_{upper}) using `Hmisc::dualSD`

If possible, I am interested in constructing a 95% confidence interval for the optimism-corrected estimate. How can we derive it from SE_{lower} and SE_{upper}? Maybe using the standard normal approximation, -1.96 \times SE_{lower} and +1.96 \times SE_{upper}? This doesn’t seem correct, though, given the asymmetry assumption behind `dualSD`. Here is a simulated example applying the method above: ABCLOC "sd2rev wtd4" method · GitHub. Is it correct? Thanks
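In case it helps the discussion, here is a minimal sketch of the construction being asked about. It assumes (i) a normal-type approximation with a different SD on each side of the optimism-corrected estimate and (ii) that `Hmisc::dualSD` returns the lower and upper SDs of its argument; neither assumption is confirmed here, and whether this centering is what the blog post intends is exactly the open question:

```r
## Minimal sketch (assumptions noted above; check against the blog post and
## the actual return structure of Hmisc::dualSD before relying on it).
make_interval <- function(P_boot, P_test, P_app, level = 0.95) {
  # P_boot: performance of each bootstrap-fitted model on its bootstrap sample (length B)
  # P_test: performance of the same bootstrap-fitted model on the original sample (length B)
  # P_app : apparent performance of the model fitted to the original sample (scalar)
  V <- P_boot - 1.25 * P_test                  # the "sd2rev wtd4" variate
  corrected <- P_app - mean(P_boot - P_test)   # usual Efron-Gong optimism correction

  s <- Hmisc::dualSD(V)                        # assumed: c(lower SD, upper SD); check names/order
  z <- qnorm(1 - (1 - level) / 2)
  c(estimate = unname(corrected),
    lower    = unname(corrected - z * s[1]),   # asymmetric lower limit
    upper    = unname(corrected + z * s[2]))   # asymmetric upper limit
}
```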
discourse.datamethods.org
February 9, 2026 at 6:02 AM
FDA Draft Guidance: Use of Bayesian Methodology in Clinical Trials of Drug and Biological Products
<p>Pavlos, the points you raised are the key ones to express extreme concern about in the EMA document. Behind the scenes is the fact that while frequentists are always asking Bayesians to simulate frequentist operating characteristics, Bayesians never ask frequentists to simulate Bayesian operating characteristics. Had this been commonplace, we would have ample demonstrations of why it’s a bad idea to control a transposed-conditional probability. This reminds me of an episode of <em>The Office</em> where one team of office workers wins a trivia contest at a party and the other team, of which the boss is a member, demands that to get the prize the first team must also win a “shoe toss over the roof” contest.</p>
<p>Practically speaking, for many applications frequentist methods will be relatively OK when strong frequentist evidence for an effect is found and one does not care about clinical significance. But the sample size needed by the frequentist method will be too large, or, with sequential testing, the frequentist approach will take too long to reach a result. If one is interested in abandoning ineffective treatments early, frequentist methods will take <strong>way too long</strong>.</p>
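For readers wondering what “simulating Bayesian operating characteristics” of a frequentist rule could look like in practice, here is one toy sketch (prior, design, and thresholds are all invented for illustration, not taken from the post or from either guidance document):

```r
## Toy illustration (all numbers invented) of one Bayesian operating
## characteristic of a standard frequentist success rule: among trials that a
## fixed-n z-test declares "significant", how often is the true effect actually
## negligible under an assumed prior?
set.seed(3)
nsim  <- 1e5
n     <- 100                        # patients per arm (hypothetical)
theta <- rnorm(nsim, 0, 0.15)       # assumed prior on the true effect (SD units)

se     <- sqrt(2 / n)               # SE of the between-arm mean difference
est    <- rnorm(nsim, theta, se)    # observed effect in each simulated trial
is_sig <- abs(est / se) > 1.96      # the frequentist "success" rule

# Among "significant" trials, the proportion whose true effect is clinically
# negligible (here |theta| < 0.05 SD, an arbitrary threshold) is far from zero:
mean(abs(theta[is_sig]) < 0.05)
```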
discourse.datamethods.org
February 7, 2026 at 9:01 PM
Bayesian predictive projection for variable selection
<p>Projection is based on minimizing the KL divergence from the reference model predictive distribution to the constrained model predictive distribution for each reference posterior draw separately. For many data model distributions, this is equivalent to, or can be approximated by, optimization of the constrained model parameters given the mean of the reference model prediction for each reference model posterior draw. In the case of joint missing data imputation, each reference model posterior draw also includes a draw from the missing data distribution. The optimization approach to minimizing KL does not work well for these latent data parameters. We could approximate by keeping the latent data parameters fixed and optimizing only the other parameters, but then this would be the same as using the multiple imputation approach, which is a big task to add, as discussed in <a href="https://github.com/stan-dev/projpred/issues/275#issuecomment-3820030806" rel="noopener nofollow ugc">a GitHub issue</a>.</p>
<p>In theory that KL projection could be done for any model so that, instead of doing the projection draw by draw, we would directly optimize to find the full projected distribution, but that task is so complicated that we don’t know how to do it in reasonable computation time.</p>
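A minimal sketch of the draw-by-draw projection described above, for the Gaussian case where minimizing the KL divergence reduces to least squares on the reference model’s predictive mean. The inputs `X`, `beta`, and `sub_vars` are assumed for illustration; projpred’s own `project()` handles this, and much more, internally:

```r
## Draw-by-draw projection for a Gaussian linear model (illustration only).
## Assumed inputs:
##   X        : n x p design matrix of the reference model
##   beta     : S x p matrix of posterior draws of reference coefficients
##   sub_vars : column indices of the candidate submodel
project_draws <- function(X, beta, sub_vars) {
  X_sub <- X[, sub_vars, drop = FALSE]
  t(apply(beta, 1, function(b) {
    mu_ref <- drop(X %*% b)        # reference predictive mean for this draw
    # For a Gaussian model, minimizing KL from the reference predictive
    # distribution reduces to least squares of mu_ref on the submodel:
    qr.coef(qr(X_sub), mu_ref)     # projected submodel coefficients for this draw
  }))                              # returns an S x length(sub_vars) matrix
}
```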
discourse.datamethods.org
February 7, 2026 at 9:01 PM
FDA Draft Guidance: Use of Bayesian Methodology in Clinical Trials of Drug and Biological Products
<p>On a related note, I learned today about a parallel (and much shorter) <a href="https://www.ema.europa.eu/en/documents/scientific-guideline/concept-paper-development-reflection-paper-use-bayesian-methods-clinical-development_en.pdf" rel="noopener nofollow ugc">concept paper from the EMA</a> outlining their planned approach to developing guidance on Bayesian methods in clinical trials. The public comment period closes <strong>April 30, 2026</strong>, and those interested can submit via <a href="https://ec.europa.eu/eusurvey/" rel="noopener nofollow ugc">EUSurvey</a>.</p>
<p>Given our ongoing discussion of the FDA draft guidance here, this provides a useful comparison point and an opportunity for the statistical community to provide input to the EMA. From my understanding (correct me if I am wrong), the EMA document is not meant to be a full guidance but rather a description of why the EMA thinks additional guidance is needed and what topics/questions the future reflection paper should tackle.</p>
<p>In contrast to the FDA document, the EMA framing suggests Bayesian methods require special justification rather than being recognized as a coherent inferential framework. The emphasis on “error control” and “lack of control of type I error rate” throughout the document suggests the EMA is still viewing Bayesian approaches through a fundamentally frequentist lens. This contrasts sharply with the FDA draft guidance’s recognition that there are legitimate Bayesian operating characteristics (probability of correct decisions, Bayesian power, expected bias and MSE of estimates averaged under a prior) that do not reduce to α-control.</p>
<p>For example, the EMA paper asks: “How to assess error control for both primary and secondary endpoints in the absence of frequentist inference?” and “How to deal with lack of control of type I error rate?” The FDA guidance, by contrast, explicitly states that calibration to Type I error rate “may not be applicable or appropriate” in Bayesian settings and provides detailed discussion of alternative approaches to specifying success criteria.</p>
<p>The proposed timeline extends to June 2028 for the final reflection paper, representing a significant lag behind current FDA thinking and the broader methodological literature. Given that the EMA is seeking feedback, this may be an opportunity to encourage alignment with the more sophisticated treatment in the FDA guidance. Specifically:</p>
<ol> <li> <p>Recognition that Bayesian methods do not require special justification when the design and analysis are coherent.</p> </li> <li> <p>Acknowledgment of Bayesian-specific operating characteristics beyond Type I error.</p> </li> <li> <p>Guidance on prior distributions beyond just informative priors for borrowing.</p> </li> </ol>
<p>Additional thoughts are welcome. Europe has exceptional statisticians and clinicians who have made foundational contributions to Bayesian methodology in clinical trials. Additionally, the existence of the FDA draft guidance may serve as a catalyst, providing both a template and external pressure for the EMA’s thinking to evolve, particularly since the concept paper is a starting point for consultation rather than a final position.</p>
discourse.datamethods.org
February 7, 2026 at 5:31 AM
Bayesian predictive projection for variable selection
<aside class="quote no-group quote-modified"> <div class="title"> <div class="quote-controls"></div> <img alt="" class="avatar" height="24" src="https://discourse.datamethods.org/user_avatar/discourse.datamethods.org/arthur_albuquerque/48/3269_2.png" width="24" /> arthur_albuquerque:</div> <blockquote> <p>In the same post, Aki suggested:</p> <blockquote> <p>You can use it with multiple imputation by repeating the projection and variable selection for each imputed data set and combine the results in the end (this is the usual multiple imputation approach)</p> </blockquote> <p>I wonder how one would “combine the results in the end”.</p> </blockquote> </aside> <p>If the selected variables are all the same with different imputed datasets, then that’s it. If the selected variables are different with different imputed datasets, I would use majority voting and report the variation due to missing data uncertainty.</p> <aside class="quote no-group"> <div class="title"> <div class="quote-controls"></div> <img alt="" class="avatar" height="24" src="https://discourse.datamethods.org/user_avatar/discourse.datamethods.org/arthur_albuquerque/48/3269_2.png" width="24" /> arthur_albuquerque:</div> <blockquote> <p>There is an <a href="https://github.com/stan-dev/projpred/issues/275" rel="noopener nofollow ugc">opened issue</a> about this in GitHub, but no progress whatsoever.</p> </blockquote> </aside> <p>Changing projpred to support output of <code>brm_multiple()</code> is a big task, and unfortunately we have limited resources. I’d be happy to learn more about cases where the simpler approach would not be sufficient and then we could first experiment how much difference there would be if the search in the projpred would support <code>brm_multiple()</code> output.</p> <p>For the priors, recently we have been using R2D2 type priors more often than horsehoe especially with normal data models, see, e.g. <a href="https://users.aalto.fi/%5C~ave/casestudies/VariableSelection/student.html" rel="noopener nofollow ugc">https://users.aalto.fi/\~ave/casestudies/VariableSelection/student.html</a></p>
discourse.datamethods.org
February 5, 2026 at 8:17 AM