Performance metrics for models designed to predict treatment effect
Posted on 2023-07-09 - 03:21
Abstract Background Measuring the performance of models that predict individualized treatment effect is challenging because the outcomes of two alternative treatments are inherently unobservable in one patient. The C-for-benefit was proposed to measure discriminative ability. However, measures of calibration and overall performance are still lacking. We aimed to propose metrics of calibration and overall performance for models predicting treatment effect in randomized clinical trials (RCTs). Methods Similar to the previously proposed C-for-benefit, we defined observed pairwise treatment effect as the difference between outcomes in pairs of matched patients with different treatment assignment. We match each untreated patient with the nearest treated patient based on the Mahalanobis distance between patient characteristics. Then, we define the Eavg-for-benefit, E50-for-benefit, and E90-for-benefit as the average, median, and 90th quantile of the absolute distance between the predicted pairwise treatment effects and local-regression-smoothed observed pairwise treatment effects. Furthermore, we define the cross-entropy-for-benefit and Brier-for-benefit as the logarithmic and average squared distance between predicted and observed pairwise treatment effects. In a simulation study, the metric values of deliberately “perturbed models” were compared to those of the data-generating model, i.e., “optimal model”. To illustrate these performance metrics, different modeling approaches for predicting treatment effect are applied to the data of the Diabetes Prevention Program: 1) a risk modelling approach with restricted cubic splines; 2) an effect modelling approach including penalized treatment interactions; and 3) the causal forest. Results As desired, performance metric values of “perturbed models” were consistently worse than those of the “optimal model” (Eavg-for-benefit ≥ 0.043 versus 0.002, E50-for-benefit ≥ 0.032 versus 0.001, E90-for-benefit ≥ 0.084 versus 0.004, cross-entropy-for-benefit ≥ 0.765 versus 0.750, Brier-for-benefit ≥ 0.220 versus 0.218). Calibration, discriminative ability, and overall performance of three different models were similar in the case study. The proposed metrics were implemented in a publicly available R-package “HTEPredictionMetrics”. Conclusion The proposed metrics are useful to assess the calibration and overall performance of models predicting treatment effect in RCTs.
CITE THIS COLLECTION
DataCiteDataCite
3 Biotech3 Biotech
3D Printing in Medicine3D Printing in Medicine
3D Research3D Research
3D-Printed Materials and Systems3D-Printed Materials and Systems
4OR4OR
AAPG BulletinAAPG Bulletin
AAPS OpenAAPS Open
AAPS PharmSciTechAAPS PharmSciTech
Abhandlungen aus dem Mathematischen Seminar der Universität HamburgAbhandlungen aus dem Mathematischen Seminar der Universität Hamburg
ABI Technik (German)ABI Technik (German)
Academic MedicineAcademic Medicine
Academic PediatricsAcademic Pediatrics
Academic PsychiatryAcademic Psychiatry
Academic QuestionsAcademic Questions
Academy of Management DiscoveriesAcademy of Management Discoveries
Academy of Management JournalAcademy of Management Journal
Academy of Management Learning and EducationAcademy of Management Learning and Education
Academy of Management PerspectivesAcademy of Management Perspectives
Academy of Management ProceedingsAcademy of Management Proceedings
Academy of Management ReviewAcademy of Management Review
Maas, C. C. H. M.; Kent, D. M.; Hughes, M. C.; Dekker, R.; Lingsma, H. F.; van Klaveren, D. (2023). Performance metrics for models designed to predict treatment effect. figshare. Collection. https://doi.org/10.6084/m9.figshare.c.6733253.v1