Additional file 1 of Understanding the importance of key risk factors in predicting chronic bronchitic symptoms using a machine learning approach

Table S1. Comparison of gradient boosting models fit for all participants and all predictors, for 50 different random training sets.
Table S2. Accuracy, sensitivity, and specificity of models fit separately with groups of risk factors for all participants, asthmatics, and non-asthmatics, for 50 different random holdout test datasets.
Table S3. Average AUC of models trained on various groups of risk factors using data from all participants and validated separately by asthma status, for 50 random training sets.
Table S4. Average area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, and specificity of models fit separately with groups of risk factors for non-asthmatics, non-asthmatics (rhinitis), and non-asthmatics (no rhinitis), for 50 different across- and within-participant holdout test datasets.
Table S5. Comparison of gradient boosting models vs. logistic regression for all participants, asthmatics, and non-asthmatics, averaged across 50 training sets.
Table S6. Logistic regression results for all participants, asthmatics, and non-asthmatics for a random training set.
Figure S1. Boxplot of relative influence, for 50 different random training sets, of the top 10 risk factors in models fit using all predictor variables for non-asthmatics, non-asthmatics (rhinitis), and non-asthmatics (no rhinitis).
Figure S2. Area under the receiver operating characteristic curve (AUC) of the gradient boosting models and logistic regression models fit separately with all risk factors and the top 10 most important risk factors for 50 different random across-participant holdout test datasets.
Figure S3. Area under the receiver operating characteristic curve (AUC) of the gradient boosting models and logistic regression models fit separately with all risk factors and the top 10 most important risk factors for 50 different random within-participant holdout test datasets.
(DOCX 3711 kb)