Open-source QSAR models for pKa prediction using multiple machine learning approaches
Posted on 2019-09-19 - 04:09
Abstract
Background The logarithmic acid dissociation constant pKa reflects the ionization of a chemical, which affects lipophilicity, solubility, protein binding, and ability to pass through the plasma membrane. Thus, pKa affects chemical absorption, distribution, metabolism, excretion, and toxicity properties. Multiple proprietary software packages exist for the prediction of pKa, but to the best of our knowledge no free and open-source programs exist for this purpose. Using a freely available data set and three machine learning approaches, we developed open-source models for pKa prediction.
Methods The experimental strongest acidic and strongest basic pKa values in water for 7912 chemicals were obtained from DataWarrior, a freely available software package. Chemical structures were curated and standardized for quantitative structure–activity relationship (QSAR) modeling using KNIME, and a subset comprising 79% of the initial set was used for modeling. To evaluate different approaches to modeling, several datasets were constructed based on different processing of chemical structures with acidic and/or basic pKas. Continuous molecular descriptors, binary fingerprints, and fragment counts were generated using PaDEL, and pKa prediction models were created using three machine learning methods: (1) support vector machines (SVM) combined with k-nearest neighbors (kNN), (2) extreme gradient boosting (XGB), and (3) deep neural networks (DNN).
Results The three methods delivered comparable performance on the training and test sets, with a root-mean-squared error (RMSE) around 1.5 and a coefficient of determination (R²) around 0.80. Two commercial pKa predictors, from ACD/Labs and ChemAxon, were used to benchmark the three best models developed in this work, and the performance of our models compared favorably to the commercial products.
Conclusions This work provides multiple QSAR models to predict the strongest acidic and strongest basic pKas of chemicals, built using publicly available data, and provided as free and open-source software on GitHub.
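As an illustration of the two statistics reported in the Results, the sketch below computes RMSE and R² for a set of predicted versus experimental pKa values. It is a minimal example, not the authors' evaluation code: the function names `rmse` and `r_squared` and the sample values are hypothetical, and R² is assumed here to be defined as one minus the ratio of residual to total sum of squares, which may differ from the definition used in the publication.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-squared error between experimental and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical pKa values for illustration only; not taken from the data set.
    experimental = [4.8, 9.2, 3.1, 10.5, 7.4]
    predicted = [5.3, 8.1, 4.0, 9.6, 7.9]
    print(f"RMSE = {rmse(experimental, predicted):.2f}")
    print(f"R2   = {r_squared(experimental, predicted):.2f}")
```

RMSE is expressed in the same units as the target (here, pKa units), so it reads directly as a typical prediction error, while R² is unitless and describes the fraction of variance in the experimental values that the predictions account for.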
CITE THIS COLLECTION
Mansouri, Kamel; Cariello, Neal; Korotcov, Alexandru; Tkachenko, Valery; Grulke, Chris; Sprankle, Catherine; et al. (2019). Open-source QSAR models for pKa prediction using multiple machine learning approaches. figshare. Collection. https://doi.org/10.6084/m9.figshare.c.4670981.v1
AUTHORS (10)
Kamel Mansouri
Neal Cariello
Alexandru Korotcov
Valery Tkachenko
Chris Grulke
Catherine Sprankle
David Allen
Warren Casey
Nicole Kleinstreuer
Antony Williams