Advancing Small Molecule Physicochemical Property Predictions For Computational Drug Discovery
Computer-aided drug design approaches aim to guide the discovery of new chemical entities with optimal pharmaceutical and physicochemical properties. With computational predictions, vast libraries of virtual molecules can be evaluated based on their predicted values to prioritize which new compounds to synthesize and test. Protein target binding affinity, lipophilicity, membrane permeability, and solubility are among the properties most desirable to predict. Modeling these properties of drug candidates with physical methods requires accurate prediction of small-molecule protonation states and solvation. Therefore, evaluating the prediction accuracy of physicochemical properties such as acid dissociation constants and partition coefficients shows how reliable these models are and helps anticipate how their inaccuracies can affect other physical models, such as binding affinity predictions. Here, we present the results of two community-wide blind challenges in which we benchmarked the accuracy of protonation state and lipophilicity predictions. We constructed new datasets of experimental acid dissociation constants (pKa) and octanol-water partition coefficients (log P) of drug-like molecules to serve as reference benchmark sets for prospective and fair evaluation of computational methods. We organized challenges focused on pKa and log P predictions to isolate the prediction errors associated with these properties. Wide participation from the computational chemistry community allowed the assessment of a large variety of physical and empirical prediction methods. Small-molecule drugs frequently possess multiple titratable and tautomerizable moieties that complicate pKa predictions. We designed a blind challenge to evaluate macroscopic and microscopic pKa predictions for drug-like molecules, grounded in a thorough understanding of the experimental data, and established assessment strategies that take microspeciation into consideration.
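Macroscopic and microscopic pKa values differ whenever a molecule has more than one microstate at the same protonation level. As a minimal sketch, assuming the simplest case of one protonated microstate that can lose a proton to several alternative (for example, tautomeric) microstates, the macroscopic acid dissociation constant is the sum of the microscopic ones, and each deprotonated microstate's share of the macrostate is proportional to its microscopic constant. The function names and the example pKa values below are hypothetical illustrations, not data from the challenges.

```python
import math
from typing import List, Sequence


def macroscopic_pka(microscopic_pkas: Sequence[float]) -> float:
    """Combine parallel microscopic pKa values into one macroscopic pKa.

    Assumes a single protonated microstate HA that deprotonates to several
    distinct microstates A_i with microscopic constants K_i. The macroscopic
    constant is then K = sum(K_i), i.e. pKa = -log10(sum(10 ** -pKa_i)).
    """
    if not microscopic_pkas:
        raise ValueError("need at least one microscopic pKa")
    total_ka = sum(10.0 ** (-pka) for pka in microscopic_pkas)
    return -math.log10(total_ka)


def deprotonated_microstate_fractions(microscopic_pkas: Sequence[float]) -> List[float]:
    """Fraction of each deprotonated microstate A_i within the deprotonated macrostate.

    Under the same assumption, fraction_i = K_i / sum(K_j), independent of pH.
    """
    kas = [10.0 ** (-pka) for pka in microscopic_pkas]
    total = sum(kas)
    return [ka / total for ka in kas]


# Two equal microscopic pKa values of 5.0 give a macroscopic pKa of
# 5.0 - log10(2), i.e. the macrostate is twice as acidic as either path.
print(round(macroscopic_pka([5.0, 5.0]), 2))  # 4.7
print(deprotonated_microstate_fractions([5.0, 5.0]))  # [0.5, 0.5]
```

Because several combinations of microscopic constants can yield the same macroscopic value, a macroscopic pKa alone does not identify which microstate forms, which is why an assessment that accounts for microspeciation needs more than macroscopic comparisons.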
The detailed evaluation strategy we demonstrated can guide future improvement of pKa prediction methods. We determined that the inaccuracies observed in current state-of-the-art pKa predictions can cause significant errors in protein-ligand binding affinity predictions, both in terms of predicted protonation states and their relative solution energies. We also determined that a number of chemical moieties are associated with higher pKa prediction errors. The complementary exercise of evaluating partition coefficient prediction methods allowed us to assess how accurately physical methods capture the solvation of small organic molecules, inaccuracies in which can contribute to errors in protein-ligand binding affinity predictions. log P describes the relative solvation propensity of the neutral state of a small molecule between two phases. Among the many lessons learned from the log P blind challenge, we found that molecular-mechanics-based methods were not as accurate as quantum-mechanics-based or empirical prediction methods, and that modest protocol variations caused large performance differences. We identified tautomer selection and charging protocols as areas that require more attention for improving log P predictions. We also identified challenging sets of molecules for each method category. The lessons learned from these blind challenges, on how to critically evaluate these physicochemical properties and on the performance, strengths, and weaknesses of current prediction methodologies, will aid the community in improving these models. The road to more reliable computational methods for structure-based drug design passes through better modeling of small molecules, not just the protein target. Improving protonation, tautomerization, and solvation models is a promising direction for achieving more accurate binding affinity predictions.
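Two standard thermodynamic relations help show why pKa and log P errors matter for free energy calculations. At temperature T, a pKa error of δ units shifts the relative free energy of the protonation states by RT ln(10) δ, and log P of the neutral species relates to its water-to-octanol transfer free energy by ΔG_transfer = −RT ln(10) log P. The sketch below assumes ideal dilute solutions, energies in kcal/mol, and a monoprotic acid for the protonation term; the function names and example solvation free energies are hypothetical illustrations rather than values from the challenges.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)
LN10 = math.log(10.0)


def pka_error_to_free_energy(delta_pka: float, temperature: float = 298.15) -> float:
    """Free energy error (kcal/mol) implied by a pKa error of delta_pka units."""
    return R_KCAL * temperature * LN10 * delta_pka


def deprotonation_free_energy(pka: float, ph: float, temperature: float = 298.15) -> float:
    """Free energy (kcal/mol) of HA -> A- at a fixed pH for a monoprotic acid.

    [A-]/[HA] = 10 ** (pH - pKa), so dG = RT ln(10) (pKa - pH): positive when
    the protonated form dominates (pH < pKa), negative otherwise.
    """
    return R_KCAL * temperature * LN10 * (pka - ph)


def logp_from_solvation(dg_water: float, dg_octanol: float, temperature: float = 298.15) -> float:
    """log P of the neutral species from its solvation free energies (kcal/mol).

    Transfer water -> octanol: dG_transfer = dg_octanol - dg_water
    = -RT ln(10) log P, so log P = -dG_transfer / (RT ln(10)).
    """
    dg_transfer = dg_octanol - dg_water
    return -dg_transfer / (R_KCAL * temperature * LN10)


# One pKa unit corresponds to about 1.36 kcal/mol at 298.15 K.
print(round(pka_error_to_free_energy(1.0), 2))  # 1.36
# A neutral molecule solvated 2 kcal/mol more favorably in octanol than water.
print(round(logp_from_solvation(-8.0, -10.0), 2))  # 1.47
```

The same RT ln(10) factor appears in both relations, so a given error in either pKa or log P, in log units, maps to the same free energy error of about 1.36 kcal/mol per unit at room temperature.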