"A machine learning based atom parameterization program for molecular mechanics force fields"
In recent decades, computational tools have become prevalent in drug development. Molecular mechanics force fields (MMFFs) provide an atomistic representation of the binding interactions between drugs and their target proteins, and simulations based on them yield the structural information needed to evolve lead compounds into viable drug candidates. Molecular dynamics simulations allow researchers to routinely simulate complex structures containing tens of thousands of atoms over timescales of nanoseconds to microseconds. The force fields that enable such simulations comprise a functional form and a frequently large parameter set, which together are used to calculate the energy of a system of atoms. While most protein force fields are well optimized, developing force field parameters for potential drug candidates featuring new chemical scaffolds can be challenging. At present, all approaches to assigning and calculating parameters are tied to the specific physical principles underlying a given force field.
Consequently, as the database of synthetic compounds with potential therapeutic effects grows, it becomes increasingly difficult to parameterize such molecules in a manner that is consistent with the remainder of the force field. The research presented herein describes preliminary work that addresses this problem by employing classical machine learning (CML) models in conjunction with the CHARMM General Force Field (CGenFF) to develop a general pipeline that enables parameterization across MMFFs.
Random forest classification algorithms were used to predict CGenFF atom types, which are labels that correspond to atomic environments. The inputs to the classifier were “atomic fingerprints” (AFps), custom-defined feature vectors that describe an atom’s geometric and chemical environment. Additionally, regression on the AFps was employed to predict atomic partial charges. The current model was trained on 464 organic molecules (a sample size of 44,302 unique AFps). Preliminary data show good correlation between predicted parameters and true CGenFF parameters. Case studies for ongoing validation are offered to clarify the algorithm’s functionality and the significance of its output data. Once model validation is complete, the pipeline will be extended to alternate MMFFs as a test of extensibility.
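To make the idea of a per-atom feature vector concrete, the following is a minimal sketch of how a fingerprint of an atom's chemical environment might be assembled from a molecular graph. It is not the AFp definition used in this work, which is not specified here: the function `toy_fingerprint`, the element vocabulary, the ethanol example, and the choice of features (element one-hot, bond count, and element counts in the first and second bonded shells) are all assumptions made for illustration, and geometric features are omitted entirely.

```python
from collections import Counter

# Hypothetical toy molecule (ethanol, CH3-CH2-OH); indices and bonds are illustrative.
ELEMENTS = ["C", "C", "O", "H", "H", "H", "H", "H", "H"]
BONDS = [(0, 1), (1, 2), (0, 3), (0, 4), (0, 5), (1, 6), (1, 7), (2, 8)]

# Assumed element vocabulary; a real fingerprint would cover more elements.
VOCAB = ["H", "C", "N", "O"]


def build_adjacency(n_atoms, bonds):
    """Return a dict mapping each atom index to the list of its bonded neighbors."""
    adj = {i: [] for i in range(n_atoms)}
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    return adj


def toy_fingerprint(atom, elements, adj, vocab=VOCAB):
    """Assumed illustrative fingerprint, not the AFp definition from this work.

    Layout: [element one-hot | number of bonds |
             element counts in first shell | element counts in second shell]
    """
    one_hot = [1 if elements[atom] == e else 0 for e in vocab]
    first = adj[atom]
    first_counts = Counter(elements[n] for n in first)
    # Second shell: atoms two bonds away, excluding the atom itself and its first shell.
    second = {m for n in first for m in adj[n] if m != atom and m not in first}
    second_counts = Counter(elements[m] for m in second)
    return (
        one_hot
        + [len(first)]
        + [first_counts[e] for e in vocab]
        + [second_counts[e] for e in vocab]
    )


ADJ = build_adjacency(len(ELEMENTS), BONDS)
FINGERPRINTS = [toy_fingerprint(i, ELEMENTS, ADJ) for i in range(len(ELEMENTS))]

for index, fp in enumerate(FINGERPRINTS):
    print(index, ELEMENTS[index], fp)
```

In this sketch the three methyl hydrogens receive identical vectors while the hydroxyl hydrogen receives a different one, which is the property that would let a classifier map such vectors to atom types and a regressor map them to partial charges.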