"DDAP: docking domain affinity and biosynthetic pathway prediction tool for type I polyketide synthases"
Polyketide synthases (PKSs) are one of the most important classes of biosynthetic enzymes. A type I modular PKS (T1PKS) is encoded by a series of genes whose products are multifunctional proteins, comprising a loading module and multiple extension modules. Each extension module adds one acyl monomer to the growing polyketide chain. The assembly order of polyketide substrates does not always coincide with the gene cluster architecture in the bacterial genome. Finding the correct order of modules and substrates in the polyketide biosynthetic pathway is therefore a crucial step in structure prediction. Previous research has demonstrated that the substrate assembly order is determined by cognate pairs of docking domains (DDs) at the N- and C-termini of PKS proteins. So far, relatively few studies have reported on the functional and structural properties of DDs. Class I and Class II docking domains have been identified by phylogenetic analysis, and their NMR or crystal structures have been solved. In 2009, Yadav et al. published a rule-based affinity prediction algorithm built on a general assumption about the DD structure of 6-deoxyerythronolide B synthase (DEBS). Despite several defects, this method is used by many well-known natural product (NP) discovery tools, including antiSMASH and NP.searcher.
In this study, we present DDAP, a tool for predicting the biosynthetic pathways of the products of type I modular PKSs, with a focus on more accurately predicting the order of proteins and substrates in the pathway. On a hold-out test set, DD affinity prediction reached an AUC of 0.88, and pathway prediction reached a mean reciprocal rank (MRR) of 0.67. DDAP has several advantages over previous informatics tools: (i) it does not rely on large databases, which makes it efficient; (ii) the predicted DD affinity is expressed as a probability (0 to 1), which is more intuitive than a raw score; and (iii) its performance is competitive with that of the currently popular rule-based algorithm. To the best of our knowledge, DDAP is the first machine-learning-based algorithm for type I PKS pathway prediction. We also established the first database of type I modular PKSs, featuring comprehensive annotation of the available docking domain information in bacterial biosynthetic pathways.
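To make the pathway-level metric concrete, the following minimal sketch shows how an MRR could be computed, assuming MRR here denotes the standard mean reciprocal rank: for each gene cluster, candidate protein orders are ranked, and the score is the average of 1/rank of the true order. The function names and the toy rankings are hypothetical illustrations, not DDAP's code or data.

```python
from typing import Hashable, Sequence


def reciprocal_rank(ranked: Sequence[Hashable], truth: Hashable) -> float:
    """Return 1/rank of `truth` in `ranked` (1-based), or 0.0 if it is absent."""
    for position, candidate in enumerate(ranked, start=1):
        if candidate == truth:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(
    rankings: Sequence[Sequence[Hashable]],
    truths: Sequence[Hashable],
) -> float:
    """Average the reciprocal rank of the true item over all queries."""
    if len(rankings) != len(truths):
        raise ValueError("rankings and truths must have the same length")
    if not rankings:
        raise ValueError("at least one ranking is required")
    total = sum(reciprocal_rank(r, t) for r, t in zip(rankings, truths))
    return total / len(rankings)


# Hypothetical toy data: each string is one candidate protein order for a
# cluster, listed from most to least likely (letters stand for proteins).
toy_rankings = [
    ["ABC", "ACB"],                # true order ranked 1st
    ["ACB", "ABC"],                # true order ranked 2nd
    ["BAC", "BCA", "ACB", "ABC"],  # true order ranked 4th
]
toy_truths = ["ABC", "ABC", "ABC"]

mrr = mean_reciprocal_rank(toy_rankings, toy_truths)
print(f"MRR = {mrr:.3f}")
```

Under this definition, an MRR of 1.0 would mean the true pathway is always ranked first, and a value of 0.5 would correspond to it sitting at rank 2 on average in the reciprocal sense.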