Prediction of Biological Activities of Volatile Metabolites Using Molecular Fingerprints and Machine Learning Methods
Keywords:
Biological Activities, Fingerprints, Machine Learning, Volatile Metabolites,Abstract
Volatile metabolites are small molecules, comprise a diverse chemical group with various biological activities and have high vapor pressures under ambient conditions. It is crucial to determine the biological activities of volatile metabolites as they play important roles in chemical ecology and human healthcare. In this study, we have accumulated 341 volatiles emitted by biological species associated with 11 types of biological activities and deposited the data into our database, which is called KNApSAcK Metabolite Ecology Database. Using this dataset, we have developed 72 classification models to predict biological activities of volatile metabolites by using various machine learning methods. Eight types of molecular fingerprints were used to represent the molecules, which are PubChem (881 bits), CDK (1024 bits), Extended CDK (1024bits), MACCS (166 bits), Klekota-Roth (4860 bits), Substructure (307 bits), Estate (79 bits), and atom pairs (780 bits). A new type of fingerprint was also proposed by combining all features of these eight fingerprints (Combine, 9121 bits). The best classification model was developed by our proposed fingerprint (Combine, 9121 bits) trained with gradient boosting method algorithm (GBM) with predictive accuracy at 94.43%. The results indicated that molecular fingerprints and machine learning methods could be useful for predicting biological activities of volatile metabolites.Downloads
Downloads
Published
How to Cite
Issue
Section
License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)