Deep generative model of constructing chemical latent space for large molecular structures with 3D complexity

29 May 2023, Version 2
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

The structural diversity of chemical libraries, which are systematic collections of compounds that have the potential to bind to biomolecules, can be represented by a chemical latent space. A chemical latent space is a projection of compound structures into a mathematical space based on several molecular features; it expresses the structural diversity within a compound library, making it possible to explore a broader chemical space and generate novel compound structures as drug candidates. Many natural compounds produced by living organisms have complicated structures and are highly biologically active. However, no existing deep learning model can effectively construct a chemical latent space that handles large and complex compound structures, such as those found in natural compounds, while also handling chirality, which is an essential factor in the 3D complexity of compounds. In this study, we developed a new deep learning method, called NP-VAE, based on a variational autoencoder for handling natural compounds, and constructed a chemical latent space onto which large and complex compound structures, including chirality, are projected. NP-VAE constructed a chemical latent space with higher reconstruction accuracy and generalization ability than state-of-the-art deep learning methods. Furthermore, by exploring the acquired latent space, we comprehensively analyzed a compound library containing natural compounds and generated novel compound structures with optimized functions.
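
As a rough illustration of what a variational-autoencoder latent space involves, the sketch below shows three generic building blocks: sampling a latent vector with the reparameterization trick, the closed-form KL term that regularizes the latent space toward a standard normal, and linear interpolation between two latent points as one simple way to explore the space. This is a generic VAE sketch, not the NP-VAE algorithm (which is described in the Supplemental Method); the function names and the encoder outputs `mu_a`, `logvar_a`, `mu_b`, `logvar_b` are hypothetical values chosen for illustration.

```python
import math
import random


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps, with eps ~ N(0, 1) per dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) )."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, logvar))


def interpolate(z_a, z_b, t):
    """Point on the straight line between two latent vectors, 0 <= t <= 1."""
    return [(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)]


# Hypothetical encoder outputs for two compounds (made-up numbers).
mu_a, logvar_a = [0.0, 0.0], [0.0, 0.0]
mu_b, logvar_b = [1.0, -2.0], [-1.0, -1.0]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
z_a = reparameterize(mu_a, logvar_a, rng)
z_b = reparameterize(mu_b, logvar_b, rng)

# A latent point halfway between the two compounds; in a full model a
# decoder would map such a point back to a compound structure.
midpoint = interpolate(z_a, z_b, 0.5)
```

In a trained model, the encoder and decoder would be neural networks operating on molecular representations; here they are omitted so that only the latent-space arithmetic is shown.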

Keywords

chemical latent space
generative model
variational autoencoder (VAE)
natural product
stereochemistry
chemical space
deep learning

Supplementary materials

Title: Supplemental Method, Tables and Figures
Description: Supplemental Method: NP-VAE algorithm. Supplemental Table S1. Supplemental Figures S1-S6
