Chemoinformatic Characterization of Synthetic Screening Libraries Focused on Epigenetic Targets

23 September 2021, Version 1
This content is a preprint and has not undergone peer review at the time of posting.

Abstract

Epigenetic drug and probe discovery is growing in importance. Such efforts are needed not only to identify and develop therapeutic treatments associated with epigenetic processes but also to understand the epigenetic mechanisms underlying biological processes. To this end, chemical vendors have been developing synthetic compound libraries focused on epigenetic targets to increase the probability of identifying promising starting points for drug or probe candidates. However, the chemical contents of these data sets, the distribution of their physicochemical properties, and their diversity remain unknown. To fill this gap and make this information available to the scientific community, we report a comprehensive analysis of eleven libraries focused on epigenetic targets containing more than 50,000 compounds. We used well-validated chemoinformatics approaches to characterize these sets, including novel methods such as automated detection of analog series and visual representations of the chemical space based on Constellation Plots and Extended Chemical Space Networks. This work will guide the efforts of experimental groups working on high-throughput and medium-throughput screening of epigenetic-focused libraries. The outcome of this work can also be used as a reference to design and describe novel focused epigenetic libraries.
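To make the notion of library diversity concrete, the following is a minimal sketch of one common fingerprint-based measure: the distribution of pairwise Tanimoto similarities within a library. It is not the authors' pipeline. The fingerprints are represented as plain sets of on-bit indices, and the function names and the toy library are hypothetical; in practice the bit sets would come from a cheminformatics toolkit.

```python
from itertools import combinations
from statistics import mean, median


def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # assumed convention for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def pairwise_similarity_summary(fingerprints):
    """Summarize all pairwise similarities; lower values suggest a more diverse set."""
    sims = [tanimoto(a, b) for a, b in combinations(fingerprints, 2)]
    return {"mean": mean(sims), "median": median(sims)}


# Hypothetical toy library: three compounds with made-up on-bit indices.
toy_library = [frozenset({1, 2, 3}), frozenset({2, 3, 4}), frozenset({7, 8})]
summary = pairwise_similarity_summary(toy_library)
print(summary)
```

A low mean or median pairwise similarity indicates that the compounds in a set are structurally dissimilar to one another under the chosen fingerprint, which is one way such libraries can be compared.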

Keywords

analog series
cheminformatics
Constellation Plots
drug discovery
Extended Chemical Space Networks

Supplementary materials

Title
Chemoinformatic Characterization of Synthetic Screening Libraries Focused on Epigenetic Targets
Description
Figure S1. Profile of six drug-like properties of pharmaceutical interest. Figure S2. Most frequent Bemis-Murcko scaffolds in all eleven epigenetic-focused compound libraries. Table S1. Measures of scaffold diversity based on Bemis-Murcko scaffolds: area under the curve of the cyclic system recovery curve. Figure S3. Fingerprint-based diversity of the 11 data sets with RDKit and MACCS keys (166-bit) fingerprints with five metrics. Figure S4. Constellation Plots for each of the eleven data sets. Table S2. Twenty most representative compounds per compound library as calculated with RDKit fingerprints. Figure S5. Calculated synthetic accessibility profiling of the 11 epigenetic-focused compound libraries.
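As an illustration of the scaffold diversity measure named in Table S1, the sketch below computes a cyclic system recovery (CSR) curve and its area under the curve from a list of scaffold identifiers. It assumes the usual construction: scaffolds are ranked from most to least populated, and the cumulative fraction of compounds is plotted against the fraction of scaffolds. The function names and the toy scaffold strings are hypothetical, and deriving Bemis-Murcko scaffolds from structures is left to a cheminformatics toolkit.

```python
from collections import Counter


def csr_curve(scaffolds):
    """Return (fraction of scaffolds, fraction of compounds recovered) points."""
    counts = sorted(Counter(scaffolds).values(), reverse=True)
    n_scaffolds = len(counts)
    n_compounds = sum(counts)
    points = [(0.0, 0.0)]
    covered = 0
    for rank, count in enumerate(counts, start=1):
        covered += count
        points.append((rank / n_scaffolds, covered / n_compounds))
    return points


def csr_auc(points):
    """Trapezoidal area under the CSR curve."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical toy data: one scaffold identifier per compound.
even_library = ["A", "B", "C", "D"]          # every compound has its own scaffold
skewed_library = ["A"] * 8 + ["B", "C"]      # one scaffold dominates

print(csr_auc(csr_curve(even_library)))
print(csr_auc(csr_curve(skewed_library)))
```

Under this construction the area ranges from 0.5, when compounds are spread evenly over scaffolds, toward 1.0, when a few scaffolds account for most compounds, so a smaller value indicates higher scaffold diversity.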
