Materials Chemistry

ARC-MOF: A Diverse Database of Metal-Organic Frameworks with DFT-Derived Partial Atomic Charges and Descriptors for Machine Learning

Authors

Abstract

Metal-organic frameworks (MOFs) are a class of crystalline materials composed of metal nodes or clusters connected via semi-rigid organic linkers. Owing to their high surface area, porosity, and tunability, MOFs have received significant attention for numerous applications such as gas separation and storage. Atomistic simulations and data-driven methods (e.g., machine learning) have been successfully employed to screen large databases and to develop new MOFs for CO2 capture that were subsequently synthesized and experimentally validated. To enable data-driven materials discovery for any application, the first (and arguably most crucial) step is database curation. This work introduces the ab initio REPEAT charge MOF (ARC-MOF) database, which contains ~280,000 MOFs that have been either experimentally characterized or computationally generated, spanning all publicly available MOF databases. A key feature of ARC-MOF is that it contains DFT-derived, electrostatic-potential-fitted partial atomic charges for each MOF. Additionally, ARC-MOF contains pre-computed descriptors for out-of-the-box machine learning applications. An in-depth analysis of the diversity of ARC-MOF with respect to the currently mapped design space of MOFs was performed, a critical yet commonly overlooked aspect of previously reported MOF databases. Using this analysis, balanced subsets of ARC-MOF for various machine learning purposes have been identified. Other chemical and geometric diversity analyses are also presented, along with an analysis of the effect of the charge assignment method on atomistic simulations of gas uptake in MOFs.

Supplementary material

Supporting Information
Further information on the geometric properties, RAC descriptors, diversity analysis, REPEAT charges, and GCMC simulations