Abstract
The development of accurate machine learning models for molecular property prediction and materials design requires extensive high-quality training data. While databases of molecular structures and basic properties exist, comprehensive electronic structure data—including electronic densities, wavefunctions, and molecular orbitals—remain scarce at scale.
We present the OMol25 Electronic Structures dataset, an unprecedented open dataset of quantum chemical calculations designed to enable the development of next-generation physics-informed machine learning models for molecular chemistry and materials science.
The dataset comprises raw density functional theory (DFT) outputs, electronic densities, wavefunctions, and molecular orbital information from over 4 million high-accuracy quantum chemical calculations performed on diverse molecular systems ranging from small organic molecules to large biomolecular complexes.
This dataset will enable researchers to develop improved partial charges, partial spins, and advanced electronic features for machine learning models, potentially accelerating discoveries in drug design, catalyst development, and energy materials.
OMol25 Electronic Structure Dataset key use cases and partners
The Vision
We envision a future where researchers can rapidly design molecules and peptides to treat diseases, discover catalysts to revolutionize synthesis and manufacturing, identify the next electrolyte to store and transport energy to protect the grid, and more. But these breakthrough discoveries require data.
Data to train next-generation AI models and interatomic potentials. Data to push the boundaries of what's computationally possible in molecular chemistry and lead the world in AI for science. Data that captures the full complexity of chemical systems, from small organic molecules to massive biomolecular complexes.
About the Dataset
With our partners at Meta and the Argonne Leadership Computing Facility (ALCF), we announce the OMol25 Electronic Structures dataset, which includes ~500 TB of open molecular data. These data include the raw DFT outputs, electronic densities, wavefunctions, and molecular orbital information for over 4M high-accuracy quantum chemical calculations. We see this as a transformative opportunity to develop higher-quality partial charges, partial spins, and advanced electronic features to unlock the next generation of physics-informed ML models.
The Materials Data Facility is proud to make these data available via the Eagle cluster at ALCF through a high-performance Globus endpoint. Given the dataset's unprecedented scale, we are first releasing all output data for a random 4M-calculation split of OMol25. The full multi-petabyte dataset will follow, depending on community engagement.
What You Can Build
Explore the possibilities enabled by this comprehensive dataset
Train Advanced ML Models
Develop next-generation interatomic potentials and physics-informed models with unprecedented electronic structure data.
Molecular Property Prediction
Build databases of calculated properties, partial charges, and descriptors for molecular screening.
Catalyst Discovery
Accelerate the discovery of novel catalysts for synthesis, manufacturing, and energy applications.
Drug Design
Leverage quantum chemical calculations to design molecules and peptides for treating diseases.
Join the Community
For this first release, the data are raw, exactly as created by the Meta team. There's a significant opportunity for the community to build tools that simplify access to these data, support querying and browsing, create databases of calculated properties and descriptors, and much more. We intend to work on these topics with all of you.
We can't wait to see what you can do with these data!
Access the Data
Download the dataset and explore additional resources
Download Dataset
Direct access to dataset files and transfer instructions
Access to this dataset requires a free Globus account and membership in a permission group (link below). Due to the dataset's size (~500 TB), we recommend using Globus Connect Personal or Globus Connect Server for high-speed transfer.
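To give a feel for why a high-speed transfer tool is recommended, here is a rough planning sketch in Python. It uses only the figures quoted on this page (~500 TB, over 4M calculations). The sustained network rates are illustrative assumptions, not measured Globus or ALCF throughput, and the helper name `transfer_days` is hypothetical.

```python
# Rough planning arithmetic; all throughput figures are illustrative assumptions.

DATASET_BYTES = 500e12        # ~500 TB (decimal terabytes), as quoted for this release
N_CALCULATIONS = 4_000_000    # "over 4M" calculations, so the average below is an upper bound


def transfer_days(n_bytes: float, gbit_per_s: float) -> float:
    """Days needed to move n_bytes at a sustained rate of gbit_per_s gigabits per second."""
    bytes_per_s = gbit_per_s * 1e9 / 8   # convert gigabits/s to bytes/s
    return n_bytes / bytes_per_s / 86_400  # 86,400 seconds per day


# Average output size per calculation, in megabytes.
avg_mb_per_calc = DATASET_BYTES / N_CALCULATIONS / 1e6

if __name__ == "__main__":
    print(f"Average size per calculation: ~{avg_mb_per_calc:.0f} MB")
    for rate in (1, 10, 100):  # hypothetical sustained rates in Gbit/s
        print(f"{rate:>3} Gbit/s -> {transfer_days(DATASET_BYTES, rate):.1f} days")
```

Under these assumptions, a full copy takes weeks at 1 Gbit/s and still several days at 10 Gbit/s, so it may be worth planning to transfer only the subset you need.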
Additional Resources
New to Globus?
Create a free account to access high-performance data transfer for large datasets
Get Started with Globus
Partners & Contributors
Made possible through collaboration
Meta AI
Data Provider & Research Partner
Argonne Leadership Computing Facility (ALCF)
Infrastructure & Storage
Globus
Data Transfer Platform
University of Chicago
Research Collaboration
NIST
Funding & Support