Chemeleon-DNG Tutorial: Crystal Structure Prediction with AI

Overview

This tutorial demonstrates how to use Chemeleon-DNG, a diffusion-based generative model, to discover new crystal structures. You will learn to:

  1. Screen for chemically valid compositions

  2. Generate 3D crystal structures using diffusion models

  3. Optimize structures with machine learning force fields

  4. Evaluate stability against known materials

What is Chemeleon-DNG?
Chemeleon-DNG is a diffusion-based generative model for crystal structure prediction (CSP) and de novo generation (DNG). CSP predicts stable crystal structures for a given composition, while DNG generates new structures without a predefined composition.

Prerequisites:


Step 0: Environment Setup

Before running the tutorial, you need to install the required packages and set up your Materials Project API key.

Installation

Run the following cell to install all dependencies. This may take 5-10 minutes.

Note: It’s recommended to use a virtual environment to avoid conflicts with existing packages.

# %pip installs into the environment of the running kernel.
# If uv is available, `uv pip install` with the same arguments is a faster alternative.
%pip install chemeleon-dng==0.1.3 smact==3.2.0 mace-torch==0.3.14 torch-sim-atomistic==0.3.0 mp_api==0.45.13
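If imports fail later, a quick first check is whether the versions pinned in the install cell are actually present. The sketch below uses only the standard library (`importlib.metadata`); `check_versions` is a hypothetical helper, not part of any of the tutorial's packages, and `PINNED` simply restates the versions from the install command.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version

# Versions pinned in the install cell (distribution names as passed to pip)
PINNED = {
    "chemeleon-dng": "0.1.3",
    "smact": "3.2.0",
    "mace-torch": "0.3.14",
    "torch-sim-atomistic": "0.3.0",
    "mp_api": "0.45.13",
}


def check_versions(pinned: dict[str, str]) -> dict[str, tuple[str | None, str]]:
    """Return {distribution: (installed version or None, pinned version)}."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # not installed in this environment
        report[name] = (installed, wanted)
    return report


for name, (installed, wanted) in check_versions(PINNED).items():
    if installed is None:
        status = "MISSING"
    elif installed == wanted:
        status = "OK"
    else:
        status = "VERSION MISMATCH"
    print(f"{name:20s} installed={installed!s:10s} pinned={wanted:8s} {status}")
```

A "MISSING" or "VERSION MISMATCH" line points at the package to reinstall before continuing.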

Materials Project API Key Setup

For Step 4 (stability analysis), you’ll need a free API key from Materials Project:

  1. Register at https://next-gen.materialsproject.org/api

  2. Copy your API key

  3. Set it as an environment variable or directly in the code below

import os

# Option 1: Set your API key as an environment variable (recommended)
# export MP_API_KEY="your_api_key_here"  # Run in terminal before starting Jupyter

# Option 2: Set directly in code (not recommended for shared notebooks)
# os.environ["MP_API_KEY"] = "your_api_key_here"

# Verify API key is set
if os.getenv("MP_API_KEY"):
    print("✓ Materials Project API key is configured")
else:
    print(
        "⚠ Warning: MP_API_KEY not set. You'll need this for Step 4 "
        "(Stability Analysis)"
    )
⚠ Warning: MP_API_KEY not set. You'll need this for Step 4 (Stability Analysis)

Step 1: Composition Screening with SMACT

Goal: Identify chemically valid compositions from element combinations.

Background:
Not all combinations of elements form stable compounds. SMACT (Semiconducting Materials from Analogy and Chemical Theory) uses charge neutrality and electronegativity rules to filter out unlikely compositions.

Key Concepts:

  • Stoichiometry: The ratio of elements in a compound (e.g., H₂O has a 2:1 H:O ratio)

  • Charge Neutrality: Positive and negative charges must balance

  • Commonality: How frequently an oxidation state is observed in known compounds

    • low: Includes rare oxidation states (more permissive)

    • medium: Common oxidation states only (recommended)

    • high: Very common oxidation states (most restrictive)
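To make charge neutrality concrete: a composition is charge-balanced if some choice of one oxidation state per element makes the total charge zero. The sketch below is a brute-force search over candidate oxidation states. The `restrictive` and `permissive` tables are illustrative assumptions, not SMACT's oxidation-state data, and the search omits the electronegativity check that `smact_validity` also applies; it only shows why allowing more oxidation states lets more compositions pass.

```python
from __future__ import annotations

import itertools


def find_neutral_assignment(
    counts: dict[str, int], ox_states: dict[str, list[int]]
) -> dict[str, int] | None:
    """Return one charge-neutral {element: oxidation_state} assignment, or None.

    counts:    atoms per element, e.g. {"Zn": 1, "Ti": 1, "O": 3}
    ox_states: allowed oxidation states per element (illustrative values)
    """
    elements = list(counts)
    # Try every combination of one oxidation state per element
    for combo in itertools.product(*(ox_states[el] for el in elements)):
        total_charge = sum(counts[el] * ox for el, ox in zip(elements, combo))
        if total_charge == 0:
            return dict(zip(elements, combo))
    return None


# Hypothetical oxidation-state tables: a restrictive one and a permissive one
restrictive = {"Zn": [2], "Ti": [4], "O": [-2]}
permissive = {"Zn": [2], "Ti": [2, 3, 4], "O": [-2]}

print(find_neutral_assignment({"Zn": 1, "Ti": 1, "O": 3}, restrictive))  # {'Zn': 2, 'Ti': 4, 'O': -2}
print(find_neutral_assignment({"Zn": 1, "Ti": 1, "O": 2}, restrictive))  # None
print(find_neutral_assignment({"Zn": 1, "Ti": 1, "O": 2}, permissive))   # {'Zn': 2, 'Ti': 2, 'O': -2}
```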

Learn More:


import itertools
from pprint import pprint

from pymatgen.core import Composition
from smact.screening import smact_validity
# Define the chemical system to explore
# Example: Zn-Ti-O system (potential photocatalysts/semiconductors)
elements = ["Zn", "Ti", "O"]
def get_unique_valid_compositions(
    elements: list[str],
    max_stoichiometry: int = 8,
    commonality: str = "medium",
) -> list[Composition]:
    """Generate and filter chemically valid compositions.

    Args:
        elements: List of element symbols (e.g., ["Zn", "Ti", "O"])
        max_stoichiometry: Maximum number of atoms per element (1 to this value)
        commonality: Oxidation state filter ("low", "medium", or "high")

    Returns:
        List of valid Composition objects that pass SMACT screening
    """
    # Generate all possible stoichiometric combinations
    all_compositions = [
        Composition({el: amt for el, amt in zip(elements, amt_list, strict=True)})
        for amt_list in itertools.product(
            range(1, max_stoichiometry + 1), repeat=len(elements)
        )
    ]

    # Filter using SMACT validity criteria
    valid_compositions = [
        comp
        for comp in all_compositions
        if smact_validity(comp, commonality=commonality)
    ]

    print(
        f"Found {len(valid_compositions)} valid compositions "
        f"out of {len(all_compositions)} total combinations"
    )
    return valid_compositions
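The enumeration in `get_unique_valid_compositions` is a Cartesian product, so three elements with `max_stoichiometry=5` give 5**3 = 125 candidates. As a rough illustration of the filtering step, the sketch below assumes each element has a single fixed oxidation state (Zn2+, Ti4+, O2-), which is a simplification and not SMACT's data or algorithm, and keeps the charge-neutral candidates. Under that assumption it yields exactly the four compositions that Step 2 later generates structures for.

```python
import itertools


def enumerate_neutral(elements, ox_state, max_stoichiometry):
    """Enumerate amounts 1..max_stoichiometry per element; keep charge-neutral ones.

    ox_state: ONE assumed oxidation state per element (simplification).
    Returns (number of candidates, list of neutral amount tuples).
    """
    candidates = list(
        itertools.product(range(1, max_stoichiometry + 1), repeat=len(elements))
    )
    neutral = [
        amounts
        for amounts in candidates
        if sum(n * ox_state[el] for el, n in zip(elements, amounts)) == 0
    ]
    return len(candidates), neutral


total, neutral = enumerate_neutral(["Zn", "Ti", "O"], {"Zn": 2, "Ti": 4, "O": -2}, 5)
print(total)    # 125
print(neutral)  # [(1, 1, 3), (1, 2, 5), (2, 1, 4), (3, 1, 5)]
```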

Example 1: Permissive Screening (Low Commonality)

Let’s first try with commonality="low" which includes rare oxidation states. This will give us more candidates but may include unlikely compositions.

# Screen with low commonality (more permissive)
# With 5^3 = 125 total combinations, many will pass the filter
compositions_low = get_unique_valid_compositions(
    elements=["Zn", "Ti", "O"], max_stoichiometry=5, commonality="low"
)
pprint(compositions_low)
Found 52 valid compositions out of 125 total combinations
[Composition('Zn1 Ti1 O1'),
 Composition('Zn1 Ti1 O2'),
 Composition('Zn1 Ti1 O3'),
 Composition('Zn1 Ti1 O4'),
 Composition('Zn1 Ti1 O5'),
 Composition('Zn1 Ti2 O2'),
 Composition('Zn1 Ti2 O3'),
 Composition('Zn1 Ti2 O4'),
 Composition('Zn1 Ti2 O5'),
 Composition('Zn1 Ti3 O2'),
 Composition('Zn1 Ti3 O3'),
 Composition('Zn1 Ti3 O4'),
 Composition('Zn1 Ti3 O5'),
 Composition('Zn1 Ti4 O3'),
 Composition('Zn1 Ti4 O4'),
 Composition('Zn1 Ti4 O5'),
 Composition('Zn1 Ti5 O3'),
 Composition('Zn1 Ti5 O4'),
 Composition('Zn2 Ti1 O2'),
 Composition('Zn2 Ti1 O3'),
 Composition('Zn2 Ti1 O4'),
 Composition('Zn2 Ti1 O5'),
 Composition('Zn2 Ti2 O2'),
 Composition('Zn2 Ti2 O3'),
 Composition('Zn2 Ti2 O4'),
 Composition('Zn2 Ti2 O5'),
 Composition('Zn2 Ti3 O4'),
 Composition('Zn2 Ti3 O5'),
 Composition('Zn2 Ti4 O3'),
 Composition('Zn2 Ti4 O4'),
 Composition('Zn2 Ti4 O5'),
 Composition('Zn3 Ti1 O2'),
 Composition('Zn3 Ti1 O3'),
 Composition('Zn3 Ti1 O4'),
 Composition('Zn3 Ti1 O5'),
 Composition('Zn3 Ti2 O4'),
 Composition('Zn3 Ti2 O5'),
 Composition('Zn3 Ti3 O3'),
 Composition('Zn3 Ti4 O5'),
 Composition('Zn3 Ti5 O4'),
 Composition('Zn4 Ti1 O3'),
 Composition('Zn4 Ti1 O4'),
 Composition('Zn4 Ti1 O5'),
 Composition('Zn4 Ti2 O3'),
 Composition('Zn4 Ti2 O4'),
 Composition('Zn4 Ti2 O5'),
 Composition('Zn4 Ti3 O5'),
 Composition('Zn4 Ti4 O4'),
 Composition('Zn5 Ti1 O3'),
 Composition('Zn5 Ti1 O4'),
 Composition('Zn5 Ti3 O4'),
 Composition('Zn5 Ti5 O5')]

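Observation: With low commonality, 52 of the 125 combinations pass. Most of them rely on less common oxidation states, and many may not be thermodynamically stable. The list is also not reduced to unique formulas: Zn1 Ti1 O1, Zn2 Ti2 O2, Zn3 Ti3 O3, Zn4 Ti4 O4, and Zn5 Ti5 O5 all share the same ZnTiO stoichiometry.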
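Because every element's amount is scaled independently, the low-commonality list contains multiples of the same formula unit, such as Zn1 Ti1 O1 and Zn2 Ti2 O2. One way to collapse them, sketched below with the standard library only, is to divide each amount tuple by the greatest common divisor of its entries. In pymatgen, `Composition.reduced_formula` gives the same reduced form directly. `unique_reduced` is a hypothetical helper, not part of the tutorial's code.

```python
from functools import reduce
from math import gcd


def reduce_amounts(amounts):
    """Divide a tuple of atom counts by their greatest common divisor."""
    divisor = reduce(gcd, amounts)
    return tuple(n // divisor for n in amounts)


def unique_reduced(amount_lists):
    """Keep the first occurrence of each reduced stoichiometry, preserving order."""
    seen = set()
    unique = []
    for amounts in amount_lists:
        key = reduce_amounts(amounts)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


# Zn:Ti:O amounts taken from the low-commonality output
print(unique_reduced([(1, 1, 1), (2, 2, 2), (1, 1, 3), (3, 3, 3), (2, 2, 4)]))
# [(1, 1, 1), (1, 1, 3), (1, 1, 2)]
```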
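
Example 2: Restrictive Screening (Medium Commonality)

Now let’s use the recommended commonality="medium", which keeps only common oxidation states. The resulting list, unique_valid_compositions, is the input to Step 2.

# Screen with medium commonality (recommended)
unique_valid_compositions = get_unique_valid_compositions(
    elements=elements, max_stoichiometry=5, commonality="medium"
)
pprint(unique_valid_compositions)
Found 4 valid compositions out of 125 total combinations
[Composition('Zn1 Ti1 O3'),
 Composition('Zn1 Ti2 O5'),
 Composition('Zn2 Ti1 O4'),
 Composition('Zn3 Ti1 O5')]
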
Step 2: Crystal Structure Generation with Chemeleon-DNG

Goal: Generate 3D atomic structures for our valid compositions.

Background:
Crystal Structure Prediction (CSP) is the task of predicting how atoms arrange in 3D space given only a chemical formula. Chemeleon-DNG uses a diffusion model (similar to image generation AI) to generate realistic crystal structures.

How it works:

  1. Start from random atomic positions and lattice parameters (pure noise)

  2. Iteratively denoise them using the trained diffusion model

  3. Generate multiple diverse structures for each composition
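The idea of iterative refinement can be illustrated with a deliberately simplified toy: start from random fractional coordinates, then repeatedly nudge them toward a prediction while the injected noise shrinks to zero. Fractional coordinates are periodic, so displacements are taken along the shortest path modulo 1. This is not a diffusion model and not Chemeleon-DNG's sampler: the hypothetical `target` positions stand in for what a trained denoising network would predict at each step, and the step size (0.2) and noise schedule are arbitrary choices.

```python
import random


def periodic_delta(a: float, b: float) -> float:
    """Shortest signed displacement from a to b for a coordinate with period 1."""
    d = (b - a) % 1.0  # in [0, 1)
    return d - 1.0 if d >= 0.5 else d


def toy_refine(target: list[float], num_steps: int = 50, seed: int = 0) -> list[float]:
    """Toy annealed refinement of fractional coordinates (NOT a diffusion model).

    `target` is HYPOTHETICAL: it stands in for a trained denoiser's prediction;
    a real generative model never sees the answer.
    """
    rng = random.Random(seed)
    x = [rng.random() for _ in target]  # random starting positions in [0, 1)
    for step in range(num_steps, 0, -1):
        noise = 0.05 * (step - 1) / num_steps  # noise level shrinks to 0 at the last step
        x = [
            (xi + 0.2 * periodic_delta(xi, ti) + rng.gauss(0.0, noise)) % 1.0
            for xi, ti in zip(x, target)
        ]
    return x


print(toy_refine([0.0, 0.25, 0.5, 0.98]))
```

Note how the coordinate targeted at 0.0 may end up just below 1.0: on a periodic axis both values describe the same position.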

Key Parameters:

  • task="csp": Crystal Structure Prediction mode (vs “dng” for de novo generation)

  • formulas: List of chemical formulas to generate structures for

  • num_samples=10: Generate 10 different structures per formula

  • batch_size=100: Process 100 structures simultaneously (adjust based on memory)

  • device="cpu": Use CPU (change to “cuda” for GPU acceleration)
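The number of generated structures is the number of formulas times `num_samples`. Assuming the sampler processes them in chunks of at most `batch_size` (an assumption about how `sample` batches its work), the arithmetic below shows why `batch_size=100` covers this tutorial's 4 × 10 = 40 structures in a single batch, and how many batches a larger screen would need. `sampling_plan` is a hypothetical helper.

```python
import math


def sampling_plan(num_formulas: int, num_samples: int, batch_size: int) -> tuple[int, int]:
    """Return (total structures, number of batches) for one generation run."""
    total = num_formulas * num_samples
    num_batches = math.ceil(total / batch_size)
    return total, num_batches


print(sampling_plan(4, 10, 100))   # (40, 1): the 4 medium-commonality formulas
print(sampling_plan(52, 10, 100))  # (520, 6): the 52 low-commonality compositions
```

Lowering `batch_size` trades speed for memory: the total work is unchanged, it is only split into more batches.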

Output:

  • CIF files: Standard format for crystal structures

  • JSON file: Structures in pymatgen format for further analysis

  • ASE Atoms objects: For visualization and manipulation


from ase.visualize import view

from chemeleon_dng.sample import sample
# Convert compositions to formula strings
formulas = [str(comp) for comp in unique_valid_compositions]
print(f"Generating structures for {len(formulas)} formulas: {formulas}\n")

# Generate crystal structures using Chemeleon-DNG
# This will download the pre-trained model checkpoint (~500MB) on first run
gen_atoms_list = sample(
    task="csp",  # Crystal Structure Prediction mode
    formulas=formulas,  # Our 4 validated compositions
    batch_size=100,  # Process all 40 structures (4 formulas × 10 samples) at once
    num_samples=10,  # Generate 10 diverse structures per formula
    output_dir="results/tutorials",  # Save results here
    device="cpu",  # Use CPU (change to "cuda" for GPU if available)
)

print(f"\n✓ Successfully generated {len(gen_atoms_list)} crystal structures!")
Generating structures for 4 formulas: ['Zn1 Ti1 O3', 'Zn1 Ti2 O5', 'Zn2 Ti1 O4', 'Zn3 Ti1 O5']

Visualize Generated Structures

Let’s visualize the first generated structure using an interactive 3D viewer. You can:

  • Rotate the view by dragging with the mouse

  • Zoom with scroll wheel

  • Use the dropdown to show/hide specific elements

# Visualize the first generated structure (index 0)
# This shows the raw output from the diffusion model before optimization
view(gen_atoms_list[0], viewer="x3d")

Step 3: Geometry Optimization with TorchSim

Goal: Relax the generated structures to their minimum energy configurations.

Background:
The structures from Step 2 are initial predictions that may have unrealistic bond lengths or atomic positions. Geometry optimization adjusts atomic positions and cell parameters to minimize the total energy, resulting in more physically realistic structures.

Why optimize?

  • Generated structures may have atoms too close or too far apart

  • Optimization finds the nearest local energy minimum

  • Lower energy = more stable structure (when comparing structures of the same composition)

  • Essential for accurate property predictions
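To see what "finds the nearest local energy minimum" means in the simplest case, the sketch below relaxes the distance between two atoms interacting through a Lennard-Jones potential (reduced units, epsilon = sigma = 1) by steepest descent: move along the force until its magnitude falls below a tolerance `fmax`. This is a toy stand-in: in the tutorial, MACE replaces the analytic potential, and TorchSim's FIRE-based optimizer replaces plain steepest descent and also relaxes the cell. The analytic minimum lies at r = 2^(1/6) sigma with energy -epsilon.

```python
def lj_energy(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Lennard-Jones pair energy E(r) = 4*eps*((sigma/r)**12 - (sigma/r)**6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_force(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Force F = -dE/dr along the bond (positive pushes the atoms apart)."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r


def relax_dimer(r0: float, step: float = 0.01, fmax: float = 1e-6, max_steps: int = 10_000):
    """Steepest descent on the bond length until |F| < fmax.

    Returns (relaxed distance, energy, number of steps taken).
    """
    r = r0
    for n in range(max_steps):
        f = lj_force(r)
        if abs(f) < fmax:
            return r, lj_energy(r), n
        r += step * f  # move downhill in energy, i.e. along the force
    raise RuntimeError("relaxation did not converge")


r, e, n = relax_dimer(1.5)
print(f"r = {r:.6f}, E = {e:.6f}")  # r = 1.122462, E = -1.000000
```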

Tools:

  • TorchSim: Atomic simulation framework for PyTorch

  • MACE: Machine learning interatomic potential (force field)

    • Pre-trained on Materials Project data (the MPtrj dataset of DFT relaxation trajectories)

    • Orders of magnitude faster than DFT

    • Reasonable accuracy for structure relaxation

What to expect:

  • Energy values are total energies in eV per structure (lower = more stable; compare per atom or at the same composition)

  • Atomic positions will shift slightly

  • Cell parameters may change

  • Some structures may relax to similar configurations

Learn more: TorchSim/torch-sim


import torch_sim as ts
from mace.calculators.foundations_models import mace_mp
from torch_sim.models.mace import MaceModel

# Load the MACE-MP "small" model (trained on Materials Project data)
# Model size: small (~5M parameters, faster) vs medium/large (more accurate)
mace = mace_mp(model="small", return_raw_model=True)
mace_model = MaceModel(model=mace, device="cpu")

print("✓ MACE model loaded successfully")
# Perform geometry optimization on all 40 generated structures
# Optimizer: Frechet cell FIRE (allows both atomic and cell relaxation)
print(f"Optimizing {len(gen_atoms_list)} structures...\n")

relaxed_state = ts.optimize(
    system=gen_atoms_list,
    model=mace_model,
    optimizer=ts.optimizers.frechet_cell_fire,  # Allows cell shape to change
)

# Display energies for all relaxed structures (in eV)
print("\nRelaxed energies (eV):")
print(relaxed_state.energy)

print(
    f"\nEnergy range: {relaxed_state.energy.min():.2f} to "
    f"{relaxed_state.energy.max():.2f} eV"
)
# Convert optimized structures back to ASE Atoms format for visualization
gen_atoms_list_relaxed = relaxed_state.to_atoms()
print(f"✓ Converted {len(gen_atoms_list_relaxed)} optimized structures")

Compare Before and After Optimization

Let’s visualize the same structure before and after optimization to see how the atomic positions and cell shape have changed.

# BEFORE optimization: Raw structure from diffusion model
view(gen_atoms_list[0], viewer="x3d")
# AFTER optimization: Relaxed to minimum energy configuration
# Notice how atomic positions and possibly cell shape have changed
view(gen_atoms_list_relaxed[0], viewer="x3d")

Step 4: Stability Analysis with Phase Diagrams

Goal: Evaluate which generated structures are thermodynamically stable.

Background:
Not all structures with reasonable energy are stable. A material is thermodynamically stable if no combination of other phases with the same overall composition has a lower energy, i.e., it lies on the convex hull of formation energies. We use a phase diagram to assess stability against the known phases in the Zn-Ti-O system from the Materials Project database.

Key Concepts:

Phase Diagram:
A map showing the most stable phases in a chemical system. Points on the convex hull represent stable compounds; points above it are metastable or unstable.

Energy Above Hull (E_hull):
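The energy per atom by which a structure lies above the convex hull, i.e., how much energy would be released if it decomposed into the most stable combination of competing phases at the same composition.

  • E_hull = 0 eV/atom: Thermodynamically stable (on the hull)

  • 0 < E_hull < 0.1 eV/atom: Metastable; a common rule of thumb treats these as potentially synthesizable

  • E_hull ≥ 0.1 eV/atom: Unlikely to be synthesizable (tends to decompose into competing phases)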
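For a binary A-B system the convex hull can be computed by hand, which makes the definition of E_hull concrete. Along a single composition axis, the hull energy at composition x is the lowest energy obtainable by mixing at most two phases, so a brute-force search over pairs suffices. The formation energies below are hypothetical numbers for illustration; pymatgen's `PhaseDiagram.get_e_above_hull` handles the general multi-component case used later in this step. The thresholds in `classify` follow the rule of thumb listed above.

```python
import math

# Hypothetical reference phases: (fraction of B, formation energy in eV/atom)
REFERENCE = {
    "A": (0.0, 0.0),
    "A3B": (0.25, -0.45),
    "AB": (0.5, -1.0),
    "AB3": (0.75, -0.6),
    "B": (1.0, 0.0),
}


def hull_energy(x, points):
    """Lower convex hull energy at composition x from (x_i, e_i) points."""
    best = math.inf
    for x1, e1 in points:
        for x2, e2 in points:
            if x1 <= x <= x2:
                if x1 == x2:
                    e = e1  # a single phase exactly at x
                else:
                    t = (x - x1) / (x2 - x1)  # lever rule: fraction of phase 2
                    e = (1 - t) * e1 + t * e2
                best = min(best, e)
    return best


def e_above_hull(x, energy, points):
    """Energy (eV/atom) of a phase at (x, energy) above the hull of `points`."""
    return energy - hull_energy(x, points)


def classify(e_hull, tol=1e-8):
    if e_hull <= tol:
        return "stable (on hull)"
    if e_hull < 0.1:
        return "metastable (near hull)"
    return "unlikely to be stable"


points = list(REFERENCE.values())
for name, (x, e) in REFERENCE.items():
    eh = e_above_hull(x, e, points)
    print(f"{name:4s} E_hull = {eh:.3f} eV/atom  {classify(eh)}")

# A hypothetical new candidate at the AB composition, 0.2 eV/atom above AB
candidate = e_above_hull(0.5, -0.8, points)
print(f"cand E_hull = {candidate:.3f} eV/atom  {classify(candidate)}")
```

Here A3B sits 0.05 eV/atom above the line joining A and AB, so it would decompose into those two phases even though its own formation energy is negative.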

Why compare against Materials Project?
We need to check if our generated structures are:

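  1. Lower in energy than the known phases (candidate new materials)

  2. Close to the hull (potentially synthesizable)

  3. Rediscoveries of existing materials (a sanity check of the method; confirming a match requires comparing structures, not just energies)

Note: We use uncorrected MP energies because MACE predictions do not include the empirical corrections that Materials Project applies to its DFT energies (e.g., the MaterialsProject2020Compatibility anion and +U corrections).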

Learn more:


import os

from mp_api.client import MPRester
from pymatgen.analysis.phase_diagram import PDEntry, PDPlotter, PhaseDiagram
# Get Materials Project API key from environment variable
mp_api = os.getenv("MP_API_KEY")

# Verify API key is set (should have been configured in Step 0)
assert mp_api is not None, (
    "MP_API_KEY not found! Please set your Materials Project API key "
    "in Step 0 or as an environment variable."
)

# Initialize Materials Project API client
mpr = MPRester(api_key=mp_api)
print("✓ Successfully connected to Materials Project API")

Fetch Known Materials from Materials Project

We’ll retrieve all known materials in the Zn-Ti-O chemical system from the Materials Project database.

# Retrieve all materials in the Zn-Ti-O chemical system from Materials Project
# Filter for GGA/GGA+U calculations (standard DFT methods)
mp_entries_raw = mpr.get_entries_in_chemsys(
    elements=elements,
    additional_criteria={"thermo_types": ["GGA_GGA+U"]},
)
print(f"Retrieved {len(mp_entries_raw)} materials from Materials Project")
# Extract uncorrected energies for fair comparison with MACE predictions
# (MACE energies don't include MP corrections like GGA+U adjustments)
mp_entries = [
    PDEntry(composition=entry.composition, energy=entry.uncorrected_energy)
    for entry in mp_entries_raw
]
print(f"Converted {len(mp_entries)} entries to uncorrected energies")

Visualize Materials Project Phase Diagram

This shows all known stable materials in the Zn-Ti-O system. Points on the convex hull (lower boundary) are thermodynamically stable.

# Construct phase diagram from Materials Project entries
phase_diagram = PhaseDiagram(mp_entries)

# Plot the phase diagram (ternary diagram for 3-element systems)
pd_plotter = PDPlotter(phase_diagram)
pd_plotter.get_plot().show()

Prepare Our Generated Structures for Comparison

Convert our relaxed structures to phase diagram entries so we can compare them against known materials.
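`PDEntry` stores a total energy, while the phase diagram works with formation energies per atom relative to elemental reference energies (pymatgen takes the lowest-energy entry of each element in the diagram as its reference). The arithmetic is sketched below with made-up numbers for a Zn1 Ti1 O3 cell; `formation_energy_per_atom` is a hypothetical helper, and the reference energies in `mu` are not real MACE or DFT values.

```python
def formation_energy_per_atom(counts, total_energy, mu):
    """(E_total - sum_i n_i * mu_i) / N_atoms, all energies in eV.

    counts: atoms per element in the cell, e.g. {"Zn": 1, "Ti": 1, "O": 3}
    mu:     per-atom energy of each element's reference phase (hypothetical here)
    """
    n_atoms = sum(counts.values())
    reference = sum(n * mu[el] for el, n in counts.items())
    return (total_energy - reference) / n_atoms


# Made-up numbers for illustration only
mu = {"Zn": -1.0, "Ti": -8.0, "O": -5.0}
e_form = formation_energy_per_atom({"Zn": 1, "Ti": 1, "O": 3}, total_energy=-40.0, mu=mu)
print(f"{e_form:.2f} eV/atom")  # -3.20 eV/atom

# Doubling the cell doubles the total energy but leaves the per-atom value unchanged
e_form_2x = formation_energy_per_atom({"Zn": 2, "Ti": 2, "O": 6}, total_energy=-80.0, mu=mu)
print(f"{e_form_2x:.2f} eV/atom")  # -3.20 eV/atom
```

This is why passing total energies from `relaxed_state.energy` to `PDEntry` is correct even when structures differ in the number of atoms.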

# Extract compositions and energies from our relaxed structures
my_entries_composition = [st.composition for st in relaxed_state.to_structures()]
my_entries_energy = relaxed_state.energy  # total energies (eV), one per structure

# Create phase diagram entries for our generated structures
# (PDEntry expects the total energy, not the energy per atom)
my_entries = [
    PDEntry(composition=comp, energy=e.item())
    for comp, e in zip(my_entries_composition, my_entries_energy, strict=True)
]

print(f"Created {len(my_entries)} entries from our generated structures")

Identify Stable Materials in Materials Project

Let’s check which Materials Project entries are on or very close to the convex hull (E_hull < 0.1 eV/atom).

# List stable and nearly-stable entries from Materials Project
print("Stable materials in Zn-Ti-O system (E_hull < 0.1 eV/atom):\n")

for i, entry in enumerate(mp_entries):
    energy_above_hull = phase_diagram.get_e_above_hull(entry, allow_negative=True)
    if energy_above_hull < 0.1:
        print(
            f"{i}: {entry.composition.reduced_formula:15s} "
            f"E_hull = {energy_above_hull:.4f} eV/atom"
        )
Stable materials in Zn-Ti-O system (E_hull < 0.1 eV/atom):

Phase Diagram with Our Generated Structures

Now let’s add our generated structures to the phase diagram and see how they compare! Our structures will appear as additional points on the plot.

# Combine Materials Project entries with our generated structures
all_entries_with_my = mp_entries + my_entries

# Construct new phase diagram including our structures
phase_diagram_with_my = PhaseDiagram(all_entries_with_my)

# Plot the combined phase diagram
# Look for new points that may be on or close to the convex hull!
pd_plotter_with_my = PDPlotter(phase_diagram_with_my)
pd_plotter_with_my.get_plot().show()

Evaluate Our Generated Structures

Let’s check which of our generated structures lie on or near the convex hull (E_hull < 0.1 eV/atom).

# Analyze stability of our generated structures
print("Stability analysis of our generated structures:\n")

near_hull_count = 0
for i, entry in enumerate(my_entries):
    e_hull = phase_diagram_with_my.get_e_above_hull(entry, allow_negative=True)

    # Only print structures within 0.2 eV/atom of the hull
    if e_hull < 0.2:
        if e_hull <= 1e-8:
            stability = "✓ On hull (stable)"
        elif e_hull < 0.1:
            stability = "✓ Near hull (metastable)"
        else:
            stability = "○ Marginal"
        print(
            f"{i:2d}: {entry.composition.reduced_formula:15s} "
            f"E_hull = {e_hull:6.3f} eV/atom  {stability}"
        )
        if e_hull < 0.1:
            near_hull_count += 1

print(f"\n{'=' * 60}")
print(
    f"Summary: {near_hull_count}/{len(my_entries)} generated structures "
    f"lie within 0.1 eV/atom of the convex hull"
)
Stability analysis of our generated structures:

Summary and Conclusions

What We Accomplished

In this tutorial, we explored the complete workflow for AI-powered crystal structure discovery:

  1. Composition Screening (SMACT)

    • Started with 125 possible Zn-Ti-O combinations

    • Filtered to 4 chemically valid compositions using medium commonality

  2. Structure Generation (Chemeleon-DNG)

    • Generated 40 crystal structures (10 per composition)

    • Used diffusion models trained on crystallographic data

    • Obtained diverse structural candidates in a single batched sampling call

  3. Geometry Optimization (MACE + TorchSim)

    • Relaxed all structures to their nearest local energy minima

    • Used ML force fields for fast optimization with reasonable accuracy

    • Obtained realistic bond lengths and lattice parameters

  4. Stability Analysis (Phase Diagrams)

    • Compared against the known Zn-Ti-O phases from Materials Project

    • Identified structures on or near the convex hull

    • Evaluated feasibility for synthesis

Key Takeaways

  • AI-Accelerated Discovery: Diffusion models can rapidly generate diverse crystal structures from chemical formulas alone

  • Multi-Scale Validation: Combining chemical rules (SMACT), generative AI (Chemeleon), ML force fields (MACE), and thermodynamics (phase diagrams) provides robust structure prediction

  • Computational Efficiency: What would take days of DFT calculations can now be done in minutes with ML models

  • Success Metrics: Finding even a few candidates on or near the convex hull out of 40 is an encouraging result, although such candidates still need DFT validation before they can be called new materials


Next Steps and Advanced Exploration

Immediate Extensions

  1. Explore Different Compositions:

    • Try different element combinations (e.g., transition metal oxides, nitrides)

    • Adjust max_stoichiometry for more complex compositions

    • Experiment with commonality settings

  2. Generate More Structures:

    • Increase num_samples for better coverage of structure space

    • Use GPU (device="cuda") for faster generation

  3. Detailed Analysis:

    • Calculate additional properties (band gap, elastic moduli)

    • Perform phonon calculations to check dynamic stability

    • Compare specific structures against experimental data

Advanced Topics

  1. Structure Analysis:

    • Use pymatgen’s StructureMatcher to identify duplicate structures

    • Analyze space groups and symmetry

    • Calculate coordination environments

  2. High-Throughput Screening:

    • Screen multiple chemical systems simultaneously

    • Build databases of generated structures

    • Implement automated filtering pipelines

  3. DFT Validation:

    • Select top candidates for DFT calculations (VASP, Quantum ESPRESSO)

    • Compare ML energies with DFT energies

    • Calculate accurate electronic properties

  4. Experimental Validation:

    • Predict X-ray diffraction patterns

    • Compare with ICSD or experimental databases

    • Identify synthesis routes

Resources

Troubleshooting

Common Issues:

  • GPU Memory Error: Reduce batch_size or use device="cpu"

  • API Key Error: Ensure MP_API_KEY is set correctly in Step 0

  • Import Errors: Verify all packages are installed with correct versions

  • Long Generation Time: Expected on CPU; consider using GPU for faster results

Getting Help:

  • Check the GitHub repository for issues and discussions

  • Read the documentation for each tool

  • Join the materials informatics community forums


Congratulations!

You’ve completed the Chemeleon-DNG tutorial and learned how to use AI for crystal structure discovery. You now have the tools to explore vast regions of chemical space and discover new materials computationally.

Happy materials discovery! 🔬🎉