Compositional Screening: From Chemical Space to Discovery
In this tutorial, we’ll systematically explore the Cu-Ti-O chemical system to demonstrate how compositional screening works in practice. You’ll learn to:
• Generate all possible compositions in a chemical system
• Apply chemical filters to identify viable candidates
• Compare with experimental reality using the Materials Project
• Visualise results using ternary plots
• Identify promising targets for experimental synthesis
The Big Picture
Compositional screening answers the question: “In a given chemical system, what compositions are chemically feasible, and which ones have actually been made?”
The gap between what’s theoretically possible and what’s been synthesised represents opportunities for new materials discovery.
Let’s explore this systematically using the Cu-Ti-O system!
# Import libraries
import os
import itertools
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
# Materials science libraries
from pymatgen.core import Composition
# SMACT for chemical screening
import smact
from smact import Element
from smact.screening import smact_validity
# Materials Project client: optional, only Step 4 needs it.
# Guarding the import keeps a missing mp-api package from aborting this cell.
try:
    from mp_api.client import MPRester
except ImportError:
    MPRester = None
    print("mp_api not found; install it with `pip install mp-api` to run Step 4")
# Configure plotting
plt.style.use('default')
sns.set_palette("husl")
print("✓ Libraries imported successfully!")
print("Ready to begin compositional screening of Cu-Ti-O system")
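Because a single missing module stops the whole import cell, every later cell then fails with a `NameError`. It can help to check the environment before running anything else. The sketch below uses only the standard library; the helper name `check_packages` and the `REQUIRED` list are illustrative assumptions, not part of SMACT, pymatgen or the Materials Project client.

```python
from importlib.util import find_spec

# Top-level module names used by the import cell of this tutorial.
REQUIRED = ["numpy", "pandas", "matplotlib", "seaborn", "plotly",
            "pymatgen", "smact", "mp_api"]


def check_packages(names):
    """Return {module_name: bool}, True when the module can be located."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages(REQUIRED).items():
        print(f"{name:<12} {'installed' if found else 'MISSING'}")
```

Any module reported as missing needs installing before the import cell is re-run. Note that the distribution name can differ from the module name (for example, `mp_api` is installed as `mp-api`).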
Step 1: Define Our Chemical System
# Our target system: Copper-Titanium-Oxygen
CHEMICAL_SYSTEM = ["Cu", "Ti", "O"]
MAX_STOICH = 8 # Maximum stoichiometry to consider
# Create SMACT Element objects
elements = [Element(symbol) for symbol in CHEMICAL_SYSTEM]
print(f"Target Chemical System: {'-'.join(CHEMICAL_SYSTEM)}")
print(f"Maximum stoichiometry: {MAX_STOICH}")
print("Elements loaded:")
for el in elements:
    ox_states = ", ".join([f"{ox:+d}" for ox in el.oxidation_states[:5]])  # Show first 5
    more = "..." if len(el.oxidation_states) > 5 else ""
    print(f"  {el.symbol}: oxidation states [{ox_states}{more}]")
print("\nChemical system defined successfully!")
Step 2: Generate All Possible Compositions
print("Generating all possible compositions in the Cu-Ti-O system...")
# Generate compositions for all possible element combinations
all_compositions = []
# 1. Unary compounds (single elements)
for element in elements:
    comp = Composition({element.symbol: 1})
    all_compositions.append(comp.reduced_composition)
# 2. Binary compounds (two elements)
for el1, el2 in itertools.combinations(elements, 2):
    for stoich1 in range(1, MAX_STOICH + 1):
        for stoich2 in range(1, MAX_STOICH + 1):
            comp = Composition({el1.symbol: stoich1, el2.symbol: stoich2})
            all_compositions.append(comp.reduced_composition)
# 3. Ternary compounds (all three elements)
for stoich_cu in range(1, MAX_STOICH + 1):
    for stoich_ti in range(1, MAX_STOICH + 1):
        for stoich_o in range(1, MAX_STOICH + 1):
            comp = Composition({"Cu": stoich_cu, "Ti": stoich_ti, "O": stoich_o})
            all_compositions.append(comp.reduced_composition)
# Remove duplicates (e.g. Cu2O2 reduces to the same composition as CuO)
unique_compositions = list(set(all_compositions))
print(f"Generated {len(unique_compositions)} unique compositions")
print("\nExamples of generated compositions:")
for i, comp in enumerate(unique_compositions[:8]):
    print(f"  {i+1}. {comp}")
print(f"  ... and {len(unique_compositions)-8} more")
print(f"\nThis is the 'combinatorial explosion' we discussed earlier!")
Generating all possible compositions in the Cu-Ti-O system...
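To get a feel for how large the enumerated space is, the count can be reproduced without pymatgen. This is a minimal sketch, assuming that two stoichiometries are duplicates exactly when they reduce to the same lowest-terms integer ratio; the function names are hypothetical.

```python
from functools import reduce
from itertools import combinations, product
from math import gcd


def lowest_terms_count(n_elements, max_stoich):
    """Count stoichiometry tuples (each entry 1..max_stoich) whose overall gcd is 1."""
    return sum(
        1
        for stoich in product(range(1, max_stoich + 1), repeat=n_elements)
        if reduce(gcd, stoich) == 1
    )


def count_unique_compositions(symbols, max_stoich):
    """Return {subsystem size: number of distinct reduced compositions}."""
    counts = {}
    for size in range(1, len(symbols) + 1):
        n_subsystems = len(list(combinations(symbols, size)))
        counts[size] = n_subsystems * lowest_terms_count(size, max_stoich)
    return counts


counts = count_unique_compositions(["Cu", "Ti", "O"], 8)
print(counts, sum(counts.values()))
```

For three elements and a maximum stoichiometry of 8, this gives 3 unary, 129 binary and 439 ternary ratios, 571 in total. The loops above generate 3 + 3×64 + 512 = 707 raw combinations before deduplication.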
Step 3: Apply Chemical Filters (SMACT Screening)
print("Applying SMACT chemical validity filters...")
print("This checks for charge neutrality and electronegativity rules.\n")
# Apply SMACT validity test to each composition
valid_compositions = []
invalid_count = 0
for comp in unique_compositions:
    if smact_validity(comp):
        valid_compositions.append(comp)
    else:
        invalid_count += 1
# Calculate filtering statistics
total_generated = len(unique_compositions)
total_valid = len(valid_compositions)
filter_efficiency = (invalid_count / total_generated) * 100
print("Filtering Results:")
print(f"  Total compositions generated: {total_generated:>6}")
print(f"  Chemically valid compositions: {total_valid:>6}")
print(f"  Invalid compositions removed: {invalid_count:>6}")
print(f"  Filter efficiency: {filter_efficiency:>6.1f}%")
print(f"\nSMACT filters removed {filter_efficiency:.1f}% of the generated compositions as chemically implausible!")
# Show examples of valid compositions
print("\nExamples of chemically valid compositions:")
for i, comp in enumerate(valid_compositions[:10]):
    print(f"  {i+1:2d}. {comp}")
if len(valid_compositions) > 10:
    print(f"  ... and {len(valid_compositions)-10} more")
Applying SMACT chemical validity filters...
This checks for charge neutrality and electronegativity rules.
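The charge-neutrality part of the filter can be illustrated in a few lines of plain Python. This is a simplified sketch, not SMACT's implementation: it assumes one oxidation state per element, ignores the electronegativity rule, and uses a short, hand-picked `OXIDATION_STATES` table that is an assumption for illustration only.

```python
from itertools import product

# Illustrative oxidation states only; SMACT ships its own, larger lists.
OXIDATION_STATES = {"Cu": [1, 2], "Ti": [2, 3, 4], "O": [-2]}


def neutral_assignments(stoichiometry, oxidation_states=OXIDATION_STATES):
    """Return every one-state-per-element assignment with zero net charge."""
    symbols = list(stoichiometry)
    solutions = []
    for states in product(*(oxidation_states[s] for s in symbols)):
        total = sum(q * stoichiometry[s] for q, s in zip(states, symbols))
        if total == 0:
            solutions.append(dict(zip(symbols, states)))
    return solutions


print(neutral_assignments({"Cu": 1, "Ti": 1, "O": 3}))  # Cu(+2), Ti(+4), O(-2)
print(neutral_assignments({"Cu": 1, "Ti": 1, "O": 1}))  # no neutral assignment
```

Under these assumed states, CuTiO3 balances only as Cu(+2)/Ti(+4), while a 1:1:1 composition cannot balance because the smallest total cation charge (+3) already exceeds the single oxide's -2.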
Step 4: Compare with Experimental Reality
print("Querying Materials Project for known compounds in Cu-Ti-O system...")
# Setup Materials Project API
MP_API_KEY = os.environ.get("MP_API_KEY", None)
if MP_API_KEY is None:
    try:
        with open("../assets/files/mp_api_key.txt", "r") as f:
            MP_API_KEY = f.read().strip()
    except FileNotFoundError:
        print("Materials Project API key not found.")
        print("Please set MP_API_KEY environment variable or create mp_api_key.txt")
        MP_API_KEY = None
if MP_API_KEY:
    try:
        # Query all chemical subsystems: Cu, Ti, O, Cu-Ti, Cu-O, Ti-O, Cu-Ti-O
        chemical_systems = []
        for r in range(1, len(CHEMICAL_SYSTEM) + 1):
            for combo in itertools.combinations(CHEMICAL_SYSTEM, r):
                chemical_systems.append("-".join(sorted(combo)))
        print(f"Searching systems: {', '.join(chemical_systems)}")
        with MPRester(MP_API_KEY) as mpr:
            mp_entries = mpr.materials.summary.search(chemsys=chemical_systems)
        # Extract compositions
        mp_compositions = [entry.composition.reduced_composition for entry in mp_entries]
        print(f"Found {len(mp_compositions)} known compounds in Materials Project")
        # Show examples
        print("\nExamples of known Materials Project compounds:")
        for i, comp in enumerate(mp_compositions[:8]):
            print(f"  {i+1:2d}. {comp}")
        if len(mp_compositions) > 8:
            print(f"  ... and {len(mp_compositions)-8} more")
    except Exception as e:
        print(f"Error querying Materials Project: {e}")
        mp_compositions = []
else:
    print("Skipping Materials Project query - no API key available")
    mp_compositions = []
Querying Materials Project for known compounds in Cu-Ti-O system...
Searching systems: Cu, Ti, O, Cu-Ti, Cu-O, O-Ti, Cu-O-Ti
Step 5: Analysis - Theory vs Reality
print("Analysing the gap between theory and experiment...\n")
# Compare SMACT predictions with experimental reality
if mp_compositions:
    # Convert to sets for comparison
    smact_formulas = set(str(comp) for comp in valid_compositions)
    mp_formulas = set(str(comp) for comp in mp_compositions)
    # Find overlaps and gaps
    both_predicted_and_known = smact_formulas.intersection(mp_formulas)
    predicted_but_unknown = smact_formulas.difference(mp_formulas)
    known_but_not_predicted = mp_formulas.difference(smact_formulas)
    print("COMPOSITIONAL SCREENING ANALYSIS:")
    print(f"{'='*50}")
    print(f"SMACT predicted compositions: {len(smact_formulas):>6}")
    print(f"Materials Project known compounds: {len(mp_formulas):>6}")
    print(f"{'='*50}")
    print(f"Correctly predicted (overlap): {len(both_predicted_and_known):>6}")
    print(f"Predicted but not yet made: {len(predicted_but_unknown):>6}")
    print(f"Known but not predicted by SMACT: {len(known_but_not_predicted):>6}")
    print(f"{'='*50}")
    # Calculate coverage metrics
    if len(mp_formulas) > 0:
        prediction_accuracy = len(both_predicted_and_known) / len(mp_formulas) * 100
        print(f"SMACT prediction accuracy: {prediction_accuracy:>6.1f}%")
    if len(smact_formulas) > 0:
        experimental_coverage = len(both_predicted_and_known) / len(smact_formulas) * 100
        print(f"Experimental coverage of theory: {experimental_coverage:>6.1f}%")
    print(f"\nKey Insight: {len(predicted_but_unknown)} compositions are predicted")
    print("  to be chemically feasible but haven't been synthesised yet!")
    print("  These represent opportunities for materials discovery.")
    # Show examples of unexplored compositions
    if predicted_but_unknown:
        unexplored_list = sorted(list(predicted_but_unknown))
        print("\nExamples of unexplored compositions:")
        for i, formula in enumerate(unexplored_list[:6]):
            print(f"  {i+1}. {formula}")
        if len(unexplored_list) > 6:
            print(f"  ... and {len(unexplored_list)-6} more targets for synthesis")
else:
    print("No Materials Project data available for comparison")
    print("Analysis will focus on SMACT predictions only")
Analysing the gap between theory and experiment...
No Materials Project data available for comparison
Analysis will focus on SMACT predictions only
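The two percentages in this step are easy to mix up, so here is a small worked example of the set arithmetic. The formula sets are placeholders chosen for illustration only, not real SMACT or Materials Project output, and the function name `compare` is hypothetical.

```python
# Placeholder formula sets for illustration; not real screening or database output.
predicted = {"CuO", "Cu2O", "TiO2", "CuTiO3", "Cu2TiO4"}
known = {"CuO", "Cu2O", "TiO2", "Ti2O3"}


def compare(predicted, known):
    """Summarise the overlap between predicted and known formula sets."""
    overlap = predicted & known
    return {
        "overlap": overlap,
        "predicted_only": predicted - known,
        "known_only": known - predicted,
        # Share of known formulas that the screen also predicts.
        "accuracy_pct": 100 * len(overlap) / len(known) if known else 0.0,
        # Share of predicted formulas that are already known.
        "coverage_pct": 100 * len(overlap) / len(predicted) if predicted else 0.0,
    }


result = compare(predicted, known)
print(result["accuracy_pct"], result["coverage_pct"])
```

With these placeholder sets, three formulas overlap, so the "accuracy" is 3/4 = 75% and the "coverage" is 3/5 = 60%. A known formula can also land in `known_only` simply because its stoichiometry lies outside the enumerated grid, not because the filter rejected it.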
Step 6: Visualise Chemical Space with Ternary Plot
print("Creating ternary plot to visualise Cu-Ti-O chemical space...")
def composition_to_fractions(comp, element_list):
    """Convert composition to fractional coordinates for ternary plotting."""
    amounts = [comp[el.symbol] for el in element_list]
    total = sum(amounts)
    return [amt/total for amt in amounts] if total > 0 else [0, 0, 0]
# Convert compositions to ternary coordinates
# reshape(-1, 3) keeps the array 2-D even when the list of compositions is empty
smact_fractions = np.array([composition_to_fractions(c, elements) for c in valid_compositions]).reshape(-1, 3)
cu_s, ti_s, o_s = smact_fractions[:,0], smact_fractions[:,1], smact_fractions[:,2]
# Convert MP compositions if available
if mp_compositions:
    mp_fractions = np.array([composition_to_fractions(c, elements) for c in mp_compositions]).reshape(-1, 3)
    cu_m, ti_m, o_m = mp_fractions[:,0], mp_fractions[:,1], mp_fractions[:,2]
    print(f"Plotting {len(smact_fractions)} SMACT predictions and {len(mp_fractions)} validated MP compounds")
else:
    cu_m, ti_m, o_m = [], [], []
    print(f"Plotting {len(smact_fractions)} SMACT predictions only")
# Create ternary plot
fig = go.Figure()
# Plot SMACT predictions
fig.add_trace(go.Scatterternary(
    a=cu_s, b=ti_s, c=o_s,
    mode="markers",
    marker=dict(
        size=5,
        color="lightblue",
        symbol="circle",
        opacity=0.5,
        line=dict(width=0.5, color="blue")
    ),
    name=f"SMACT Predictions ({len(valid_compositions)})",
    hovertemplate="<b>SMACT Valid Composition</b><br>" +
                  "Cu: %{a:.3f}<br>" +
                  "Ti: %{b:.3f}<br>" +
                  "O: %{c:.3f}<extra></extra>"
))
# Plot known compounds if available
if len(cu_m) > 0:
    fig.add_trace(go.Scatterternary(
        a=cu_m, b=ti_m, c=o_m,
        mode="markers",
        marker=dict(
            size=12,
            color="red",
            symbol="star",
            opacity=0.9,
            line=dict(width=1.5, color="darkred")
        ),
        name=f"Known Compounds ({len(mp_compositions)})",
        hovertemplate="<b>Materials Project (Validated)</b><br>" +
                      "Cu: %{a:.3f}<br>" +
                      "Ti: %{b:.3f}<br>" +
                      "O: %{c:.3f}<extra></extra>"
    ))
# Enhanced styling
axis_style = dict(
    title=dict(font=dict(size=16, color="black")),
    linewidth=2,
    linecolor="black",
    gridcolor="rgba(128, 128, 128, 0.4)",
    showticklabels=True,
    tickvals=[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
    tickfont=dict(size=12)
)
# Style the plot
fig.update_layout(
    title=dict(
        text="Cu-Ti-O Chemical Space: Theory vs Experiment",
        font=dict(size=18, color="black"),
        x=0.5
    ),
    font=dict(size=12, family="Arial"),
    width=700,
    height=700,
    ternary=dict(
        bgcolor="rgba(250, 250, 250, 0.8)",
        aaxis=dict(axis_style, title="Copper (Cu)"),
        baxis=dict(axis_style, title="Titanium (Ti)"),
        caxis=dict(axis_style, title="Oxygen (O)")
    ),
    showlegend=True,
    legend=dict(
        x=0.02, y=0.98,
        bgcolor="rgba(255,255,255,0.9)",
        bordercolor="gray",
        borderwidth=1
    ),
    paper_bgcolor="white"
)
fig.show()
print("✓ Ternary plot created successfully!")
print("\n🔍 What the plot shows:")
print("  • Blue circles: Compositions predicted by SMACT as chemically feasible")
print("  • Red stars: Compounds recorded in the Materials Project (experimentally known or computed)")
print("  • Empty regions: No enumerated composition passed the SMACT filters and no Materials Project entry exists")
print("  • Blue circles without red stars: Potential targets for new materials!")
Creating ternary plot to visualise Cu-Ti-O chemical space...
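The ternary coordinates themselves are just atomic fractions that sum to one. As a minimal, library-free sketch, the helpers below convert a stoichiometry into fractions and then into 2-D plotting coordinates. The function names and the vertex layout (first component at the top, second at bottom-left, third at bottom-right) are assumptions made for this illustration.

```python
from math import sqrt


def to_fractions(stoichiometry, order=("Cu", "Ti", "O")):
    """Return atomic fractions in the given element order (zeros if empty)."""
    amounts = [stoichiometry.get(symbol, 0) for symbol in order]
    total = sum(amounts)
    return tuple(a / total for a in amounts) if total > 0 else (0.0, 0.0, 0.0)


def to_cartesian(a, b, c):
    """Map barycentric (a, b, c) onto a unit-edge triangle.

    Assumed layout: a at the top, b at bottom-left (0, 0), c at bottom-right (1, 0).
    """
    x = 0.5 * a + c
    y = (sqrt(3) / 2) * a
    return x, y


fractions = to_fractions({"Cu": 1, "Ti": 1, "O": 3})
print(fractions, to_cartesian(*fractions))
```

Binary compositions fall on an edge of the triangle (one fraction is zero) and pure elements sit on the vertices, which is why the ternary plot can show all seven subsystems at once.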
Step 7: Identify Priority Targets for Synthesis
print("Identifying the most promising compositions for experimental synthesis...\n")
def composition_complexity(comp):
    """Simple complexity metric: number of atoms + number of elements"""
    return comp.num_atoms + len(comp.elements)
if mp_compositions:
    # Find compositions predicted by SMACT but not yet synthesised
    smact_formulas = set(str(comp) for comp in valid_compositions)
    mp_formulas = set(str(comp) for comp in mp_compositions)
    unexplored_formulas = smact_formulas.difference(mp_formulas)
    if unexplored_formulas:
        # Convert back to Composition objects for analysis
        unexplored_compositions = [Composition(formula) for formula in unexplored_formulas]
        # Prioritise by complexity (simpler compositions are often easier to synthesise)
        prioritised_targets = sorted(unexplored_compositions, key=composition_complexity)
        print("TOP SYNTHESIS TARGETS (ordered by complexity):")
        print(f"{'Rank':<5} {'Formula':<15} {'Atoms':<8} {'Elements':<10} {'Complexity':<12}")
        print("-" * 55)
        for i, comp in enumerate(prioritised_targets[:10], 1):
            complexity = composition_complexity(comp)
            print(f"{i:<5} {str(comp):<15} {comp.num_atoms:<8} {len(comp.elements):<10} {complexity:<12}")
        if len(prioritised_targets) > 10:
            print(f"  ... and {len(prioritised_targets)-10} more potential targets")
        # Analyse by composition type
        binary_targets = [c for c in prioritised_targets if len(c.elements) == 2]
        ternary_targets = [c for c in prioritised_targets if len(c.elements) == 3]
        print("\nTarget Summary:")
        print(f"  Binary compositions (2 elements): {len(binary_targets):>3}")
        print(f"  Ternary compositions (3 elements): {len(ternary_targets):>3}")
        print(f"  Total synthesis targets: {len(prioritised_targets):>3}")
        # Show some specific examples
        if binary_targets:
            print("\nSimplest binary targets:")
            for i, comp in enumerate(binary_targets[:3], 1):
                print(f"  {i}. {comp}")
        if ternary_targets:
            print("\nSimplest ternary targets:")
            for i, comp in enumerate(ternary_targets[:3], 1):
                print(f"  {i}. {comp}")
        print("\nRecommendation: Start experimental synthesis with the simplest")
        print("  compositions first, as they're often easier to make successfully.")
    else:
        print("All SMACT-predicted compositions have already been synthesised!")
        print("This chemical system appears to be well-explored experimentally.")
else:
    print("Cannot identify synthesis targets without Materials Project data.")
    print("All SMACT-valid compositions could potentially be synthesis targets:")
    # Show the simplest valid compositions as potential targets
    simple_targets = sorted(valid_compositions, key=composition_complexity)[:10]
    print("\nSimplest SMACT-valid compositions to try:")
    for i, comp in enumerate(simple_targets, 1):
        complexity = composition_complexity(comp)
        print(f"  {i:2d}. {comp} (complexity: {complexity})")
Identifying the most promising compositions for experimental synthesis...
Cannot identify synthesis targets without Materials Project data.
All SMACT-valid compositions could potentially be synthesis targets:
Simplest SMACT-valid compositions to try:
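The ranking metric used in this step (atoms per formula unit plus number of elements) can be tried out without pymatgen. The candidate list below is a hand-written placeholder, not screening output, and the helper names are hypothetical.

```python
def complexity(stoichiometry):
    """Atoms per formula unit plus number of distinct elements."""
    return sum(stoichiometry.values()) + len(stoichiometry)


def formula(stoichiometry):
    """Render a stoichiometry dict as a compact formula string."""
    return "".join(f"{sym}{n if n != 1 else ''}" for sym, n in stoichiometry.items())


# Placeholder candidates for illustration only.
candidates = [
    {"Cu": 1, "Ti": 1, "O": 3},
    {"Ti": 1, "O": 2},
    {"Cu": 2, "O": 1},
    {"Cu": 1, "O": 1},
]

# sorted() is stable, so equally complex candidates keep their input order.
ranked = sorted(candidates, key=complexity)
for rank, stoich in enumerate(ranked, 1):
    print(f"{rank}. {formula(stoich):<8} complexity={complexity(stoich)}")
```

The metric is deliberately crude: it says nothing about thermodynamic stability, so it is best treated as a way to order a shortlist rather than as evidence that a composition can be made.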
Summary: What We’ve Learned About Compositional Screening
print("YOUR ANALYSIS RESULTS:")
if 'unique_compositions' in locals():
    print(f"✓ Generated {len(unique_compositions)} total compositions")
if 'valid_compositions' in locals():
    print(f"✓ Found {len(valid_compositions)} chemically valid compositions")
if 'mp_compositions' in locals() and mp_compositions:
    print(f"✓ Compared with {len(mp_compositions)} known compounds")
if 'predicted_but_unknown' in locals():
    print(f"✓ Identified {len(predicted_but_unknown)} potential synthesis targets")
YOUR ANALYSIS RESULTS:
✓ Found 0 chemically valid compositions
💡 Key Concepts:
✓ Generating comprehensive compositional spaces
✓ Applying chemical filters to eliminate impossible compositions
✓ Comparing theoretical predictions with experimental reality
✓ Visualising chemical space using ternary plots
✓ Identifying promising targets for materials synthesis
🐎 The compositional screening workflow:
Define chemical system → Cu-Ti-O
Generate all compositions → Combinatorial approach
Apply chemical filters → SMACT validity screening
Compare with databases → Materials Project
Visualise results → Ternary plots
Prioritise targets → Complexity-based ranking
🚀 Next Steps:
• Apply this workflow to other chemical systems of interest
• Combine with additional filters (stability, synthesisability)
• Use machine learning to predict properties of promising compositions
• Design experiments to synthesise priority targets
Tip
Remember: The goal of compositional screening isn’t just to predict what’s possible, but to guide experimental efforts towards the most promising unexplored regions of chemical space. You now have the computational tools to do this!