Chemical Filters: From Theory to Practice with SMACT
Welcome to the exploration of chemical filters! In the previous section, we saw how combinatorial explosion creates a large number of possible materials. Now we’ll dive deep into the chemistry that helps us filter out chemically implausible combinations.
What You’ll Learn
In this notebook, we’ll explore:
How each chemical filter works - Understanding the science behind the screening
Implementing filters step by step - From simple to sophisticated
Comparing filter effectiveness - Which filters eliminate the most impossible combinations
Real-world applications - Targeted screening for specific material types
Advanced filtering strategies - Combining multiple criteria for maximum efficiency
Think of this as your workshop for building intelligent chemical screening systems!
Setup: Loading Our Toolkit
Before we start filtering, let’s set up our environment with SMACT and related tools.
# Core imports for chemical filtering
import multiprocessing
import os
from collections import defaultdict
from itertools import combinations, product
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# SMACT - our main chemical filtering toolkit
import smact
from smact import Element, element_dictionary
from smact.screening import (
    smact_filter, pauling_test, eneg_states_test,
    smact_validity, neutral_ratios
)
# For chemical composition handling
from pymatgen.core import Composition
# For accessing real materials data (optional)
try:
    from mp_api.client import MPRester
    mp_available = True
except ImportError:
    print("Materials Project API not available - install with: pip install mp-api")
    mp_available = False
# Set plotting style
plt.style.use('default')
sns.set_palette("husl")
# Check SMACT version if available
try:
    print(f"SMACT version: {smact.__version__}")
except AttributeError:
    print("SMACT loaded successfully (version attribute not available)")
print("Chemical filtering toolkit ready!")
# Set up Materials Project API using environment variable
myapikey = os.getenv('MP_API_KEY')
if myapikey:
    print("✓ Materials Project API key found")
else:
    print("⚠ No MP_API_KEY environment variable found - Materials Project examples will be skipped")
    myapikey = "replace_with_your_mp_api_key"
Materials Project API not available - install with: pip install mp-api
SMACT loaded successfully (version attribute not available)
Chemical filtering toolkit ready!
⚠ No MP_API_KEY environment variable found - Materials Project examples will be skipped
Preliminaries
Understanding SMACT
In this section of the tutorial you will begin to get a feel for how SMACT works. SMACT is a Python library designed to facilitate the exploration of chemical spaces for materials discovery. It provides tools to:
Generate possible compositions based on element combinations.
Apply chemical rules such as charge neutrality and electronegativity balance.
Filter out compositions that are unlikely to form stable compounds.
Key Features of SMACT
Element Class: Represents elements with properties like oxidation states and electronegativities.
Screening Functions: Functions like smact_filter help apply chemical rules to filter compositions.
Integration with Other Libraries: SMACT works well with pymatgen and matminer for further analysis.
Part A: Generating Chemical Spaces
There are two primary ways to generate chemical spaces:
Combinatorial Generation: As shown in the previous tutorial, this method systematically combines elements to create potential compositions.
Fetching Data from Materials Project: Using the Materials Project database to obtain existing materials.
Combinatorial Generation
We can generate all possible combinations of elements within a set to explore potential compounds.
Example 1: Generate all ternary combinations from a list of elements.
# Define elements of interest
symbol_list = ['Li', 'Na', 'K', 'Mg', 'Ca', 'Sr', 'Ba', 'Al', 'Ga', 'In', 'Sn', 'Pb', 'Zn', 'Cd', 'Hg']
all_elements = element_dictionary(symbol_list)
# Generate all ternary combinations
ternary_combinations = combinations(all_elements.values(), 3)
# Print the first 5 combinations
for i, combo in enumerate(list(ternary_combinations)[:5]):
    print(f"Combination {i+1}: {', '.join([el.symbol for el in combo])}")
Combination 1: Li, Na, K
Combination 2: Li, Na, Mg
Combination 3: Li, Na, Ca
Combination 4: Li, Na, Sr
Combination 5: Li, Na, Ba
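To put numbers on the combinatorial explosion: the 15 symbols above give C(15, 3) element triples, and if each element may take any count from 1 to 8 (the same cap as smact_filter's default threshold), every triple has 8³ raw stoichiometries. A quick stdlib check of those counts, before any oxidation states or chemical rules are considered:

```python
from math import comb

n_elements = 15  # length of symbol_list above
max_stoich = 8   # mirrors smact_filter's default threshold

n_triples = comb(n_elements, 3)      # unordered element triples
n_raw = n_triples * max_stoich ** 3  # every (a, b, c) with 1 <= a, b, c <= 8

print(f"Element triples: {n_triples}")          # 455
print(f"Raw ternary stoichiometries: {n_raw}")  # 232960
```

This count ignores oxidation states (which multiply the space further) and includes non-reduced multiples such as A2B2C2, so it is only an order-of-magnitude reference for what the chemical filters have to prune.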
Example 2: Fetching Data from Materials Project - The Materials Project is a database of computed materials properties. We can use their API to fetch materials data.
def get_binary_compounds(api_key: str, metallic_elements: list) -> pd.DataFrame:
    """
    Query Materials Project for stable or near-stable binary compounds of metallic
    elements (energy above hull between 0 and 0.1 eV/atom).
    Args:
        api_key: Materials Project API key
        metallic_elements: List of metallic element symbols to search
    Returns:
        DataFrame containing compound properties
    """
    if not mp_available:
        print("Materials Project API not available")
        return pd.DataFrame()
    if not api_key or api_key == "replace_with_your_mp_api_key":
        print("No valid Materials Project API key - skipping query")
        return pd.DataFrame()
    from mp_api.client import MPRester
    compounds_info = []
    excluded_elements = ["O", "S", "Se", "Te", "F", "Cl", "Br", "I", "N", "P", "As"]
    fields = ["material_id", "formula_pretty", "elements", "energy_above_hull",
              "symmetry", "band_gap", "theoretical"]
    with MPRester(api_key) as mpr:
        # Search each binary combination (limiting to first 5 for demo)
        for pair in list(combinations(metallic_elements, 2))[:5]:
            docs = mpr.materials.summary.search(
                elements=list(pair),
                num_elements=(2, 2),
                energy_above_hull=(0, 0.1),
                fields=fields
            )
            # Filter and store results
            for doc in docs:
                if not any(elem.symbol in excluded_elements for elem in doc.elements):
                    compounds_info.append({
                        "material_id": doc.material_id,
                        "formula": doc.formula_pretty,
                        "elements": ", ".join(elem.symbol for elem in doc.elements),
                        "energy_above_hull": doc.energy_above_hull,
                        "crystal_system": doc.symmetry.crystal_system,
                        "band_gap": doc.band_gap,
                        "theoretical": doc.theoretical
                    })
    return pd.DataFrame(compounds_info)
# Define metallic elements to search (reduced list for demo)
metallic_elements = ["Li", "Be", "Na", "Mg", "Al", "K", "Ca", "Ti", "V", "Cr", "Mn",
                     "Fe", "Co", "Ni", "Cu", "Zn"]
# Query compounds and display results
if myapikey and myapikey != "replace_with_your_mp_api_key":
    df = get_binary_compounds(myapikey, metallic_elements)
    if not df.empty:
        print("\nFirst 5 compounds found:")
        print(df.head())
        print(f"\nTotal compounds found: {len(df)}")
    else:
        print("No compounds retrieved")
else:
    print("Skipping Materials Project query - no API key available")
    print("To enable this section, set the MP_API_KEY environment variable")
Skipping Materials Project query - no API key available
To enable this section, set the MP_API_KEY environment variable
Part B: Applying Chemical Filters
After generating a chemical space, we need to apply chemical filters to narrow down potential candidates.
Charge Neutrality and Electronegativity Tests
SMACT's screening tools provide a variety of functions for screening chemical spaces, including:
Charge Neutrality: Ensuring the total charge in a compound is zero.
Pauling Test: Verifying that a combination of ions makes chemical sense (i.e. cations should be less electronegative than anions).
Eneg States Test/Threshold/Alternate: Checking electronegativity criteria between anions and cations.
No repeats: Check if any anion or cation appears twice.
ml_rep_generator: Function to take a composition of Elements and return a list of values between 0 and 1 that describes the composition, useful for machine learning.
smact_filter: Combines the charge neutrality and electronegativity tests in one go, for simple application in external scripts that wish to apply the general ‘smact test’.
smact_validity: Check if a composition is valid according to the SMACT rules. A composition is considered valid if it passes the charge neutrality test and the Pauling electronegativity test.
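The first two rules can be written out in a few lines of plain Python. The sketch below is a simplified, stand-alone illustration of charge neutrality and the Pauling test using hypothetical helper names (neutral_stoichiometries, pauling_ok); it is not SMACT's own code, and SMACT's functions may handle additional cases. The neutrality helper keeps only primitive ratios (counts with no common factor).

```python
from functools import reduce
from itertools import product
from math import gcd


def neutral_stoichiometries(ox_states, threshold=8):
    """Primitive stoichiometries (gcd of counts == 1) with sum(n_i * q_i) == 0.

    Each count n_i runs from 1 to `threshold`. A simplified sketch of a
    charge-neutrality test, not SMACT's implementation.
    """
    return [
        ratio
        for ratio in product(range(1, threshold + 1), repeat=len(ox_states))
        if sum(n * q for n, q in zip(ratio, ox_states)) == 0 and reduce(gcd, ratio) == 1
    ]


def pauling_ok(ox_states, electronegativities):
    """True if every cation (q > 0) is less electronegative than every anion (q < 0)."""
    cations = [x for q, x in zip(ox_states, electronegativities) if q > 0]
    anions = [x for q, x in zip(ox_states, electronegativities) if q < 0]
    return bool(cations and anions) and max(cations) < min(anions)


# Al(3+) and O(2-), with Pauling electronegativities 1.61 (Al) and 3.44 (O)
print(neutral_stoichiometries((3, -2)))   # [(2, 3)]
print(pauling_ok((3, -2), (1.61, 3.44)))  # True
```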
Using smact_filter
The smact_filter function combines both the charge neutrality and electronegativity tests in one go, for simple application in external scripts that wish to apply the general ‘smact test’.
The function takes the following arguments:
def smact_filter(
    els: Union[Tuple[Element], List[Element]],
    threshold: Optional[int] = 8,
    stoichs: Optional[List[List[int]]] = None,
    species_unique: bool = True,
    oxidation_states_set: str = "icsd24",
    comp_tuple: bool = False,
) -> Union[List[Tuple[str, int, int]], List[Tuple[str, int]]]:
    ...
Parameters:
els: A tuple or list of Element objects.
threshold: Maximum allowed stoichiometry (default is 8).
stoichs: Specific stoichiometric ratios to consider.
species_unique: If True, considers different oxidation states as unique species.
oxidation_states_set: Set of oxidation states to use (‘icsd24’, ‘smact14’, ‘pymatgen’, ‘wiki’, or a custom file path). WARNING: For backwards compatibility (SMACT >= 2.7), explicitly set oxidation_states_set to ‘smact14’ if you wish to use the 2014 SMACT default oxidation states; SMACT 3.0 switched smact_filter to a new default set (‘icsd24’, as in the signature above).
comp_tuple: If True, returns results as named tuples.
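The species_unique flag matters when two oxidation-state assignments balance the same stoichiometry: FeTiO3, for instance, is charge-neutral both as Fe(II)Ti(IV)O3 and as Fe(III)Ti(III)O3. The sketch below uses two hand-written tuples in the (symbols, oxidation states, ratios) layout (not real smact_filter output) and a hypothetical helper, collapse_species, which assumes, based on the return type in the signature, that species_unique=False keeps one (symbols, ratios) entry per element-level composition:

```python
# Hand-written tuples in smact_filter's (symbols, oxidation_states, ratios) layout
species_level = [
    (("Fe", "Ti", "O"), (2, 4, -2), (1, 1, 3)),  # Fe(II) Ti(IV) O3: 2 + 4 - 6 = 0
    (("Fe", "Ti", "O"), (3, 3, -2), (1, 1, 3)),  # Fe(III) Ti(III) O3: 3 + 3 - 6 = 0
]


def collapse_species(results):
    """Drop oxidation states and keep each (symbols, ratios) pair once, in order."""
    return list(dict.fromkeys((tuple(symbols), tuple(ratios)) for symbols, _ox, ratios in results))


element_level = collapse_species(species_level)
print(len(species_level), len(element_level))  # 2 1
print(element_level)  # [(('Fe', 'Ti', 'O'), (1, 1, 3))]
```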
Simple Example: Using smact_filter
Let’s start with a simple example to see how smact_filter works:
# Define elements
elements = (Element('Na'), Element('Cl'))
# Apply SMACT filter
compositions = smact_filter(elements)
# Display valid compositions: each entry is (symbols, oxidation states, ratios)
for comp in compositions:
    print(comp)
(('Na', 'Cl'), (1, -1), (1, 1))
Part C: Identifying potential materials for specific engineering applications
1: Binary Oxides for Photocatalysis
First, let’s explore binary oxide semiconductors that might be suitable for water splitting:
def setup_binary_oxide_space():
    """Setup chemical space for binary oxides"""
    # Define transition metals of interest
    transition_metals = ["Ti", "V", "Cr", "Mn", "Fe", "Co", "Ni", "Cu", "Zn"]
    # Convert to SMACT elements
    tm_elements = [smact.Element(symbol) for symbol in transition_metals]
    oxygen = smact.Element("O")
    return tm_elements, oxygen
def binary_oxide_filter(metal):
    """Filter binary oxides based on chemical rules"""
    compounds = []
    # Oxidation state for oxygen
    o_state = -2
    for ox_state in metal.oxidation_states:
        # Check charge neutrality
        cn_e, cn_r = smact.neutral_ratios([ox_state, o_state], threshold=8)
        if cn_e:
            # Check electronegativity
            eneg_ok = pauling_test(
                [ox_state, o_state],
                [metal.pauling_eneg, 3.44]  # 3.44 is O electronegativity
            )
            if eneg_ok:
                formula = [metal.symbol, "O"]
                compounds.append((formula, cn_r[0]))
    return compounds
# Generate candidates
metals, oxygen = setup_binary_oxide_space()
with multiprocessing.Pool() as pool:
    binary_results = pool.map(binary_oxide_filter, metals)
# Format results
binary_formulas = []
for result in binary_results:
    for comp in result:
        formula = "".join(f"{el}{amt}" for el, amt in zip(comp[0], comp[1]))
        binary_formulas.append(Composition(formula).reduced_formula)
print("Viable binary oxide candidates:")
print("\n".join(binary_formulas))
Viable binary oxide candidates:
TiO
Ti2O3
TiO2
V2O
VO
V2O3
VO2
V2O5
Cr2O
CrO
Cr2O3
CrO2
Cr2O5
CrO3
Mn2O
MnO
Mn2O3
MnO2
Mn2O5
MnO3
Mn2O7
Fe2O
FeO
Fe2O3
FeO2
Fe2O5
FeO3
Co2O
CoO
Co2O3
CoO2
Ni2O
NiO
Ni2O3
NiO2
Cu2O
CuO
Cu2O3
ZnO
Zn2O3
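Each line of this list follows from a single oxidation state: for a metal M in state +n combined with O(2-), charge balance gives M:O = 2:n, reduced by the greatest common divisor. A short stdlib sketch with a hypothetical helper, oxide_formula, reproduces a few entries by hand. It also shows that unusual entries such as FeO3 simply reflect high oxidation states (here Fe(VI)) present in the element's oxidation-state list, not any assessment of stability:

```python
from math import gcd


def oxide_formula(metal, n, anion_charge=2):
    """Reduced formula of a binary oxide M(n+) / O(2-)."""
    # Charge balance: x * n == y * anion_charge, so take x = anion_charge, y = n, then reduce
    x, y = anion_charge, n
    divisor = gcd(x, y)
    x, y = x // divisor, y // divisor

    def part(symbol, count):
        return symbol if count == 1 else f"{symbol}{count}"

    return part(metal, x) + part("O", y)


for metal, n in [("Cu", 1), ("Ti", 2), ("Fe", 3), ("Ti", 4), ("V", 5), ("Fe", 6), ("Mn", 7)]:
    print(f"{metal}({n}+): {oxide_formula(metal, n)}")
```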
2: Ternary Chalcogenides for Solar Cells
Now let’s explore ternary chalcogenides that might be suitable for solar cells:
def setup_chalcogenide_space():
    """Setup chemical space for ternary chalcogenides"""
    # Group 11 metals (Cu, Ag)
    group_11 = ["Cu", "Ag"]
    # Group 13 metals (Ga, In)
    group_13 = ["Ga", "In"]
    # Chalcogens
    chalcogens = ["S", "Se"]
    metal_1 = [smact.Element(m) for m in group_11]
    metal_2 = [smact.Element(m) for m in group_13]
    chalc = [smact.Element(c) for c in chalcogens]
    return metal_1, metal_2, chalc
def ternary_chalcogenide_filter(elements):
    """Filter ternary chalcogenides with charge-neutrality and Pauling tests"""
    compounds = []
    m1, m2, ch = elements
    # Target band-gap window for solar absorbers (eV). Not used below: these
    # chemical filters do not predict band gaps, so apply it in a later screening step.
    bandgap_range = (1.0, 2.5)
    for ox_1 in m1.oxidation_states:
        for ox_2 in m2.oxidation_states:
            for ox_ch in ch.oxidation_states:
                ox_states = [ox_1, ox_2, ox_ch]
                # Charge neutrality check
                cn_e, cn_r = smact.neutral_ratios(ox_states, threshold=8)
                if cn_e:
                    # Electronegativity check
                    eneg_ok = pauling_test(
                        ox_states,
                        [m1.pauling_eneg, m2.pauling_eneg, ch.pauling_eneg]
                    )
                    if eneg_ok:
                        formula = [m1.symbol, m2.symbol, ch.symbol]
                        compounds.append((formula, cn_r[0]))
    return compounds
# Generate candidates
m1_els, m2_els, ch_els = setup_chalcogenide_space()
ternary_combinations = [(m1, m2, ch)
                        for m1 in m1_els
                        for m2 in m2_els
                        for ch in ch_els]
with multiprocessing.Pool() as pool:
    ternary_results = pool.map(ternary_chalcogenide_filter, ternary_combinations)
# Format and display unique reduced formulas
ternary_formulas = sorted({
    Composition("".join(f"{el}{amt}" for el, amt in zip(comp[0], comp[1]))).reduced_formula
    for result in ternary_results
    for comp in result
})
print("Viable ternary chalcogenide candidates:")
print("\n".join(ternary_formulas))
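To see how the nested oxidation-state loops turn into formulas, here is a stand-alone sketch for Cu-In-Se using a hand-picked subset of oxidation states (Cu: +1, +2; In: +1, +3; Se: -2), an assumption made for brevity rather than SMACT's full lists. For each assignment, the hypothetical helper first_neutral_ratio reports the first charge-neutral ratio found when counts are scanned in lexicographic order, playing a role similar to cn_r[0] in the filter:

```python
from itertools import product


def first_neutral_ratio(ox_states, threshold=8):
    """First (a, b, c, ...) in lexicographic order with sum(n_i * q_i) == 0, else None."""
    for ratio in product(range(1, threshold + 1), repeat=len(ox_states)):
        if sum(n * q for n, q in zip(ratio, ox_states)) == 0:
            return ratio
    return None


# Hand-picked subset of oxidation states, for illustration only
cu_states, in_states, se_state = [1, 2], [1, 3], -2

assignments = {}
for q_cu, q_in in product(cu_states, in_states):
    ratio = first_neutral_ratio((q_cu, q_in, se_state))
    assignments[(q_cu, q_in)] = ratio
    print(f"Cu({q_cu:+d}) In({q_in:+d}) Se(-2) -> Cu:In:Se = {ratio}")
```

The Cu(+1)/In(+3) assignment gives 1:1:2, i.e. CuInSe2, the familiar chalcopyrite absorber composition; the other assignments show that the same three elements can balance in several ways, so charge neutrality alone admits several candidate stoichiometries per element triple.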
3: Double Perovskites for Ferroelectrics
Finally, let’s explore double perovskites (A2BB’O6):
def setup_double_perovskite_space():
    """Setup chemical space for double perovskites"""
    # A-site cations (large ionic radius)
    a_site = ["Ba", "Sr", "Ca"]
    # B-site cations (smaller transition metals)
    b_site = ["Fe", "Mn", "Ni"]
    b_prime_site = ["Mo", "W", "Re"]
    a_els = [smact.Element(a) for a in a_site]
    b_els = [smact.Element(b) for b in b_site]
    b_prime_els = [smact.Element(bp) for bp in b_prime_site]
    oxygen = smact.Element("O")
    return a_els, b_els, b_prime_els, oxygen
def double_perovskite_filter(elements):
    """Filter A2BB'O6 double perovskites by charge balance and the Pauling test"""
    compounds = []
    a, b, b_prime, o = elements
    # Goldschmidt tolerance factor limits (not applied here: this filter uses no ionic radii)
    tol_factor_range = (0.8, 1.0)
    for a_ox in a.oxidation_states:
        for b_ox in b.oxidation_states:
            for bp_ox in b_prime.oxidation_states:
                ox_states = [a_ox, b_ox, bp_ox, -2]  # O is -2
                # Check charge balance for the fixed A2BB'O6 stoichiometry
                if sum([2*a_ox, b_ox, bp_ox, 6*(-2)]) == 0:
                    # Check electronegativity ordering
                    eneg_ok = pauling_test(
                        ox_states,
                        [a.pauling_eneg, b.pauling_eneg,
                         b_prime.pauling_eneg, 3.44]
                    )
                    if eneg_ok:
                        # Oxidation states are not stored, so different valid assignments
                        # (e.g. Fe2+/Mo6+ and Fe3+/Mo5+) produce identical rows
                        formula = [a.symbol, b.symbol, b_prime.symbol, "O"]
                        compounds.append((formula, [2, 1, 1, 6]))
    return compounds
# Generate candidates
a_els, b_els, bp_els, oxygen = setup_double_perovskite_space()
perovskite_combinations = [(a, b, bp, oxygen)
                           for a in a_els
                           for b in b_els
                           for bp in bp_els]
with multiprocessing.Pool() as pool:
    perovskite_results = pool.map(double_perovskite_filter,
                                  perovskite_combinations)
# Flatten results and create dataframe
flattened_results = [comp for result in perovskite_results if result for comp in result]
df = pd.DataFrame(flattened_results, columns=['Formula', 'Stoichiometry'])
# Print first 5 candidates
print("\nFirst 5 double perovskite candidates:")
print(df.head())
First 5 double perovskite candidates:
Formula Stoichiometry
0 [Ba, Fe, Mo, O] [2, 1, 1, 6]
1 [Ba, Fe, Mo, O] [2, 1, 1, 6]
2 [Ba, Fe, Mo, O] [2, 1, 1, 6]
3 [Ba, Fe, Mo, O] [2, 1, 1, 6]
4 [Ba, Fe, Mo, O] [2, 1, 1, 6]
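The repeated rows above come from the filter not recording oxidation states: with Ba(2+) on the A site, the B and B' charges only need to sum to +8, and several pairs do. The sketch below enumerates such pairs from a hand-picked subset of oxidation states (an assumption, not SMACT's lists) and also computes the Goldschmidt tolerance factor that tol_factor_range was meant to bound, using the common approximation of averaging the B and B' radii: t = (r_A + r_O) / (√2 (r̄_B + r_O)). Both helper names are hypothetical. Radii should come from a table such as Shannon's; the SrTiO3 values in the example (Sr2+ 1.44 Å, Ti4+ 0.605 Å, O2- 1.40 Å) are approximate.

```python
from itertools import product
from math import sqrt


def balanced_b_site_pairs(a_ox, b_states, bp_states, o_charge=-2):
    """(b, b') oxidation-state pairs that make A2 B B' O6 charge-neutral."""
    return [
        (b, bp)
        for b, bp in product(b_states, bp_states)
        if 2 * a_ox + b + bp + 6 * o_charge == 0
    ]


def tolerance_factor(r_a, r_b, r_bp, r_o=1.40):
    """Goldschmidt tolerance factor for A2BB'O6, averaging the B and B' radii (in Å)."""
    r_b_avg = (r_b + r_bp) / 2
    return (r_a + r_o) / (sqrt(2) * (r_b_avg + r_o))


# Ba(2+) with hand-picked Fe and Mo states: three pairs give the same Ba2FeMoO6 row
pairs = balanced_b_site_pairs(2, b_states=[2, 3, 4], bp_states=[4, 5, 6])
print(pairs)  # [(2, 6), (3, 5), (4, 4)]

# Single-perovskite limit (B = B'): SrTiO3 with approximate Shannon radii
t = tolerance_factor(r_a=1.44, r_b=0.605, r_bp=0.605)
print(round(t, 2))  # 1.0
```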
4: Identifying Potential Battery Materials
Goal: Find binary compounds suitable for battery applications.
This is a simple example workflow. For a more comprehensive guide, see the Materials Project tutorial or the Materials Project Batteries Explorer documentation.
Workflow Steps:
Fetch Data: Use the Materials Project API to retrieve binary compounds.
Filter Compounds: Select compounds with low energy above hull (i.e., stable or near-stable) and desired properties (here, non-metallic).
Apply SMACT Validity Check: Ensure the compounds pass SMACT’s chemical rules.
if myapikey and myapikey != "replace_with_your_mp_api_key" and mp_available:
    from mp_api.client import MPRester
    from smact.screening import smact_validity
    with MPRester(myapikey) as mpr:
        # Search for non-metallic binary compounds within 0.05 eV/atom of the convex hull
        docs = mpr.materials.summary.search(
            num_elements=2,  # exactly two elements
            energy_above_hull=(0, 0.05),
            is_metal=False,
            fields=["material_id", "formula_pretty", "band_gap",
                    "energy_above_hull", "formation_energy_per_atom"]
        )
    # Filter and validate compounds
    valid_compounds = []
    for doc in list(docs)[:100]:  # Limit to first 100 for demo
        formula = doc.formula_pretty
        if smact_validity(formula):
            valid_compounds.append({
                'material_id': doc.material_id,
                'formula': formula,
                'band_gap': doc.band_gap,
                'energy_above_hull': doc.energy_above_hull,
                'formation_energy_per_atom': doc.formation_energy_per_atom,
            })
    print(f"Number of valid battery material candidates: {len(valid_compounds)}")
    # Save to CSV
    df = pd.DataFrame(valid_compounds)
    df.to_csv('battery_material_candidates.csv', index=False)
    print("Results saved to 'battery_material_candidates.csv'")
else:
print("Skipping battery materials query - Materials Project API not available")
print("To enable this section:")
print("1. Set MP_API_KEY environment variable")
print("2. Install mp-api: pip install mp-api")
Skipping battery materials query - Materials Project API not available
To enable this section:
1. Set MP_API_KEY environment variable
2. Install mp-api: pip install mp-api
Part D: Advanced Oxidation States Filtering
SMACT includes advanced filtering capabilities based on oxidation states data from the ICSD (Inorganic Crystal Structure Database). Consensus and commonality thresholds let you tune how strictly oxidation states are admitted into your filtering workflows.
Understanding ICSD24 Oxidation States Filter
The ICSD24OxStatesFilter allows you to create custom oxidation state files based on:
Consensus: The minimum number of times an oxidation state must be reported in the ICSD data to be kept
Commonality: The minimum share (in percent) of an element's reports that an oxidation state must account for, applied after the consensus threshold
Include zero: Whether to include zero oxidation states (metals)
This gives you much finer control over chemical filtering than using default oxidation state sets.
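To make the two thresholds concrete, here is a sketch of how a consensus (minimum report count) and a commonality (minimum percentage share) cut could act on per-element report counts. The counts are made up for a fictitious element "X", the helper filter_ox_states is hypothetical, and the order of operations (zero states and the consensus cut first, then percentages over the remaining reports) is an assumption; this is not ICSD24OxStatesFilter's implementation.

```python
def filter_ox_states(counts, consensus=1, commonality=0.0, include_zero=False):
    """Keep oxidation states that pass a report-count cut and a percentage cut.

    counts:      {element: {oxidation_state: number_of_reports}}
    consensus:   minimum number of reports for a state to survive
    commonality: minimum share (percent) of the element's surviving reports
    """
    filtered = {}
    for element, states in counts.items():
        kept = {
            q: n for q, n in states.items()
            if n >= consensus and (include_zero or q != 0)
        }
        total = sum(kept.values())
        filtered[element] = sorted(
            q for q, n in kept.items() if total and 100 * n / total >= commonality
        )
    return filtered


# Made-up report counts for a fictitious element "X" (not ICSD data)
toy_counts = {"X": {-2: 1, 0: 50, 2: 30, 4: 900, 6: 69}}

print(filter_ox_states(toy_counts, consensus=2, commonality=0))  # {'X': [2, 4, 6]}
print(filter_ox_states(toy_counts, consensus=2, commonality=5))  # {'X': [4, 6]}
```

Raising either threshold can only remove states, never add them, so the set of compositions built from the filtered states cannot grow as the thresholds tighten.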
# Import the advanced oxidation states filter (other imports come from the setup cell)
try:
    import plotly.graph_objects as go
    plotly_available = True
except ImportError:
    plotly_available = False
    print("Plotly not available - install with: pip install plotly")
from smact.utils.oxidation import ICSD24OxStatesFilter
# First generate our custom oxidation states file
ox_filter = ICSD24OxStatesFilter()
commonality = 2
custom_filename = f"oxidation_states_icsd24_commonality_{commonality}.txt"
# Write out the filtered oxidation states
ox_filter.write(
    custom_filename,
    consensus=1,
    include_zero=False,
    commonality=commonality,
    comment=f"Oxidation states with commonality ≥ {commonality}"
)
print(f"Wrote filtered oxidation states to '{custom_filename}'")
print("\nFilter settings:")
print("- Consensus threshold: 1 (oxidation states reported at least once are kept)")
print(f"- Commonality threshold: {commonality} (states below {commonality}% of an element's reports are dropped)")
print("- Include zero oxidation states: False")
Wrote filtered oxidation states to 'oxidation_states_icsd24_commonality_2.txt'
Filter settings:
- Consensus threshold: 1 (oxidation states reported at least once are kept)
- Commonality threshold: 2 (states below 2% of an element's reports are dropped)
- Include zero oxidation states: False
Comparing Filtering with Different Oxidation State Sets
Let’s demonstrate the power of this approach by comparing filtering results using different oxidation state sets. We’ll use the Bi-Te-In system as an example (relevant for thermoelectric materials):
# Elements of interest for thermoelectric materials
element_symbols = ["Bi", "Te", "In"]
elements = [Element(sym) for sym in element_symbols]
def generate_valid_compositions(el_list, oxidation_states_set="icsd24", maxstoichiometrythreshold=8):
    """Return unique reduced Composition objects from smact_filter."""
    combos = smact_filter(
        el_list,
        threshold=maxstoichiometrythreshold,
        oxidation_states_set=oxidation_states_set,
        species_unique=True
    )
    comps = set()
    for symbols, ox_states, ratios in combos:
        comp = Composition({sym: amt for sym, amt in zip(symbols, ratios)}).reduced_composition
        comps.add(comp)
    return comps
def generate_filtered_compositions(path=None, maxstoichiometrythreshold=8):
    """
    Generate ternary and binary compositions using a custom oxidation state file
    or a built-in set name such as "smact14" (defaults to "icsd24").
    Returns (all_comps, ternary_comps, binary_comps).
    """
    ox_set = path or "icsd24"
    # ternary
    ternary = generate_valid_compositions(elements, ox_set, maxstoichiometrythreshold)
    # binaries
    binary = set()
    for pair in combinations(elements, 2):
        binary |= generate_valid_compositions(list(pair), ox_set, maxstoichiometrythreshold)
    all_comps = ternary | binary
    return all_comps, ternary, binary
# Generate filtered compositions using our custom file
print("Generating filtered compositions (ternary + binaries)...")
try:
    all_comps, ternary_comps, binary_comps = generate_filtered_compositions(custom_filename)
except Exception as e:
    print(f"Warning: {e}\nFalling back to default ICSD24 oxidation states.")
    all_comps, ternary_comps, binary_comps = generate_filtered_compositions()
# Analysis
count_ternary = len(ternary_comps)
count_binary = len(binary_comps)
count_total = len(all_comps)
print("\nFiltered ICSD24 results:")
print(f" Ternary compositions: {count_ternary}")
print(f" Binary compositions: {count_binary}")
print(f" Total compositions: {count_total}")
Generating filtered compositions (ternary + binaries)...
Filtered ICSD24 results:
Ternary compositions: 81
Binary compositions: 12
Total compositions: 93
Comparing with Different Commonality Thresholds
Let’s see how changing the commonality threshold affects our results:
# Compare different commonality thresholds
commonality_thresholds = [1, 2, 5, 10]
results_comparison = {}
for threshold in commonality_thresholds:
    # Create custom oxidation states file
    custom_file = f"oxidation_states_icsd24_commonality_{threshold}.txt"
    ox_filter.write(
        custom_file,
        consensus=1,
        include_zero=False,
        commonality=threshold,
        comment=f"Oxidation states with commonality ≥ {threshold}"
    )
    # Generate compositions with this threshold
    try:
        all_comps, ternary_comps, binary_comps = generate_filtered_compositions(custom_file)
        results_comparison[threshold] = {
            'total': len(all_comps),
            'ternary': len(ternary_comps),
            'binary': len(binary_comps)
        }
    except Exception as e:
        print(f"Error with threshold {threshold}: {e}")
        results_comparison[threshold] = {'total': 0, 'ternary': 0, 'binary': 0}
# Display comparison
print("\nComparison of different commonality thresholds:")
print(f"{'Threshold':>10} | {'Total':>8} | {'Ternary':>8} | {'Binary':>8}")
print("-" * 50)
for threshold, counts in results_comparison.items():
    print(f"{threshold:>10} | {counts['total']:>8} | {counts['ternary']:>8} | {counts['binary']:>8}")
print("\nHigher commonality thresholds are more restrictive: composition counts can only stay the same or decrease")
Comparison of different commonality thresholds:
Threshold | Total | Ternary | Binary
--------------------------------------------------
1 | 93 | 81 | 12
2 | 93 | 81 | 12
5 | 93 | 81 | 12
10 | 93 | 81 | 12
Higher commonality thresholds are more restrictive: composition counts can only stay the same or decrease
For the Bi-Te-In system, thresholds from 1 to 10 give identical counts, so in this range the commonality threshold does not change which compositions pass.
Visualising the Results
Let’s create a ternary plot to visualise our filtered compositions in chemical space:
# Create ternary plot for our compositions
# Use compositions from commonality threshold = 2
all_comps, ternary_comps, binary_comps = generate_filtered_compositions(
    "oxidation_states_icsd24_commonality_2.txt"
)
if plotly_available:
    import plotly.graph_objects as go
    # Extract element fractions for ternary plot
    e1 = np.array([c[element_symbols[0]] for c in all_comps])
    e2 = np.array([c[element_symbols[1]] for c in all_comps])
    e3 = np.array([c[element_symbols[2]] for c in all_comps])
    total = e1 + e2 + e3
    # Create ternary scatter plot
    trace = go.Scatterternary(
        a=e1/total,
        b=e2/total,
        c=e3/total,
        mode="markers",
        marker=dict(
            size=8,
            color="green",
            symbol="circle",
            opacity=0.7,
        ),
        name="SMACT Valid",
        cliponaxis=False,
    )
    axis_style = dict(
        title=dict(font=dict(size=12)),
        linewidth=1,
        linecolor="black",
        gridcolor="rgba(128, 128, 128, 0.2)",
        showticklabels=True,
        tickvals=[0.2, 0.4, 0.6, 0.8],
    )
    fig = go.Figure(trace)
    fig.update_layout(
        font=dict(size=12, family="Arial"),
        width=500,
        height=500,
        ternary=dict(
            bgcolor="rgba(0, 0, 0, 0)",
            aaxis=dict(axis_style, title=element_symbols[0]),
            baxis=dict(axis_style, title=element_symbols[1]),
            caxis=dict(axis_style, title=element_symbols[2]),
        ),
        margin=dict(l=40, r=40, b=40, t=40),
        title=f"SMACT-Valid Compositions in {'-'.join(element_symbols)} System",
        showlegend=False,
    )
    # Show the plot
    fig.show()
else:
    print("Plotly not available for ternary plotting. Install with: pip install plotly")
    print(f"Found {len(all_comps)} total compositions in the {'-'.join(element_symbols)} system")
    # Show some example compositions instead
    print("\nSample compositions:")
    for i, comp in enumerate(list(all_comps)[:10]):
        print(f" {i+1:2d}. {comp}")
    if len(all_comps) > 10:
        print(f" ... and {len(all_comps) - 10} more!")
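The ternary plot places each composition by its element fractions. If Plotly is not installed, the same fractions can be mapped onto a 2D equilateral triangle and drawn with any plotting library, for example matplotlib. A minimal sketch of that barycentric-to-Cartesian conversion, with hypothetical helper names and corners assumed at A = (0, 0), B = (1, 0) and C = (1/2, √3/2):

```python
from math import sqrt


def ternary_fractions(amounts):
    """Normalise three amounts (e.g. atoms of Bi, Te, In) to fractions summing to 1."""
    total = sum(amounts)
    return tuple(x / total for x in amounts)


def ternary_to_xy(a, b, c):
    """Cartesian point for fractions (a, b, c) in a triangle with corners
    A = (0, 0), B = (1, 0), C = (0.5, sqrt(3) / 2)."""
    return b + 0.5 * c, sqrt(3) / 2 * c


# Bi2Te3 contains no In, so it lies on the A-B (Bi-Te) edge
fractions = ternary_fractions((2, 3, 0))
print(fractions)  # (0.4, 0.6, 0.0)
print(ternary_to_xy(*fractions))
```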
Understanding the Impact of Different Oxidation State Sets
Let’s compare the traditional SMACT approach with the new ICSD24 approach:
# Compare different oxidation state sets
ox_state_sets = {
    "SMACT 2014": "smact14",
    "ICSD24 Default": "icsd24",
    "ICSD24 Filtered (commonality≥5)": "oxidation_states_icsd24_commonality_5.txt"
}
print("Comparison of different oxidation state sets:")
print(f"{'Oxidation Set':>30} | {'Total':>8} | {'Ternary':>8} | {'Binary':>8}")
print("-" * 65)
for name, ox_set in ox_state_sets.items():
    try:
        # generate_filtered_compositions passes either a built-in set name or a file path to smact_filter
        all_comps, ternary_comps, binary_comps = generate_filtered_compositions(
            ox_set, maxstoichiometrythreshold=8
        )
        print(f"{name:>30} | {len(all_comps):>8} | {len(ternary_comps):>8} | {len(binary_comps):>8}")
    except Exception as e:
        print(f"{name:>30} | {'Error':>8} | {'Error':>8} | {'Error':>8}")
        print(f"  Error: {e}")
print("\n**Key Insights:**")
print("• ICSD24 sets are based on experimental crystal structure data")
print("• Higher commonality thresholds = more conservative filtering")
print("• Custom filtering allows you to balance coverage vs. reliability")
print("• Different sets may be optimal for different material types")
Comparison of different oxidation state sets:
Oxidation Set | Total | Ternary | Binary
-----------------------------------------------------------------
SMACT 2014 | 102 | 92 | 10
ICSD24 Default | 327 | 301 | 26
ICSD24 Filtered (commonality≥5) | 93 | 81 | 12
**Key Insights:**
• ICSD24 sets are based on experimental crystal structure data
• Higher commonality thresholds = more conservative filtering
• Custom filtering allows you to balance coverage vs. reliability
• Different sets may be optimal for different material types
Part E: Advanced Methods
Parallel Processing for Large Datasets
When dealing with large chemical spaces, computations can be time-consuming. Using multiprocessing can speed up the process.
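import multiprocessing
from smact.screening import smact_filter
def process_combinations(els):
    # Apply the combined charge-neutrality and electronegativity filter to one tuple of Elements
    return smact_filter(els, threshold=8)
# element_combinations: an iterable of Element tuples, e.g. from itertools.combinations
# With the 'spawn' start method (default on Windows and macOS), functions defined in a notebook
# may not be found by worker processes; if pool.map fails, move them into an importable .py module.
# with multiprocessing.Pool() as pool:
#     results = pool.map(process_combinations, element_combinations)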
Parallel processing is particularly valuable when featurising large datasets, but needs to be handled carefully.
For parallel featurisation using matminer, you can control the number of parallel processes:
from matminer.featurizers.composition import ElementProperty
featurizer = ElementProperty.from_preset("magpie")
featurizer.set_n_jobs(2)  # number of parallel processes used by featurize_dataframe
Using all available cores can cause memory issues with large datasets. A safer approach is to set n_jobs to 1 or 2, or to process the data in chunks:
import pandas as pd
from matminer.featurizers.composition import ElementProperty
from matminer.featurizers.conversions import StrToComposition
# Example chunking approach
# Assumes df has a "formula" column of formula strings (e.g. the battery candidates above)
df = StrToComposition().featurize_dataframe(df, "formula")  # adds a "composition" column
def process_chunk(chunk_df):
    featurizer = ElementProperty.from_preset("magpie")
    featurizer.set_n_jobs(1)
    return featurizer.featurize_dataframe(chunk_df, "composition")
# Split dataframe into chunks
chunk_size = 1000  # Adjust based on your memory constraints
chunks = [df[i:i + chunk_size] for i in range(0, len(df), chunk_size)]
# Process chunks sequentially
results = []
for chunk in chunks:
    processed_chunk = process_chunk(chunk)
    results.append(processed_chunk)
# Combine results
final_df = pd.concat(results, ignore_index=True)
Note: Always test your featurisation pipeline on a small subset first before processing the full dataset.
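The same chunking idea applies before featurisation, for example when streaming element combinations into smact_filter without building the full list in memory. A stdlib-only sketch of a chunking helper (hypothetical name chunked) applied to the 15-element ternary space from Part A; in real use, each chunk would be passed to your filtering or featurisation function. On Python 3.12+, itertools.batched does the same job, yielding tuples instead of lists.

```python
from itertools import combinations, islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items from any iterable."""
    iterator = iter(iterable)
    while chunk := list(islice(iterator, size)):
        yield chunk


symbols = ["Li", "Na", "K", "Mg", "Ca", "Sr", "Ba", "Al", "Ga", "In", "Sn", "Pb", "Zn", "Cd", "Hg"]
# combinations() is lazy, so triples are produced one chunk at a time
chunk_sizes = [len(chunk) for chunk in chunked(combinations(symbols, 3), 100)]
print(chunk_sizes)  # [100, 100, 100, 100, 55]
```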
Exercise: Your Turn to Filter!
Try these advanced filtering challenges:
Explore a different chemical system: Choose 3 elements relevant to your research
Test consensus thresholds: Compare consensus=1 vs consensus=2 vs consensus=3
Include metals: Set include_zero=True and see how it affects results
Combine with property filters: Add criteria like electronegativity differences
Use the cell below for your explorations:
# Your exploration space - try different element combinations!
# Example: Solar cell materials (Cu-In-Ga-Se system)
# your_elements = ["Cu", "In", "Ga", "Se"]
# Example: Battery materials (Li-Co-O system)
# your_elements = ["Li", "Co", "O"]
# Define your elements and apply filters
element_symbols = ["Ti", "Zn", "O"] # Change these to elements of your interest
elements = [Element(sym) for sym in element_symbols]
# Apply the filters we learned
from smact.screening import smact_filter
# Get all possible combinations using correct smact_filter syntax
filtered_compositions = smact_filter(elements, threshold=8)
print(f"Found {len(filtered_compositions)} viable compositions using {element_symbols}")
print("First 10 compositions:")
for i, comp in enumerate(filtered_compositions[:10]):
    symbols, ox_states, ratios = comp
    formula = ""
    for sym, ratio in zip(symbols, ratios):
        if ratio > 1:
            formula += f"{sym}{ratio}"
        else:
            formula += sym
    print(f" {i+1:2d}. {formula} (oxidation states: {ox_states})")
if len(filtered_compositions) > 10:
    print(f" ... and {len(filtered_compositions) - 10} more compositions!")
print("\nTry changing element_symbols to explore other chemical systems!")
print("Examples:")
print("- Solar cells: ['Cu', 'In', 'Se']")
print("- Batteries: ['Li', 'Co', 'O']")
print("- Thermoelectrics: ['Bi', 'Te', 'Se']")
Found 64 viable compositions using ['Ti', 'Zn', 'O']
First 10 compositions:
1. TiZnO2 (oxidation states: (2, 2, -2))
2. TiZn2O3 (oxidation states: (2, 2, -2))
3. TiZn3O4 (oxidation states: (2, 2, -2))
4. TiZn4O5 (oxidation states: (2, 2, -2))
5. TiZn5O6 (oxidation states: (2, 2, -2))
6. TiZn6O7 (oxidation states: (2, 2, -2))
7. TiZn7O8 (oxidation states: (2, 2, -2))
8. Ti2ZnO3 (oxidation states: (2, 2, -2))
9. Ti2Zn3O5 (oxidation states: (2, 2, -2))
10. Ti2Zn5O7 (oxidation states: (2, 2, -2))
... and 54 more compositions!
Try changing element_symbols to explore other chemical systems!
Examples:
- Solar cells: ['Cu', 'In', 'Se']
- Batteries: ['Li', 'Co', 'O']
- Thermoelectrics: ['Bi', 'Te', 'Se']
🥳 Conclusion
In this tutorial, we’ve explored how to use SMACT and related tools to:
Generate chemical spaces either combinatorially or by fetching data from databases
Apply basic chemical filters using charge neutrality and electronegativity rules
Use advanced oxidation state filtering with consensus and commonality thresholds
Identify materials suitable for specific engineering applications
Compare different filtering approaches to optimise your screening workflows
💡 Key Takeaways
Chemical filtering dramatically reduces search spaces from millions to hundreds of candidates
ICSD24 oxidation states provide experimentally-grounded filtering based on real crystal structures
Consensus and commonality thresholds allow fine-tuning of filter strictness
Different applications benefit from different filtering strategies - no one-size-fits-all approach
Combining multiple filters (chemical + property-based) gives the most targeted results
By leveraging SMACT’s capabilities, you can efficiently navigate the vast landscape of possible compounds and focus on the most promising candidates for experimental validation.