Chemical Filters: From Theory to Practice with SMACT

Welcome to the exploration of chemical filters! In the previous section, we saw how combinatorial explosion creates a large number of possible materials. Now we’ll dive deep into the chemistry that helps us filter out the impossible combinations.

What You’ll Learn#

In this notebook, we’ll explore:

How each chemical filter works - Understanding the science behind the screening
Implementing filters step by step - From simple to sophisticated
Comparing filter effectiveness - Which filters eliminate the most impossible combinations
Real-world applications - Targeted screening for specific material types
Advanced filtering strategies - Combining multiple criteria for maximum efficiency

Think of this as your workshop for building intelligent chemical screening systems!

Setup: Loading Our Toolkit#

Before we start filtering, let’s set up our environment with SMACT and related tools.

# Core imports for chemical filtering
import csv
import multiprocessing
import os
from collections import defaultdict
from itertools import combinations, product

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# SMACT - our main chemical filtering toolkit
import smact
from smact import Element, element_dictionary
from smact.screening import (
    smact_filter, pauling_test, eneg_states_test,
    smact_validity, neutral_ratios
)

# For chemical composition handling
from pymatgen.core import Composition

# For accessing real materials data (optional)
try:
    from mp_api.client import MPRester
    mp_available = True
except ImportError:
    print("Materials Project API not available - install with: pip install mp-api")
    mp_available = False

# Set plotting style
plt.style.use('default')
sns.set_palette("husl")

# Check SMACT version if available
try:
    print(f"SMACT version: {smact.__version__}")
except AttributeError:
    print("SMACT loaded successfully (version attribute not available)")

print("Chemical filtering toolkit ready!")

# Set up Materials Project API using environment variable
myapikey = os.getenv('MP_API_KEY')
if myapikey:
    print("✓ Materials Project API key found")
else:
    print("⚠ No MP_API_KEY environment variable found - Materials Project examples will be skipped")
    myapikey = "replace_with_your_mp_api_key"

Materials Project API not available - install with: pip install mp-api
SMACT loaded successfully (version attribute not available)
Chemical filtering toolkit ready!
⚠ No MP_API_KEY environment variable found - Materials Project examples will be skipped

Preliminaries#

Understanding SMACT#

In this section of the tutorial you will begin to get a feel for how SMACT works. SMACT is a Python library designed to facilitate the exploration of chemical spaces for materials discovery. It provides tools to:

Generate possible compositions based on element combinations.
Apply chemical rules such as charge neutrality and electronegativity balance.
Filter out compositions that are unlikely to form stable compounds.

Key Features of SMACT#

Element Class: Represents elements with properties like oxidation states and electronegativities.
Screening Functions: Functions like smact_filter help apply chemical rules to filter compositions.
Integration with Other Libraries: SMACT works well with pymatgen and matminer for further analysis.

Part A: Generating Chemical Spaces#

There are two primary ways to generate chemical spaces:

Combinatorial Generation: As shown in the previous tutorial, this method systematically combines elements to create potential compositions.
Fetching Data from Materials Project: Using the Materials Project database to obtain existing materials.

Combinatorial Generation#

We can generate all possible combinations of elements within a set to explore potential compounds.

Example 1: Generate all ternary combinations from a list of elements.

# Define elements of interest
symbol_list = ['Li', 'Na', 'K', 'Mg', 'Ca', 'Sr', 'Ba', 'Al', 'Ga', 'In', 'Sn', 'Pb', 'Zn', 'Cd', 'Hg']
all_elements = element_dictionary(symbol_list)

# Generate all ternary combinations
ternary_combinations = combinations(all_elements.values(), 3)

# Print the first 5 combinations
for i, combo in enumerate(list(ternary_combinations)[:5]):
    print(f"Combination {i+1}: {', '.join([el.symbol for el in combo])}")

Combination 1: Li, Na, K
Combination 2: Li, Na, Mg
Combination 3: Li, Na, Ca
Combination 4: Li, Na, Sr
Combination 5: Li, Na, Ba
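Before enumerating a space like this, it helps to know how large it is. The sketch below uses only the standard library and the 15-element list from Example 1; the variable names are illustrative.

```python
from itertools import combinations
from math import comb

symbol_list = ['Li', 'Na', 'K', 'Mg', 'Ca', 'Sr', 'Ba', 'Al',
               'Ga', 'In', 'Sn', 'Pb', 'Zn', 'Cd', 'Hg']

# Closed-form count of unordered element triples ("15 choose 3")
n_ternary = comb(len(symbol_list), 3)

# Same count by explicit enumeration (fine for small lists)
n_enumerated = sum(1 for _ in combinations(symbol_list, 3))

print(n_ternary, n_enumerated)
```

Each of these element triples still has to be expanded over oxidation states and stoichiometries, which is where the filters in Part B come in.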

Example 2: Fetch existing materials from the Materials Project, a database of computed materials properties, using its API.

def get_binary_compounds(api_key: str, metallic_elements: list) -> pd.DataFrame:
    """
    Query Materials Project for stable binary metallic compounds.
    
    Args:
        api_key: Materials Project API key
        metallic_elements: List of metallic element symbols to search
        
    Returns:
        DataFrame containing compound properties
    """
    if not mp_available:
        print("Materials Project API not available")
        return pd.DataFrame()
        
    if not api_key or api_key == "replace_with_your_mp_api_key":
        print("No valid Materials Project API key - skipping query")
        return pd.DataFrame()
        
    from mp_api.client import MPRester
    
    compounds_info = []
    excluded_elements = ["O", "S", "Se", "Te", "F", "Cl", "Br", "I", "N", "P", "As"]
    fields = ["material_id", "formula_pretty", "elements", "energy_above_hull", 
             "symmetry", "band_gap", "theoretical"]
    
    with MPRester(api_key) as mpr:
        # Search each binary combination (limiting to first 5 for demo)
        for pair in list(combinations(metallic_elements, 2))[:5]:
            docs = mpr.materials.summary.search(
                elements=list(pair),
                num_elements=(2, 2), 
                energy_above_hull=(0, 0.1),
                fields=fields
            )
            
            # Filter and store results
            for doc in docs:
                if not any(elem.symbol in excluded_elements for elem in doc.elements):
                    compounds_info.append({
                        "material_id": doc.material_id,
                        "formula": doc.formula_pretty,
                        "elements": ", ".join(elem.symbol for elem in doc.elements),
                        "energy_above_hull": doc.energy_above_hull,
                        "crystal_system": doc.symmetry.crystal_system,
                        "band_gap": doc.band_gap,
                        "theoretical": doc.theoretical
                    })
                    
    return pd.DataFrame(compounds_info)

# Define metallic elements to search (reduced list for demo)
metallic_elements = ["Li", "Be", "Na", "Mg", "Al", "K", "Ca", "Ti", "V", "Cr", "Mn", 
                    "Fe", "Co", "Ni", "Cu", "Zn"]

# Query compounds and display results
if myapikey and myapikey != "replace_with_your_mp_api_key":
    df = get_binary_compounds(myapikey, metallic_elements)
    if not df.empty:
        print("\nFirst 5 compounds found:")
        print(df.head())
        print(f"\nTotal compounds found: {len(df)}")
    else:
        print("No compounds retrieved")
else:
    print("Skipping Materials Project query - no API key available")
    print("To enable this section, set the MP_API_KEY environment variable")

Skipping Materials Project query - no API key available
To enable this section, set the MP_API_KEY environment variable

Part B: Applying Chemical Filters#

After generating a chemical space, we need to apply chemical filters to narrow down potential candidates.

Charge Neutrality and Electronegativity Tests#

The smact.screening module provides several functions for screening chemical spaces:

Charge neutrality: Ensures the total charge in a compound is zero.
Pauling test: Verifies that a combination of ions makes chemical sense (i.e. positive ions should have lower electronegativity than negative ions).
Eneg states test (with threshold and alternate variants): Checks electronegativity criteria between anions and cations.
No repeats: Checks whether any anion or cation appears twice.
ml_rep_generator: Takes a composition of Elements and returns a list of values between 0 and 1 that describes the composition, useful for machine learning.
smact_filter: Combines the charge neutrality and electronegativity tests in one call, for external scripts that want to apply the general ‘smact test’.
smact_validity: Checks whether a composition is valid according to the SMACT rules. A composition is considered valid if it passes the charge neutrality test and the Pauling electronegativity test.
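To make the first two tests concrete, here is a minimal standard-library sketch of the ideas behind them. The function names `neutral_ratios_sketch` and `pauling_ok_sketch` are hypothetical and this is not SMACT's implementation; it assumes integer coefficients from 1 to `threshold` and uses the Pauling electronegativities of Na (0.93) and Cl (3.16).

```python
from functools import reduce
from itertools import product
from math import gcd


def neutral_ratios_sketch(ox_states, threshold=8):
    """Return reduced integer ratios (each 1..threshold) with zero net charge."""
    found = []
    for ratio in product(range(1, threshold + 1), repeat=len(ox_states)):
        if reduce(gcd, ratio) != 1:
            continue  # skip multiples such as (2, 2), which repeat (1, 1)
        if sum(q * n for q, n in zip(ox_states, ratio)) == 0:
            found.append(ratio)
    return found


def pauling_ok_sketch(ox_states, electronegativities):
    """True if every cation is less electronegative than every anion."""
    cations = [x for q, x in zip(ox_states, electronegativities) if q > 0]
    anions = [x for q, x in zip(ox_states, electronegativities) if q < 0]
    return all(c < a for c in cations for a in anions)


print(neutral_ratios_sketch([1, -1]))            # Na+ with Cl-
print(neutral_ratios_sketch([4, -2]))            # Ti4+ with O2-
print(pauling_ok_sketch([1, -1], [0.93, 3.16]))  # Na, Cl
print(pauling_ok_sketch([-1, 1], [0.93, 3.16]))  # roles swapped
```

In practice you would call SMACT's own functions; the sketch only shows what kind of arithmetic the tests involve.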

Using `smact_filter`#

As noted above, smact_filter applies the charge neutrality and electronegativity tests in a single call.

Its signature is:

def smact_filter(
    els: Union[Tuple[Element], List[Element]],
    threshold: Optional[int] = 8,
    stoichs: Optional[List[List[int]]] = None,
    species_unique: bool = True,
    oxidation_states_set: str = "icsd24",
    comp_tuple: bool = False,
) -> Union[List[Tuple[str, int, int]], List[Tuple[str, int]]]:
    ...

Parameters:

els: A tuple or list of Element objects.

threshold: Maximum allowed stoichiometry (default is 8).

stoichs: Specific stoichiometric ratios to consider.

species_unique: If True, considers different oxidation states as unique species.

oxidation_states_set: Set of oxidation states to use (‘icsd24’, ‘smact14’, ‘pymatgen’, ‘wiki’, or a custom file path). WARNING: For backwards compatibility in SMACT >=2.7, explicitly set oxidation_states_set to ‘smact14’ if you wish to use the 2014 SMACT default oxidation states. In SMACT 3.0, the smact_filter function will be set to use a new default oxidation states set.

comp_tuple: If True, returns results as named tuples.
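As a rough illustration of what threshold and stoichs control, the sketch below counts the candidate ratio tuples for one set of oxidation states. It assumes that each coefficient runs from 1 to threshold and that stoichs lists the allowed coefficients per element; `candidate_ratios` is a hypothetical helper that follows the parameter descriptions above rather than SMACT's internals.

```python
from itertools import product


def candidate_ratios(n_elements, threshold=8, stoichs=None):
    """Enumerate the ratio tuples a neutrality check would have to consider."""
    if stoichs is None:
        # Default assumption: every coefficient from 1 to threshold is allowed
        stoichs = [list(range(1, threshold + 1))] * n_elements
    return list(product(*stoichs))


print(len(candidate_ratios(2)))  # binary, threshold 8
print(len(candidate_ratios(3)))  # ternary, threshold 8
print(candidate_ratios(3, stoichs=[[1], [1], [3]]))  # fixed 1:1:3 ratio
```

The count grows as threshold to the power of the number of elements, so restricting stoichs is a cheap way to target a known prototype such as a 1:1:3 perovskite.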

Simple Example: Using smact_filter#

Let’s start with a simple example to see how smact_filter works:

# Define elements
elements = (Element('Na'), Element('Cl'))

# Apply SMACT filter
compositions = smact_filter(elements)

# Display valid compositions
for comp in compositions:
    print(comp)

(('Na', 'Cl'), (1, -1), (1, 1))
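Each result is a tuple of (symbols, oxidation states, stoichiometric ratios). The sketch below unpacks the tuple printed above, builds a formula string, and confirms that the charges cancel. The helpers `to_formula` and `net_charge` are illustrative names, not part of SMACT.

```python
def to_formula(symbols, ratios):
    """Join symbols and ratios into a formula, omitting coefficients of 1."""
    return "".join(sym if n == 1 else f"{sym}{n}" for sym, n in zip(symbols, ratios))


def net_charge(ox_states, ratios):
    """Total charge of the formula unit."""
    return sum(q * n for q, n in zip(ox_states, ratios))


# The tuple printed by the example above
symbols, ox_states, ratios = (('Na', 'Cl'), (1, -1), (1, 1))

print(to_formula(symbols, ratios), net_charge(ox_states, ratios))
```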

Part C: Identifying potential materials for specific engineering applications#

1: Binary Oxides for Photocatalysis#

First, let’s explore binary oxide semiconductors that might be suitable for water splitting:

def setup_binary_oxide_space():
    """Setup chemical space for binary oxides"""
    # Define transition metals of interest
    transition_metals = ["Ti", "V", "Cr", "Mn", "Fe", "Co", "Ni", "Cu", "Zn"]
    
    # Convert to SMACT elements
    tm_elements = [smact.Element(symbol) for symbol in transition_metals]
    oxygen = smact.Element("O")
    
    return tm_elements, oxygen

def binary_oxide_filter(metal):
    """Filter binary oxides based on chemical rules"""
    compounds = []
    
    # Oxidation states for oxygen
    o_state = -2
    
    for ox_state in metal.oxidation_states:
        # Check charge neutrality
        cn_e, cn_r = smact.neutral_ratios([ox_state, o_state], threshold=8)
        
        if cn_e:
            # Check electronegativity
            eneg_ok = pauling_test(
                [ox_state, o_state],
                [metal.pauling_eneg, 3.44]  # 3.44 is O electronegativity
            )
            
            if eneg_ok:
                formula = [metal.symbol, "O"]
                compounds.append((formula, cn_r[0]))
    
    return compounds

# Generate candidates
metals, oxygen = setup_binary_oxide_space()
with multiprocessing.Pool() as pool:
    binary_results = pool.map(binary_oxide_filter, metals)

# Format results
binary_formulas = []
for result in binary_results:
    for comp in result:
        formula = "".join(f"{el}{amt}" for el, amt in zip(comp[0], comp[1]))
        binary_formulas.append(Composition(formula).reduced_formula)

print("Viable binary oxide candidates:")
print("\n".join(binary_formulas))

Viable binary oxide candidates:
TiO
Ti2O3
TiO2
V2O
VO
V2O3
VO2
V2O5
Cr2O
CrO
Cr2O3
CrO2
Cr2O5
CrO3
Mn2O
MnO
Mn2O3
MnO2
Mn2O5
MnO3
Mn2O7
Fe2O
FeO
Fe2O3
FeO2
Fe2O5
FeO3
Co2O
CoO
Co2O3
CoO2
Ni2O
NiO
Ni2O3
NiO2
Cu2O
CuO
Cu2O3
ZnO
Zn2O3
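Each formula in this list corresponds to one positive metal oxidation state paired with O at -2. The arithmetic can be sketched in a few lines; `binary_oxide_formula` is a hypothetical helper, not a SMACT function.

```python
from math import gcd


def _part(symbol, n):
    """Format one element, omitting a coefficient of 1."""
    return symbol if n == 1 else f"{symbol}{n}"


def binary_oxide_formula(metal, metal_ox_state):
    """Smallest-integer formula for a metal cation combined with O(2-)."""
    if metal_ox_state <= 0:
        raise ValueError("expected a positive metal oxidation state")
    g = gcd(metal_ox_state, 2)
    n_metal = 2 // g                # metal atoms needed to balance the oxygens
    n_oxygen = metal_ox_state // g  # oxygen atoms needed to balance the metals
    return _part(metal, n_metal) + _part("O", n_oxygen)


for metal, state in [("Ti", 4), ("V", 5), ("Cu", 1), ("Mn", 7)]:
    print(metal, state, binary_oxide_formula(metal, state))
```

Read in reverse, an entry such as Zn2O3 implies that a +3 state for Zn is present in the oxidation state set being used, which is one motivation for the stricter sets discussed in Part D.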

2: Ternary Chalcogenides for Solar Cells#

Now let’s explore ternary chalcogenides that might be suitable for solar cells:

def setup_chalcogenide_space():
    """Setup chemical space for ternary chalcogenides"""
    # Group 11 metals (Cu, Ag)
    group_11 = ["Cu", "Ag"]
    # Group 13 metals (In, Ga)
    group_13 = ["Ga", "In"]
    # Chalcogens
    chalcogens = ["S", "Se"]
    
    metal_1 = [smact.Element(m) for m in group_11]
    metal_2 = [smact.Element(m) for m in group_13]
    chalc = [smact.Element(c) for c in chalcogens]
    
    return metal_1, metal_2, chalc

def ternary_chalcogenide_filter(elements):
    """Filter ternary chalcogenides with specific criteria"""
    compounds = []
    m1, m2, ch = elements
    
    # Target band gap window for solar cells (eV); kept for reference only,
    # it is not applied by this filter
    bandgap_range = (1.0, 2.5)
    
    for ox_1 in m1.oxidation_states:
        for ox_2 in m2.oxidation_states:
            for ox_ch in ch.oxidation_states:
                ox_states = [ox_1, ox_2, ox_ch]
                
                # Charge neutrality check
                cn_e, cn_r = smact.neutral_ratios(ox_states, threshold=8)
                
                if cn_e:
                    # Electronegativity check
                    eneg_ok = pauling_test(
                        ox_states,
                        [m1.pauling_eneg, m2.pauling_eneg, ch.pauling_eneg]
                    )
                    
                    if eneg_ok:
                        formula = [m1.symbol, m2.symbol, ch.symbol]
                        compounds.append((formula, cn_r[0]))
    
    return compounds

# Generate candidates
m1_els, m2_els, ch_els = setup_chalcogenide_space()
ternary_combinations = [(m1, m2, ch) 
                       for m1 in m1_els 
                       for m2 in m2_els 
                       for ch in ch_els]

with multiprocessing.Pool() as pool:
    ternary_results = pool.map(ternary_chalcogenide_filter, ternary_combinations)
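The ternary results are computed above but never displayed. One way to inspect them is to flatten the nested lists into unique formula strings. The sketch below does this on hypothetical sample data with the same shape as the filter's return value (a list of (symbols, ratios) pairs per element combination); `flatten_to_formulas` is an illustrative name.

```python
def flatten_to_formulas(nested_results):
    """Collapse per-combination result lists into sorted, unique formulas."""
    formulas = set()
    for per_combination in nested_results:
        for symbols, ratios in per_combination:
            formulas.add(
                "".join(s if n == 1 else f"{s}{n}" for s, n in zip(symbols, ratios))
            )
    return sorted(formulas)


# Hypothetical sample shaped like ternary_results (not actual SMACT output)
sample_results = [
    [(["Cu", "In", "S"], (1, 1, 2)), (["Cu", "In", "S"], (1, 1, 2))],
    [(["Ag", "Ga", "Se"], (1, 1, 2))],
    [],
]

print(flatten_to_formulas(sample_results))
```

With the real data you would pass ternary_results in place of the sample list.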

3: Double Perovskites for Ferroelectrics#

Finally, let’s explore double perovskites (A2BB’O6):

def setup_double_perovskite_space():
    """Setup chemical space for double perovskites"""
    # A-site cations (large ionic radius)
    a_site = ["Ba", "Sr", "Ca"]
    # B-site cations (smaller transition metals)
    b_site = ["Fe", "Mn", "Ni"]
    b_prime_site = ["Mo", "W", "Re"]
    
    a_els = [smact.Element(a) for a in a_site]
    b_els = [smact.Element(b) for b in b_site]
    b_prime_els = [smact.Element(bp) for bp in b_prime_site]
    oxygen = smact.Element("O")
    
    return a_els, b_els, b_prime_els, oxygen

def double_perovskite_filter(elements):
    """Filter double perovskites with specific criteria"""
    compounds = []
    a, b, b_prime, o = elements
    
    # Goldschmidt tolerance factor limits; kept for reference only,
    # they are not applied by this filter
    tol_factor_range = (0.8, 1.0)
    
    for a_ox in a.oxidation_states:
        for b_ox in b.oxidation_states:
            for bp_ox in b_prime.oxidation_states:
                ox_states = [a_ox, b_ox, bp_ox, -2]  # O is -2
                
                # Check charge balance
                if sum([2*a_ox, b_ox, bp_ox, 6*(-2)]) == 0:
                    # Check electronegativity ordering
                    eneg_ok = pauling_test(
                        ox_states,
                        [a.pauling_eneg, b.pauling_eneg, 
                         b_prime.pauling_eneg, 3.44]
                    )
                    
                    if eneg_ok:
                        formula = [a.symbol, b.symbol, b_prime.symbol, "O"]
                        compounds.append((formula, [2, 1, 1, 6]))
    
    return compounds

# Generate candidates
a_els, b_els, bp_els, oxygen = setup_double_perovskite_space()
perovskite_combinations = [(a, b, bp, oxygen) 
                          for a in a_els 
                          for b in b_els 
                          for bp in bp_els]

with multiprocessing.Pool() as pool:
    perovskite_results = pool.map(double_perovskite_filter, 
                                 perovskite_combinations)

# Flatten results and create dataframe
flattened_results = [comp for result in perovskite_results if result for comp in result]
df = pd.DataFrame(flattened_results, columns=['Formula', 'Stoichiometry'])

# Print first 5 candidates
print("\nFirst 5 double perovskite candidates:")
print(df.head())

First 5 double perovskite candidates:
           Formula Stoichiometry
[Ba, Fe, Mo, O]  [2, 1, 1, 6]
[Ba, Fe, Mo, O]  [2, 1, 1, 6]
[Ba, Fe, Mo, O]  [2, 1, 1, 6]
[Ba, Fe, Mo, O]  [2, 1, 1, 6]
[Ba, Fe, Mo, O]  [2, 1, 1, 6]
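The repeated [Ba, Fe, Mo, O] rows are consistent with the filter appending one entry for every oxidation state assignment that balances the charge, because each row stores only the symbols and the fixed 2:1:1:6 stoichiometry. The sketch below shows the arithmetic for A2BB’O6; the oxidation state lists are hypothetical and chosen only for illustration, not taken from SMACT's data.

```python
def balanced_b_site_pairs(a_ox, b_states, b_prime_states):
    """Find (B, B') oxidation state pairs that balance A2BB'O6."""
    # 2*a_ox + b + b' + 6*(-2) = 0  =>  b + b' = 12 - 2*a_ox
    target = 12 - 2 * a_ox
    return [(b, bp) for b in b_states for bp in b_prime_states if b + bp == target]


b_states = [2, 3, 4]        # hypothetical B-site states
b_prime_states = [4, 5, 6]  # hypothetical B'-site states

pairs = balanced_b_site_pairs(2, b_states, b_prime_states)
print(pairs)
print(f"{len(pairs)} assignments, so the same formula would appear {len(pairs)} times")
```

If you only need unique formulas, dropping duplicates from the flattened results before building the DataFrame avoids the repetition.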

4: Identifying Potential Battery Materials#

Goal: Find binary compounds suitable for battery applications.

This is a simple example workflow. For a more comprehensive guide, see the Materials Project tutorial or the Materials Project Batteries Explorer documentation.

Workflow Steps:

Fetch Data: Use the Materials Project API to retrieve binary compounds.
Filter Compounds: Select compounds with low energy above hull (i.e., stable) and desired properties.
Apply SMACT Validity Check: Ensure the compounds pass SMACT’s chemical rules.

if myapikey and myapikey != "replace_with_your_mp_api_key" and mp_available:
    from mp_api.client import MPRester
    from smact.screening import smact_validity
    from pymatgen.core import Composition

    with MPRester(myapikey) as mpr:
        # Search for binary compounds (limited for demo)
        docs = mpr.materials.summary.search(
            num_elements=2,  # Binary compounds only
            energy_above_hull=(0, 0.05),
            is_metal=False,
            fields=["material_id", "formula_pretty", "band_gap", 
                   "energy_above_hull", "formation_energy_per_atom"]
        )

    # Filter and validate compounds
    valid_compounds = []
    for doc in list(docs)[:100]:  # Limit to first 100 for demo
        formula = doc.formula_pretty
        if smact_validity(formula):
            valid_compounds.append({
                'material_id': doc.material_id,
                'formula': formula,
                'band_gap': doc.band_gap,
                'energy_above_hull': doc.energy_above_hull,
                'formation_energy_per_atom': doc.formation_energy_per_atom,
            })

    print(f"Number of valid battery material candidates: {len(valid_compounds)}")

    # Save to CSV
    df = pd.DataFrame(valid_compounds)
    df.to_csv('battery_material_candidates.csv', index=False)
    print("Results saved to 'battery_material_candidates.csv'")

else:
    print("Skipping battery materials query - Materials Project API not available")
    print("To enable this section:")
    print("1. Set MP_API_KEY environment variable")
    print("2. Install mp-api: pip install mp-api")

Skipping battery materials query - Materials Project API not available
To enable this section:
1. Set MP_API_KEY environment variable
2. Install mp-api: pip install mp-api

Part D: Advanced Oxidation States Filtering#

SMACT now includes advanced filtering capabilities based on ICSD oxidation states data. This allows you to tune filtering workflows using consensus and commonality thresholds to improve the quality of your chemical filtering.

Understanding ICSD24 Oxidation States Filter#

The ICSD24OxStatesFilter allows you to create custom oxidation state files based on:

Consensus: How many different sources agree on an oxidation state
Commonality: How frequently an oxidation state appears in the ICSD database
Include zero: Whether to include zero oxidation states (metals)

This gives you much finer control over chemical filtering than using default oxidation state sets.
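To build intuition for how the three settings interact, here is a toy filter over made-up records. Everything in it is an assumption for illustration: the element "X", the field names `sources` and `occurrences`, the numbers, and the comparison rules are hypothetical and do not reflect SMACT's data format or implementation.

```python
# Made-up records for a fictional element "X" (illustration only)
toy_states = [
    {"element": "X", "ox_state": 2, "sources": 3, "occurrences": 120},
    {"element": "X", "ox_state": 3, "sources": 1, "occurrences": 4},
    {"element": "X", "ox_state": 0, "sources": 2, "occurrences": 50},
]


def filter_states(records, consensus=1, commonality=1, include_zero=False):
    """Keep oxidation states that meet both thresholds."""
    kept = []
    for rec in records:
        if not include_zero and rec["ox_state"] == 0:
            continue  # drop the zero oxidation state unless requested
        if rec["sources"] >= consensus and rec["occurrences"] >= commonality:
            kept.append(rec["ox_state"])
    return kept


print(filter_states(toy_states))
print(filter_states(toy_states, consensus=2))
print(filter_states(toy_states, commonality=5))
print(filter_states(toy_states, consensus=2, include_zero=True))
```

Raising either threshold can only shrink the kept list, which is the behaviour to expect from the real filter as well.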

# Import the advanced oxidation states filter
import numpy as np
try:
    import plotly.graph_objects as go
    plotly_available = True
except ImportError:
    plotly_available = False
    print("Plotly not available - install with: pip install plotly")

from smact import Element
from smact.screening import smact_filter
from pymatgen.core import Composition
from itertools import combinations
from smact.utils.oxidation import ICSD24OxStatesFilter

# First generate our custom oxidation states file
ox_filter = ICSD24OxStatesFilter()
commonality = 2
custom_filename = f"oxidation_states_icsd24_commonality_{commonality}.txt"

# Write out the filtered oxidation states
ox_filter.write(
    custom_filename,
    consensus=1,
    include_zero=False,
    commonality=commonality,
    comment=f"Oxidation states with commonality ≥ {commonality}"
)

print(f"Wrote filtered oxidation states to '{custom_filename}'")
print(f"\nFilter settings:")
print(f"- Consensus threshold: 1 (at least 1 source agrees)")
print(f"- Commonality threshold: {commonality} (appears at least {commonality} times in ICSD)")
print(f"- Include zero oxidation states: False")

Wrote filtered oxidation states to 'oxidation_states_icsd24_commonality_2.txt'

Filter settings:
- Consensus threshold: 1 (at least 1 source agrees)
- Commonality threshold: 2 (appears at least 2 times in ICSD)
- Include zero oxidation states: False

Comparing Filtering with Different Oxidation State Sets#

Let’s demonstrate the power of this approach by comparing filtering results using different oxidation state sets. We’ll use the Bi-Te-In system as an example (relevant for thermoelectric materials):

# Elements of interest for thermoelectric materials
element_symbols = ["Bi", "Te", "In"]
elements = [Element(sym) for sym in element_symbols]

def generate_valid_compositions(el_list, oxidation_states_set="icsd24", maxstoichiometrythreshold=8):
    """Return unique reduced Composition objects from smact_filter."""
    combos = smact_filter(
        el_list,
        threshold=maxstoichiometrythreshold,
        oxidation_states_set=oxidation_states_set,
        species_unique=True
    )
    comps = set()
    for symbols, ox_states, ratios in combos:
        comp = Composition({sym: amt for sym, amt in zip(symbols, ratios)}).reduced_composition
        comps.add(comp)
    return comps

def generate_filtered_compositions(path=None, maxstoichiometrythreshold=8):
    """
    Generate ternary and binary compositions using a custom or default oxidation state set.
    Returns (all_comps, ternary_comps, binary_comps).
    """
    ox_set = path or "icsd24"
    # ternary
    ternary = generate_valid_compositions(elements, ox_set, maxstoichiometrythreshold)
    # binaries
    binary = set()
    for pair in combinations(elements, 2):
        binary |= generate_valid_compositions(list(pair), ox_set, maxstoichiometrythreshold)
    all_comps = ternary | binary
    return all_comps, ternary, binary

# Generate filtered compositions using our custom file
print("Generating filtered compositions (ternary + binaries)...")
try:
    all_comps, ternary_comps, binary_comps = generate_filtered_compositions(custom_filename)
except Exception as e:
    print(f"Warning: {e}\nFalling back to default ICSD24 oxidation states.")
    all_comps, ternary_comps, binary_comps = generate_filtered_compositions()

# Analysis
count_ternary = len(ternary_comps)
count_binary = len(binary_comps)
count_total = len(all_comps)
print(f"\nFiltered ICSD24 results:")
print(f"  Ternary compositions: {count_ternary}")
print(f"  Binary compositions:  {count_binary}")
print(f"  Total compositions:   {count_total}")

Generating filtered compositions (ternary + binaries)...

Filtered ICSD24 results:
  Ternary compositions: 81
  Binary compositions:  12
  Total compositions:   93

Comparing with Different Commonality Thresholds#

Let’s see how changing the commonality threshold affects our results:

# Compare different commonality thresholds
commonality_thresholds = [1, 2, 5, 10]
results_comparison = {}

for threshold in commonality_thresholds:
    # Create custom oxidation states file
    custom_file = f"oxidation_states_icsd24_commonality_{threshold}.txt"
    ox_filter.write(
        custom_file,
        consensus=1,
        include_zero=False,
        commonality=threshold,
        comment=f"Oxidation states with commonality ≥ {threshold}"
    )
    
    # Generate compositions with this threshold
    try:
        all_comps, ternary_comps, binary_comps = generate_filtered_compositions(custom_file)
        results_comparison[threshold] = {
            'total': len(all_comps),
            'ternary': len(ternary_comps),
            'binary': len(binary_comps)
        }
    except Exception as e:
        print(f"Error with threshold {threshold}: {e}")
        results_comparison[threshold] = {'total': 0, 'ternary': 0, 'binary': 0}

# Display comparison
print("\nComparison of different commonality thresholds:")
print(f"{'Threshold':>10} | {'Total':>8} | {'Ternary':>8} | {'Binary':>8}")
print("-" * 50)
for threshold, counts in results_comparison.items():
    print(f"{threshold:>10} | {counts['total']:>8} | {counts['ternary']:>8} | {counts['binary']:>8}")

print("\n Higher commonality thresholds = more restrictive filtering = fewer compositions")

Comparison of different commonality thresholds:
 Threshold |    Total |  Ternary |   Binary
--------------------------------------------------
         1 |       93 |       81 |       12
         2 |       93 |       81 |       12
         5 |       93 |       81 |       12
        10 |       93 |       81 |       12

 Higher commonality thresholds = more restrictive filtering = fewer compositions
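In the run shown above, every threshold gives the same counts. Equal counts do not by themselves prove the sets are identical, so it can be worth comparing the sets directly. A minimal sketch with hypothetical formula strings (`compare_sets` is an illustrative helper):

```python
def compare_sets(baseline, other):
    """Report which members differ between two sets of compositions."""
    return {
        "identical": baseline == other,
        "only_in_baseline": sorted(baseline - other),
        "only_in_other": sorted(other - baseline),
    }


# Hypothetical example sets; with real data, use the sets of Composition objects
loose = {"Bi2Te3", "InTe", "In2Te3"}
strict = {"Bi2Te3", "In2Te3"}

print(compare_sets(loose, loose))
print(compare_sets(loose, strict))
```

If the sets really are identical across thresholds, that suggests the thresholds tested do not remove any oxidation states for the elements in this system.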

Visualising the Results#

Let’s create a ternary plot to visualise our filtered compositions in chemical space:

# Create ternary plot for our compositions
try:
    import plotly.graph_objects as go
    
    # Use compositions from commonality threshold = 2
    all_comps, ternary_comps, binary_comps = generate_filtered_compositions(
        f"oxidation_states_icsd24_commonality_2.txt"
    )
    
    # Extract element fractions for ternary plot
    e1 = np.array([c[element_symbols[0]] for c in all_comps])
    e2 = np.array([c[element_symbols[1]] for c in all_comps])
    e3 = np.array([c[element_symbols[2]] for c in all_comps])
    total = e1 + e2 + e3
    
    # Create ternary scatter plot
    trace = go.Scatterternary(
        a=e1/total,
        b=e2/total,
        c=e3/total,
        mode="markers",
        marker=dict(
            size=8,
            color="green",
            symbol="circle",
            opacity=0.7,
        ),
        name="SMACT Valid",
        cliponaxis=False,
    )
    
    axis_style = dict(
        title=dict(font=dict(size=12)),
        linewidth=1,
        linecolor="black",
        gridcolor="rgba(128, 128, 128, 0.2)",
        showticklabels=True,
        tickvals=[0.2, 0.4, 0.6, 0.8],
    )
    
    fig = go.Figure(trace)
    fig.update_layout(
        font=dict(size=12, family="Arial"),
        width=500,
        height=500,
        ternary=dict(
            bgcolor="rgba(0, 0, 0, 0)",
            aaxis=dict(axis_style, title=element_symbols[0]),
            baxis=dict(axis_style, title=element_symbols[1]),
            caxis=dict(axis_style, title=element_symbols[2]),
        ),
        margin=dict(l=40, r=40, b=40, t=40),
        title=f"SMACT-Valid Compositions in {'-'.join(element_symbols)} System",
        showlegend=False,
    )
    
    # Show the plot
    fig.show()
    
except ImportError:
    print("Plotly not available for ternary plotting. Install with: pip install plotly")
    print(f"Found {len(all_comps)} total compositions in the {'-'.join(element_symbols)} system")
    
    # Show some example compositions instead
    print("\nSample compositions:")
    for i, comp in enumerate(list(all_comps)[:10]):
        print(f"  {i+1:2d}. {comp}")
    if len(all_comps) > 10:
        print(f"  ... and {len(all_comps) - 10} more!")
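The ternary plot positions each composition by its atomic fractions, which is what the e1/total, e2/total and e3/total expressions compute. A minimal sketch of that normalisation for a single composition (`ternary_coordinates` is an illustrative name):

```python
def ternary_coordinates(amounts):
    """Convert element amounts to fractions that sum to 1."""
    total = sum(amounts)
    if total == 0:
        raise ValueError("composition has no atoms")
    return tuple(x / total for x in amounts)


# Bi2Te3 on (Bi, Te, In) axes: 2 Bi, 3 Te, 0 In
print(ternary_coordinates((2, 3, 0)))
```

Binary compositions therefore land on an edge of the triangle, and only true ternaries fall in the interior.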

Understanding the Impact of Different Oxidation State Sets#

Let’s compare the traditional SMACT approach with the new ICSD24 approach:

# Compare different oxidation state sets
ox_state_sets = {
    "SMACT 2014": "smact14",
    "ICSD24 Default": "icsd24", 
    "ICSD24 Filtered (commonality≥5)": "oxidation_states_icsd24_commonality_5.txt"
}

print("Comparison of different oxidation state sets:")
print(f"{'Oxidation Set':>30} | {'Total':>8} | {'Ternary':>8} | {'Binary':>8}")
print("-" * 65)

for name, ox_set in ox_state_sets.items():
    try:
        # generate_filtered_compositions accepts either a built-in set name
        # or a path to a custom oxidation states file
        all_comps, ternary_comps, binary_comps = generate_filtered_compositions(
            ox_set, maxstoichiometrythreshold=8
        )
        print(f"{name:>30} | {len(all_comps):>8} | {len(ternary_comps):>8} | {len(binary_comps):>8}")

    except Exception as e:
        print(f"{name:>30} | {'Error':>8} | {'Error':>8} | {'Error':>8}")
        print(f"  Error: {e}")

print("\n**Key Insights:**")
print("• ICSD24 sets are based on experimental crystal structure data")
print("• Higher commonality thresholds = more conservative filtering")
print("• Custom filtering allows you to balance coverage vs. reliability")
print("• Different sets may be optimal for different material types")

Comparison of different oxidation state sets:
                 Oxidation Set |    Total |  Ternary |   Binary
-----------------------------------------------------------------
                    SMACT 2014 |      102 |       92 |       10
                ICSD24 Default |      327 |      301 |       26
ICSD24 Filtered (commonality≥5) |       93 |       81 |       12

**Key Insights:**
• ICSD24 sets are based on experimental crystal structure data
• Higher commonality thresholds = more conservative filtering
• Custom filtering allows you to balance coverage vs. reliability
• Different sets may be optimal for different material types

Part E: Advanced Methods#

Parallel Processing for Large Datasets#

When dealing with large chemical spaces, computations can be time-consuming. Using multiprocessing can speed up the process.

import multiprocessing

def process_combinations(els):
    # Your filtering code here
    pass

# with multiprocessing.Pool() as pool:
#     results = pool.map(process_combinations, element_combinations)

Parallel processing is particularly valuable when featurising large datasets, but needs to be handled carefully.

For parallel featurisation using matminer, you can control the number of parallel processes:

from matminer.featurizers.composition import ElementProperty

featuriser = ElementProperty.from_preset("magpie")
featuriser.set_n_jobs(2)  # number of parallel processes

Using every available core can cause memory issues with large datasets. A safer approach is to use 1-2 cores (i.e. set n_jobs to 1 or 2) or to process the data in chunks:

import pandas as pd
from matminer.featurizers import composition as cf
from matminer.featurizers.conversions import StrToComposition

# Example chunking approach
# df is assumed to have a "formula" column of formula strings
def process_chunk(chunk_df):
    # ElementProperty works on pymatgen Composition objects, so convert first
    chunk_df = StrToComposition().featurize_dataframe(chunk_df, "formula")
    featuriser = cf.ElementProperty.from_preset("magpie")
    return featuriser.featurize_dataframe(chunk_df, "composition")

# Split dataframe into chunks
chunk_size = 1000  # Adjust based on your memory constraints
chunks = [df[i:i + chunk_size] for i in range(0, len(df), chunk_size)]

# Process chunks sequentially
results = []
for chunk in chunks:
    processed_chunk = process_chunk(chunk)
    results.append(processed_chunk)

# Combine results
final_df = pd.concat(results, ignore_index=True)

Note: Always test your featurisation pipeline on a small subset first before processing the full dataset.

Exercise: Your Turn to Filter!#

Try these advanced filtering challenges:

Explore a different chemical system: Choose 3 elements relevant to your research
Test consensus thresholds: Compare consensus=1 vs consensus=2 vs consensus=3
Include metals: Set include_zero=True and see how it affects results
Combine with property filters: Add criteria like electronegativity differences

Use the cells below for your explorations:

# Your exploration space - try different element combinations!

# Example: Solar cell materials (Cu-In-Ga-Se system)
# your_elements = ["Cu", "In", "Ga", "Se"]

# Example: Battery materials (Li-Co-O system)  
# your_elements = ["Li", "Co", "O"]

# Define your elements and apply filters
element_symbols = ["Ti", "Zn", "O"]  # Change these to elements of your interest
elements = [Element(sym) for sym in element_symbols]

# Apply the filters we learned
from smact.screening import smact_filter

# Enumerate the compositions that pass the SMACT filter (stoichiometric ratios up to 8)
filtered_compositions = smact_filter(elements, threshold=8)

print(f"Found {len(filtered_compositions)} viable compositions using {element_symbols}")
print("First 10 compositions:")
for i, comp in enumerate(filtered_compositions[:10], start=1):
    symbols, ox_states, ratios = comp
    # Build a formula string, omitting stoichiometric coefficients of 1
    formula = "".join(
        f"{sym}{ratio}" if ratio > 1 else sym
        for sym, ratio in zip(symbols, ratios)
    )
    print(f"  {i:2d}. {formula} (oxidation states: {ox_states})")

if len(filtered_compositions) > 10:
    print(f"  ... and {len(filtered_compositions) - 10} more compositions!")

print("\nTry changing element_symbols to explore other chemical systems!")
print("Examples:")
print("- Solar cells: ['Cu', 'In', 'Se']")
print("- Batteries: ['Li', 'Co', 'O']")
print("- Thermoelectrics: ['Bi', 'Te', 'Se']")

Found 64 viable compositions using ['Ti', 'Zn', 'O']
First 10 compositions:
   1. TiZnO2 (oxidation states: (2, 2, -2))
   2. TiZn2O3 (oxidation states: (2, 2, -2))
   3. TiZn3O4 (oxidation states: (2, 2, -2))
   4. TiZn4O5 (oxidation states: (2, 2, -2))
   5. TiZn5O6 (oxidation states: (2, 2, -2))
   6. TiZn6O7 (oxidation states: (2, 2, -2))
   7. TiZn7O8 (oxidation states: (2, 2, -2))
   8. Ti2ZnO3 (oxidation states: (2, 2, -2))
   9. Ti2Zn3O5 (oxidation states: (2, 2, -2))
  10. Ti2Zn5O7 (oxidation states: (2, 2, -2))
  ... and 54 more compositions!

Try changing element_symbols to explore other chemical systems!
Examples:
- Solar cells: ['Cu', 'In', 'Se']
- Batteries: ['Li', 'Co', 'O']
- Thermoelectrics: ['Bi', 'Te', 'Se']
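
For the last challenge, combining with property filters, one possible starting point is a minimum electronegativity gap between anions and cations. The sketch below is an illustration under stated assumptions rather than a SMACT feature: min_cation_anion_gap and passes_en_gap are hypothetical helper names, the 1.0 threshold is arbitrary, and the small PAULING_EN table holds standard Pauling values typed in by hand. The input tuples follow the same (symbols, oxidation states, ratios) layout unpacked in the cell above.

```python
# Pauling electronegativities for a few elements (standard tabulated values)
PAULING_EN = {
    "Li": 0.98, "Ti": 1.54, "Zn": 1.65, "In": 1.78,
    "Co": 1.88, "Cu": 1.90, "Se": 2.55, "O": 3.44,
}


def min_cation_anion_gap(symbols, ox_states):
    """Return (least electronegative anion) - (most electronegative cation).

    Returns None when the composition has no cation or no anion.
    """
    cations = [PAULING_EN[s] for s, q in zip(symbols, ox_states) if q > 0]
    anions = [PAULING_EN[s] for s, q in zip(symbols, ox_states) if q < 0]
    if not cations or not anions:
        return None
    return min(anions) - max(cations)


def passes_en_gap(comp, min_gap=1.0):
    """Keep a composition only if its electronegativity gap reaches min_gap."""
    symbols, ox_states, _ratios = comp
    gap = min_cation_anion_gap(symbols, ox_states)
    return gap is not None and gap >= min_gap


# Hand-written example entries in the (symbols, oxidation states, ratios) layout
candidates = [
    (("Ti", "Zn", "O"), (2, 2, -2), (1, 1, 2)),
    (("Cu", "In", "Se"), (1, 3, -2), (1, 1, 2)),
]
kept = [c for c in candidates if passes_en_gap(c, min_gap=1.0)]
print([c[0] for c in kept])
```

With these values the oxide passes (gap of about 1.79) while the selenide does not (gap of about 0.65), which shows how strongly the chosen threshold shapes the result. A threshold suited to ionic oxides would wrongly discard more covalent chalcogenides.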

🥳 Conclusion#

In this tutorial, we’ve explored how to use SMACT and related tools to:

Generate chemical spaces either combinatorially or by fetching data from databases
Apply basic chemical filters using charge neutrality and electronegativity rules
Use advanced oxidation state filtering with consensus and commonality thresholds
Identify materials suitable for specific engineering applications
Compare different filtering approaches to optimise your screening workflows

💡 Key Takeaways#

Chemical filtering dramatically reduces search spaces from millions to hundreds of candidates
ICSD24 oxidation states provide experimentally-grounded filtering based on real crystal structures
Consensus and commonality thresholds allow fine-tuning of filter strictness
Different applications benefit from different filtering strategies - no one-size-fits-all approach
Combining multiple filters (chemical + property-based) gives the most targeted results

By leveraging SMACT’s capabilities, you can efficiently navigate the vast landscape of possible compounds and focus on the most promising candidates for experimental validation.