Cluster initialization

Gas-phase and deposition workflows build starting structures through scgo.initialization. The GA population uses the same engine via ClusterStartGenerator (smart mode by default).

Initialization modes

Pass mode to create_initial_cluster() or init_mode on SurfaceSystemConfig for surface deposition:

  • smart (default) – Metropolis allocation across templates, seed+growth, and random_spherical. Batch generation discovers strategies once, then assigns per-structure seeds for reproducible parallel runs.

  • seed+growth – Grow from low-energy candidates in prior *.db searches (Boltzmann sampling by composition counts). Falls back to random_spherical when no suitable seed exists.

  • random_spherical – Iterative random placement with clash and connectivity checks; retries relax placement radii within user bounds.

  • template – Icosahedral, decahedral, or octahedral templates when available for the target size.
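The seed+growth mode chooses among low-energy candidates from earlier searches by Boltzmann sampling. The sketch below shows that selection idea for a plain list of candidate energies. The helper name boltzmann_pick and the temperature kT are hypothetical, and SCGO's own weighting (including how it uses composition counts) may differ.

```python
import numpy as np


def boltzmann_pick(energies, kT, rng):
    """Return the index of one candidate, sampled with weight exp(-(E - E_min) / kT).

    Hypothetical helper: illustrates Boltzmann sampling of low-energy seeds,
    not SCGO's internal selection routine.
    """
    e = np.asarray(energies, dtype=float)
    if e.size == 0:
        return None  # no seed available: the caller would fall back to random_spherical
    # Shift by the minimum so every exponent is <= 0 and cannot overflow
    w = np.exp(-(e - e.min()) / kT)
    return int(rng.choice(e.size, p=w / w.sum()))


rng = np.random.default_rng(0)
picks = [boltzmann_pick([-10.0, -9.5, -7.0], kT=0.2, rng=rng) for _ in range(1000)]
# Low-energy candidates dominate; the -7.0 candidate is essentially never chosen
```

Smaller kT concentrates the picks on the single lowest-energy seed. Larger kT spreads them across candidates and trades quality for diversity.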

Atom ordering (multi-element GA)

Genetic-algorithm cut-and-splice crossover requires parents to share identical per-index atomic numbers (a1.numbers == a2.numbers), not merely the same composition.

SCGO therefore:

  • Keeps the campaign composition list as the canonical symbol order (e.g. ["Ir", "O", "O", "O"], not alphabetical O-first).

  • Reorders validated structures with reorder_cluster_to_composition() when validate_cluster(..., sort_atoms=True, composition=...) runs.

  • Applies the same reordering when inserting gas-phase candidates into the GA database.

All structures in a batch for one composition therefore share the same .numbers vector, which avoids stoichiometry pairing errors in multi-element runs.
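Cut-and-splice crossover copies atom i from one parent or the other, so index i must refer to the same element in both parents. Below is a minimal sketch of reordering to the composition order. It assumes a structure given as a symbol list plus a position array rather than an ase.Atoms object. reorder_to_composition is a hypothetical stand-in for reorder_cluster_to_composition(), not its source.

```python
from collections import defaultdict, deque

import numpy as np


def reorder_to_composition(symbols, positions, composition):
    """Permute atoms so that symbols[i] == composition[i] for every index i.

    Hypothetical helper: within each element, atoms keep their original
    relative order.
    """
    if sorted(symbols) != sorted(composition):
        raise ValueError("structure does not match the requested composition")
    # Queue of original indices for each element, in their current order
    slots = defaultdict(deque)
    for i, sym in enumerate(symbols):
        slots[sym].append(i)
    # For each target slot, take the next unused atom of the required element
    order = [slots[sym].popleft() for sym in composition]
    return [symbols[i] for i in order], np.asarray(positions)[order]


syms, pos = reorder_to_composition(
    ["O", "Ir", "O", "O"], np.arange(12.0).reshape(4, 3), ["Ir", "O", "O", "O"]
)
# syms == ["Ir", "O", "O", "O"]; pos[0] is the former row 1, [3.0, 4.0, 5.0]
```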

Placement order and diversity

For random_spherical and seed growth, atoms are added one at a time. The order is sampled on each attempt (mass-biased by default, exploratory otherwise); see scgo.initialization.initialization_config.

  • Mass-biased (default ~65% of attempts): heavier element groups are placed first (ASE atomic masses); order within each element group is shuffled. This favours metal-first growth for oxides and bimetallics without fixing the same sequence for every structure.

  • Exploratory (~35%): legacy growth-order strategies (random shuffle, size-based, composition-aware, etc.) preserve batch diversity.

The bias probability is MASS_FIRST_PLACEMENT_PROB in scgo.initialization.initialization_config (not exposed in GO presets).
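The two ordering modes can be sketched as follows. The mass table, the 0.65 probability, and the name sample_placement_order are illustrative assumptions. The exploratory branch uses a plain shuffle in place of SCGO's several legacy strategies.

```python
import numpy as np

# Illustrative values: the docs quote "~65%"; the real constant is
# MASS_FIRST_PLACEMENT_PROB in scgo.initialization.initialization_config.
# Masses (u) are hard-coded for a few elements instead of read from ASE.
MASS_FIRST_PROB = 0.65
MASSES = {"O": 15.999, "Cu": 63.546, "Ir": 192.217, "Pt": 195.084}


def sample_placement_order(composition, rng, mass_first_prob=MASS_FIRST_PROB):
    """Return atom indices in the order they would be placed (hypothetical helper)."""
    symbols = np.array(composition)
    if rng.random() < mass_first_prob:
        # Mass-biased: heaviest element group first, shuffled within each group
        order = []
        for sym in sorted(set(composition), key=lambda s: -MASSES[s]):
            group = np.flatnonzero(symbols == sym)
            rng.shuffle(group)
            order.extend(group.tolist())
        return order
    # Exploratory: a plain random shuffle stands in for SCGO's other strategies
    return rng.permutation(len(composition)).tolist()
```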

Reproducibility

All placement randomness flows through a single numpy.random.Generator:

  • Single structure: pass rng to create_initial_cluster() or set seed on run_go / run_go_campaign (converted to a generator at the API boundary).

  • Batch / GA population: create_initial_cluster_batch derives an independent per-structure seed from the parent generator (batch_base_seed + i * 7919), so n_jobs=1 and parallel workers produce identical populations for the same parent seed.

  • Campaigns: run_go_campaign draws a reproducible per-composition seed from the campaign generator; failed compositions are logged and skipped without aborting the rest of the scan (see Connectivity and steric checks below).

If you set more than one seed (seed=, go_params['seed'], ts_params['seed']), give them all the same value.
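Per-structure seeding is what makes serial and parallel batches agree. Each structure's result depends only on its own seed, not on which worker builds it or when. The sketch below assumes batch_base_seed is drawn from the parent generator with integers(); the docs give only the + i * 7919 stride. per_structure_seeds and build_structure are hypothetical names.

```python
import numpy as np


def per_structure_seeds(parent_rng, n_structures):
    """Derive one seed per structure from a parent Generator.

    The 7919 stride follows the documented formula; drawing batch_base_seed
    with integers() is an assumption about how SCGO obtains it.
    """
    batch_base_seed = int(parent_rng.integers(0, 2**31 - 1))
    return [batch_base_seed + i * 7919 for i in range(n_structures)]


def build_structure(seed):
    # Stand-in for building one cluster: every random draw comes from this seed
    rng = np.random.default_rng(seed)
    return rng.uniform(-1.0, 1.0, size=(4, 3))


seeds = per_structure_seeds(np.random.default_rng(123), 5)
serial = [build_structure(s) for s in seeds]
# Building in any order (as parallel workers might) yields the same structures
shuffled = {s: build_structure(s) for s in reversed(seeds)}
assert all(np.array_equal(serial[i], shuffled[s]) for i, s in enumerate(seeds))
```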

Connectivity and steric checks

  • connectivity_factor (default 1.4 in GO presets) scales covalent radii for connectivity validation during initialization and after GA operators.

  • Placement clash tables use BLMIN_RATIO_DEFAULT (0.7), aligned with GA blmin tables via build_blmin().

Difficult stoichiometries (e.g. O-rich oxides) may fail initialization; filter composition scans or relax connectivity_factor / placement parameters rather than expecting every binary grid point to succeed.
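Here is a worked example of the thresholds for an Ir-O pair, using the Cordero covalent radii that ASE tabulates (Ir 1.41 Å, O 0.66 Å). The helper classify_pair is hypothetical. It assumes the effective clash floor is the larger of min_distance_factor and blmin_ratio, which is consistent with placement relaxation never dropping below the GA floor.

```python
# Covalent radii (Å) hard-coded from the Cordero et al. table that ASE uses
COVALENT_RADII = {"O": 0.66, "Ir": 1.41}


def classify_pair(sym_a, sym_b, distance, min_distance_factor=0.4,
                  blmin_ratio=0.7, connectivity_factor=1.4):
    """Classify one interatomic distance (Å) against the initialization thresholds.

    Hypothetical helper; assumes the clash floor is the stricter of the two factors.
    """
    r_sum = COVALENT_RADII[sym_a] + COVALENT_RADII[sym_b]
    floor = max(min_distance_factor, blmin_ratio or 0.0) * r_sum
    if distance < floor:
        return "clash"
    if distance <= connectivity_factor * r_sum:
        return "bonded"
    return "not bonded"


# Ir-O: r_sum = 2.07 Å, clash floor 0.7 * 2.07 = 1.449 Å, bond cutoff 1.4 * 2.07 = 2.898 Å
```

Lowering connectivity_factor shrinks the bonded window from above. This is why O-rich compositions, with many short O-O and O-metal contacts, can struggle to satisfy both limits at once.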

Module reference

Cluster initialization package.

Builds starting structures for global optimization and surface deposition.

Main entry points:

  • create_initial_cluster and create_initial_cluster_batch

  • random_spherical and grow_from_seed

  • combine_and_grow

  • generate_template_structure

All randomness flows through numpy.random.Generator arguments. See the initialization chapter in the project documentation for modes, ordering, and reproducibility.

scgo.initialization.create_initial_cluster(composition, rng, placement_radius_scaling=1.2, min_distance_factor=0.4, vacuum=10.0, previous_search_glob='**/*.db', mode='smart', connectivity_factor=1.4)

Create an initial cluster using several strategies.

This function is the single entry point for building starting structures for global optimization. It wraps create_initial_cluster_batch() with n_structures=1 so that single and batch calls behave consistently. In “smart” mode, a single call selects a strategy probabilistically, whereas batch calls use deterministic allocation.

Independent of the creation mode, successful returns obey the same basic invariants:

  • no hard clashes according to min_distance_factor and covalent radii

  • the cluster is connected under connectivity_factor

  • positions are reproducible for a given rng seed

Parameters:
  • composition (list[str]) – target list of element symbols.

  • placement_radius_scaling (float) – scale factor for radii in random placement.

  • min_distance_factor (float) – scale factor for minimum distance checks; the placement loop relaxes it slightly if repeated attempts fail.

  • vacuum (float) – extra padding (Å) for the generated simulation cell.

  • previous_search_glob (str) – glob pattern for database files from previous searches, used as seed sources by seed+growth.

  • mode (str) – Initialization strategy: smart (default Metropolis mix of templates, seed+growth, and random_spherical), seed+growth, random_spherical, or template.

  • connectivity_factor (float) – Factor to multiply sum of covalent radii for connectivity threshold. Defaults to CONNECTIVITY_FACTOR (1.4).

  • rng (Generator) – numpy Generator providing all randomness for this call.

Return type:

Atoms

Returns:

An ase.Atoms instance with the initial cluster. When composition is empty, returns an empty Atoms object.

Raises:
  • TypeError – If composition is None or not a list/tuple of strings.

  • ValueError – If numeric parameters are invalid or a valid cluster satisfying the distance/connectivity constraints cannot be constructed.

Note

This function is implemented as a wrapper around create_initial_cluster_batch() to ensure consistent behavior. For generating multiple structures, use create_initial_cluster_batch() directly for better performance and deterministic strategy allocation.

scgo.initialization.create_initial_cluster_batch(composition, n_structures, rng, placement_radius_scaling=1.2, min_distance_factor=0.4, vacuum=10.0, previous_search_glob='**/*.db', mode='smart', connectivity_factor=1.4, n_jobs=1)

Create multiple initial clusters with deterministic per-structure RNG.

For smart mode, uses Metropolis allocation across templates, seed+growth, and random_spherical. Each structure receives an independent seed derived from rng (batch_base_seed + index * 7919), so batch results are reproducible and identical for n_jobs=1 vs parallel workers when the parent rng state matches.

Validated structures are reordered to match composition for GA pairing.

Return type:

list[Atoms]

scgo.initialization.random_spherical(composition, cell_side, rng, placement_radius_scaling=1.2, min_distance_factor=0.4, connectivity_factor=1.4, max_connectivity_retries=10, blmin_ratio=0.7)

Place atoms randomly within a compact sphere, ensuring minimum distances.

Atoms are added iteratively with covalent-radii-based clash checks and connectivity enforcement. For each retry attempt the algorithm slightly relaxes the effective placement radius and distance thresholds within the user-specified bounds to improve the chance of finding a valid connected configuration. Placement order is sampled on each attempt (mass-biased by default, exploratory otherwise); see scgo.initialization.initialization_config.

When blmin_ratio is set (default: BLMIN_RATIO_DEFAULT), placement and final validation enforce the same steric floor used by GA operators (ratio_of_covalent_radii / build_blmin). Progressive placement relaxation never drops below that floor. Pass blmin_ratio=None to disable the GA floor and rely only on min_distance_factor.

Parameters:
  • composition (list[str]) – List of element symbols for the atoms.

  • cell_side (float) – The side length of the cubic cell for the returned Atoms object.

  • placement_radius_scaling (float) – A scaling factor used to determine the initial spherical volume for atom placement. Larger values result in a larger initial volume.

  • min_distance_factor (float) – Factor to scale the sum of covalent radii for minimum allowed distance between atoms. A value of 1.0 means no overlap, while < 1.0 allows some overlap.

  • connectivity_factor (float) – Factor to multiply sum of covalent radii for connectivity threshold.

  • max_connectivity_retries (int) – Maximum number of retries if connectivity validation fails.

  • blmin_ratio (float | None) – GA-compatible steric floor (covalent-radius scale). None disables the extra floor beyond min_distance_factor.

  • rng (Generator) – Numpy Generator supplying all randomness for this call (placement order, coordinates, retries).

Return type:

Atoms

Returns:

An ase.Atoms instance with the randomly placed cluster.

Raises:
  • ValueError – If all atoms cannot be placed within the given constraints after a maximum number of attempts, or if connectivity validation fails after all retries.
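The core loop of random placement can be sketched in a few lines. Sample a uniform point in a ball, reject it if it clashes with any placed atom, and require at least one bonded neighbor. This is a simplified illustration under stated assumptions: hard-coded covalent radii, the first atom at the origin, and no per-retry relaxation or placement-order sampling. place_randomly is a hypothetical name, not random_spherical() itself.

```python
import numpy as np

COVALENT_RADII = {"O": 0.66, "Pt": 1.36}  # Å, illustrative subset of the Cordero table


def random_ball_point(radius, rng):
    # Uniform point inside a ball: random direction, radius scaled by u**(1/3)
    v = rng.normal(size=3)
    return radius * rng.random() ** (1 / 3) * v / np.linalg.norm(v)


def place_randomly(composition, rng, placement_radius_scaling=1.2,
                   floor_factor=0.7, connectivity_factor=1.4, max_tries=1000):
    """Sequential random placement with clash and connectivity checks (sketch)."""
    radii = np.array([COVALENT_RADII[s] for s in composition])
    # Ball radius grows with the total covalent volume of the cluster
    ball_r = placement_radius_scaling * np.cbrt(np.sum(radii**3))
    positions = [np.zeros(3)]
    for r_new in radii[1:]:
        for _ in range(max_tries):
            p = random_ball_point(ball_r, rng)
            d = np.linalg.norm(np.array(positions) - p, axis=1)
            r_sum = radii[: len(positions)] + r_new
            # Reject clashes; require a bonded neighbor so the cluster stays connected
            if np.all(d >= floor_factor * r_sum) and np.any(d <= connectivity_factor * r_sum):
                positions.append(p)
                break
        else:
            raise ValueError("could not place atom within max_tries")
    return np.array(positions)
```

Requiring a bonded neighbor at every step keeps each partial cluster connected. The real function instead validates connectivity and retries with relaxed radii.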

scgo.initialization.grow_from_seed(seed_atoms, target_composition, placement_radius_scaling, cell_side, rng, min_distance_factor=0.4, connectivity_factor=1.4, blmin_ratio=0.7)

Try to grow a smaller candidate ase.Atoms to the target composition.

Growth is performed by repeatedly adding atoms to the existing seed using convex-hull-based placement (via _add_atoms_to_cluster_iteratively()), with covalent-radii-based clash checks and connectivity enforcement.

Parameters:
  • seed_atoms (Atoms) – The seed ase.Atoms object to grow from.

  • target_composition (list[str]) – The target composition as a list of element symbols.

  • placement_radius_scaling (float) – A scaling factor to determine the placement shell radius.

  • min_distance_factor (float) – Factor to scale covalent radii for minimum distance checks.

  • cell_side (float) – The side length of the cubic cell for the new ase.Atoms object.

  • connectivity_factor (float) – Factor to multiply sum of covalent radii for connectivity threshold.

  • rng (Generator) – numpy Generator supplying all randomness for this call.

Return type:

Atoms | None

Returns:

A new ase.Atoms object of the target composition on success, or None on failure.

scgo.initialization.combine_seeds(seeds, cell_side, rng, separation_scaling=1.0, connectivity_factor=1.4, min_distance_factor=0.4)

Combines multiple seed clusters into a single new structure using facet-to-facet placement.

Return type:

Atoms | None

scgo.initialization.combine_and_grow(seeds, target_composition, cell_side, rng, vdw_scaling=1.0, min_distance_factor=0.4, connectivity_factor=1.4)

Combines seeds and grows to target composition.

Return type:

Atoms | None

scgo.initialization.compute_cell_side(composition, vacuum=10.0)

Estimate a cubic cell side from atomic van-der-Waals volumes.

The estimate computes atomic volumes using ASE’s van-der-Waals radii, converts that to an effective spherical radius and returns a cubic side that contains the cluster plus the requested vacuum padding.

For elements where ASE’s vdw_radii is NaN (e.g., Co, Fe, Ru), uses interpolated values from neighboring elements (cached per element).

Parameters:
  • composition (list[str]) – Sequence of element symbols (e.g. ["Pt", "Pt"])

  • vacuum (float) – Extra padding (Å) to add to the estimated diameter.

Return type:

float

Returns:

Cubic cell side length in Å. Returns 0.0 for an empty composition.
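Read literally, the estimate amounts to three steps. Sum the atomic vdW sphere volumes, convert the total to the radius of one equivalent sphere, and add vacuum to that sphere's diameter. The sketch below uses hard-coded illustrative radii (consult ase.data.vdw_radii for the tabulated ones) and a hypothetical function name. It omits the NaN interpolation described above.

```python
import math

# Illustrative vdW radii (Å); check ase.data.vdw_radii for tabulated values
VDW_RADII = {"O": 1.52, "Pt": 1.75}


def estimate_cell_side(composition, vacuum=10.0):
    """Cubic cell side (Å) from summed vdW volumes plus vacuum; 0.0 if empty."""
    if not composition:
        return 0.0
    volume = sum(4.0 / 3.0 * math.pi * VDW_RADII[s] ** 3 for s in composition)
    # Radius of a single sphere with the same total volume
    r_eff = (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)
    return 2.0 * r_eff + vacuum


# estimate_cell_side(["Pt", "Pt"]) = 2 * 1.75 * 2**(1/3) + 10 ≈ 14.41 Å
```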

scgo.initialization.is_cluster_connected(atoms, connectivity_factor=1.4, use_mic=False)

Check if all atoms in a cluster are connected within the specified distance threshold.

Atoms i and j are joined when their distance is within (r_i + r_j) * connectivity_factor, and a Union-Find pass determines whether all atoms end up in a single connected component.

Neighbor queries use scipy.spatial.KDTree, which scales roughly as O(n log n) instead of the O(n²) of an all-pairs loop for large clusters.

Parameters:
  • atoms (Atoms) – The Atoms object to check

  • connectivity_factor (float) – Factor to multiply sum of covalent radii for connectivity threshold. Defaults to CONNECTIVITY_FACTOR (1.4).

  • use_mic (bool) – If True, use minimum image convention for distance calculations.

Return type:

bool

Returns:

True if all atoms are in one connected component, False otherwise.
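Below is a minimal Union-Find version of the same connectivity test. It is written as a brute-force pair loop so that it needs only NumPy; is_cluster_connected() uses a KDTree neighbor query instead. is_connected is a hypothetical name. Positions are in Å and radii are per-atom covalent radii.

```python
import numpy as np


def is_connected(positions, radii, connectivity_factor=1.4):
    """Union-Find connectivity check over all atom pairs (O(n^2) sketch)."""
    n = len(positions)
    parent = list(range(n))

    def find(i):
        # Path halving keeps the trees shallow
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pos = np.asarray(positions, dtype=float)
    for i in range(n):
        d = np.linalg.norm(pos[i + 1:] - pos[i], axis=1)
        cutoff = connectivity_factor * (radii[i] + np.asarray(radii[i + 1:]))
        # Merge i with every later atom inside its bonding cutoff
        for j in np.flatnonzero(d <= cutoff) + i + 1:
            parent[find(i)] = find(int(j))
    # Connected iff every atom shares one root
    return len({find(i) for i in range(n)}) <= 1


# Pt3 with a 6.5 Å gap: cutoff 1.4 * 2.72 = 3.81 Å, so the third atom is isolated
print(is_connected([[0, 0, 0], [2.5, 0, 0], [9.0, 0, 0]], [1.36] * 3))  # False
```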

scgo.initialization.validate_cluster(atoms, composition=None, min_distance_factor=None, connectivity_factor=1.4, check_clashes=True, check_connectivity=None, sort_atoms=True, raise_on_failure=False, source='', use_mic=False)

Unified cluster validation with comprehensive checks.

This function consolidates all validation logic used across the initialization module. It can check composition, clashes, connectivity, and optionally sort atoms by element.

Parameters:
  • atoms (Atoms) – The Atoms object to validate

  • composition (list[str] | None) – Optional expected composition to verify exact match

  • min_distance_factor (float | None) – Factor for minimum distance checks. If None, uses MIN_DISTANCE_FACTOR_DEFAULT when check_clashes is True

  • connectivity_factor (float) – Factor for connectivity threshold

  • check_clashes (bool) – Whether to check for atomic clashes (default: True)

  • check_connectivity (bool | None) – Whether to check connectivity. If None, connectivity is checked only for clusters with more than two atoms.

  • sort_atoms (bool) – When True and composition is set, reorder atoms to match the composition list (required for GA pairing). When True without composition, fall back to alphabetical element sort.

  • raise_on_failure (bool) – Whether to raise ValueError on validation failure

  • source (str) – Context string for error messages (e.g., "template", "seed+growth")

Return type:

tuple[Atoms, bool, str]

Returns:

Tuple of (validated_atoms, is_valid, error_message). If is_valid is True, error_message is empty. validated_atoms may be reordered if sort_atoms=True.

Raises:

ValueError – If raise_on_failure=True and validation fails

scgo.initialization.validate_cluster_structure(atoms, min_distance_factor, connectivity_factor, check_clashes=True, check_connectivity=True, use_mic=False)

Validate a cluster structure for clashes and connectivity.

This function provides a centralized validation that ensures all returned cluster structures meet the specified constraints. It checks for atomic clashes and connectivity using the same logic as the placement algorithms.

Parameters:
  • atoms (Atoms) – The Atoms object to validate

  • min_distance_factor (float) – Factor to scale covalent radii for minimum distance checks

  • connectivity_factor (float) – Factor to multiply sum of covalent radii for connectivity threshold

  • check_clashes (bool) – Whether to check for atomic clashes (default: True)

  • check_connectivity (bool) – Whether to check connectivity (default: True)

Return type:

tuple[bool, str]

Returns:

Tuple of (is_valid, error_message). If is_valid is True, error_message is empty. If is_valid is False, error_message contains diagnostic information.

class scgo.initialization.StructureDiagnostics(is_valid, has_clashes, is_disconnected, clash_details, n_components, closest_inter_component_distance, suggested_connectivity_factor, summary)

Bases: object

Container for comprehensive structure diagnostics.

Attributes:
  • is_valid – True if the structure has no clashes and is connected

  • has_clashes – True if atomic clashes were detected

  • is_disconnected – True if the cluster has multiple disconnected components

  • clash_details – List of clash description strings

  • n_components – Number of disconnected components (1 if connected)

  • closest_inter_component_distance – Distance between the closest atoms in different components

  • suggested_connectivity_factor – Connectivity factor needed to connect all components

  • summary – Human-readable summary of all issues
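One way to compute a value like suggested_connectivity_factor uses a minimum spanning tree. The smallest connectivity_factor that joins every atom equals the largest edge of a minimum spanning tree over the scaled distances d_ij / (r_i + r_j). The sketch below uses Prim's algorithm. It illustrates that idea and is not necessarily how SCGO derives the field. min_connecting_factor is a hypothetical name.

```python
import numpy as np


def min_connecting_factor(positions, radii):
    """Smallest f such that d_ij <= f * (r_i + r_j) edges connect all atoms."""
    pos = np.asarray(positions, dtype=float)
    r = np.asarray(radii, dtype=float)
    n = len(pos)
    if n < 2:
        return 0.0
    d = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    w = d / (r[:, None] + r[None, :])  # scaled distance for every pair
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = w[0].copy()  # cheapest scaled edge from the tree to each atom
    largest = 0.0
    for _ in range(n - 1):
        # Prim's step: attach the outside atom with the cheapest edge
        candidates = np.where(in_tree, np.inf, best)
        k = int(np.argmin(candidates))
        largest = max(largest, float(candidates[k]))
        in_tree[k] = True
        best = np.minimum(best, w[k])
    return largest
```

The minimax (bottleneck) property of minimum spanning trees guarantees that no smaller factor connects the graph.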

scgo.initialization.get_covalent_radius(symbol)

Return the covalent radius for symbol in Angstroms.

Return type:

float

scgo.initialization.get_vdw_radius(symbol)

Return the van-der-Waals radius for symbol in Angstroms.

Return type:

float

scgo.initialization.get_structure_diagnostics(atoms, min_distance_factor, connectivity_factor, use_mic=False)

Get comprehensive diagnostics for a cluster structure.

This function analyzes both clashes and connectivity issues and returns detailed diagnostic information useful for debugging initialization failures.

Parameters:
  • atoms (Atoms) – The Atoms object to analyze

  • min_distance_factor (float) – Factor to scale covalent radii for minimum distance checks

  • connectivity_factor (float) – Factor to multiply sum of covalent radii for connectivity threshold

Return type:

StructureDiagnostics

Returns:

StructureDiagnostics object containing detailed analysis results

scgo.initialization.generate_icosahedron(composition, n_atoms, rng=None, connectivity_factor=1.4)

Generate an icosahedral cluster.

Uses ASE’s Icosahedron generator and adjusts atom count by adding/removing surface atoms if needed.

Parameters:
  • composition (list[str]) – List of element symbols (cycled to match n_atoms)

  • n_atoms (int) – Target number of atoms

  • rng (Generator | None) – Optional random number generator for reproducibility

  • connectivity_factor (float) – Factor for connectivity threshold

Return type:

Atoms | None

Returns:

Atoms object with icosahedral structure, or None if generation fails

scgo.initialization.generate_decahedron(composition, n_atoms, rng=None, connectivity_factor=1.4)

Generate a decahedral cluster.

Uses ASE’s Decahedron generator and adjusts atom count by adding/removing surface atoms if needed.

Parameters:
  • composition (list[str]) – List of element symbols (cycled to match n_atoms)

  • n_atoms (int) – Target number of atoms

  • rng (Generator | None) – Optional random number generator for reproducibility

  • connectivity_factor (float) – Factor for connectivity threshold

Return type:

Atoms | None

Returns:

Atoms object with decahedral structure, or None if generation fails

scgo.initialization.generate_octahedron(composition, n_atoms, rng=None, connectivity_factor=1.4)

Generate an octahedral cluster.

Uses ASE’s Octahedron generator and adjusts atom count by adding/removing surface atoms if needed.

Parameters:
  • composition (list[str]) – List of element symbols (cycled to match n_atoms)

  • n_atoms (int) – Target number of atoms

  • rng (Generator | None) – Optional random number generator for reproducibility

  • connectivity_factor (float) – Factor for connectivity threshold

Return type:

Atoms | None

Returns:

Atoms object with octahedral structure, or None if generation fails

scgo.initialization.generate_tetrahedron(composition, n_atoms, rng=None, connectivity_factor=1.4)

Generate a tetrahedral cluster with the specified number of atoms.

Places atoms at the four vertices of a regular tetrahedron, so only n_atoms=4 is supported.

Parameters:
  • composition (list[str]) – List of element symbols (cycled to match n_atoms)

  • n_atoms (int) – Target number of atoms (must be 4)

  • rng (Generator | None) – Optional random number generator for reproducibility

Return type:

Atoms | None

Returns:

Atoms object with tetrahedral structure, or None if generation fails (e.g., n_atoms != 4)

scgo.initialization.generate_cube(composition, n_atoms, rng=None, connectivity_factor=1.4)

Generate a cubic cluster with the specified number of atoms.

Creates an n×n×n cube of atoms, so only perfect-cube sizes (8, 27, 64, 125, etc.) are supported.

Parameters:
  • composition (list[str]) – List of element symbols (cycled to match n_atoms)

  • n_atoms (int) – Target number of atoms (must be a perfect cube: n³)

  • rng (Generator | None) – Optional random number generator for reproducibility

Return type:

Atoms | None

Returns:

Atoms object with cubic structure, or None if generation fails (e.g., n_atoms is not a perfect cube)

scgo.initialization.generate_cuboctahedron(composition, n_atoms, rng=None, connectivity_factor=1.4)

Generate a cuboctahedral cluster with the specified number of atoms.

A cuboctahedron has 12 vertices; for n_atoms=13, a center atom is added.

Parameters:
  • composition (list[str]) – List of element symbols (cycled to match n_atoms)

  • n_atoms (int) – Target number of atoms (12 or 13 for perfect structures)

  • rng (Generator | None) – Optional random number generator for reproducibility

Return type:

Atoms | None

Returns:

Atoms object with cuboctahedral structure, or None if generation fails
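The cuboctahedron geometry explains the 12/13 sizes. Its 12 vertices are the permutations of (±1, ±1, 0), all at the same distance from the center as from their nearest neighbors, so one central atom completes the closed-shell 13-atom cluster. The function name and the 2.77 Å spacing (roughly the Pt-Pt bond length) in the sketch are illustrative assumptions.

```python
import itertools

import numpy as np


def cuboctahedron_positions(n_atoms, nn_distance=2.77):
    """Cuboctahedron vertex positions (Å), plus a center atom when n_atoms == 13."""
    if n_atoms not in (12, 13):
        return None
    verts = set()
    # All permutations of (±1, ±1, 0); the set removes duplicate permutations
    for a, b in itertools.product((-1, 1), repeat=2):
        verts.update(itertools.permutations((a, b, 0)))
    # Edge length and circumradius are both sqrt(2) before scaling
    pos = np.array(sorted(verts), dtype=float) * nn_distance / np.sqrt(2.0)
    if n_atoms == 13:
        pos = np.vstack([np.zeros(3), pos])
    return pos
```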

scgo.initialization.generate_truncated_octahedron(composition, n_atoms, rng=None, connectivity_factor=1.4)

Generate a truncated octahedral cluster with the specified number of atoms.

A truncated octahedron has 24 vertices (6 square faces, 8 hexagonal faces), so only n_atoms=24 is supported.

Parameters:
  • composition (list[str]) – List of element symbols (cycled to match n_atoms)

  • n_atoms (int) – Target number of atoms (must be 24)

  • rng (Generator | None) – Optional random number generator for reproducibility

Return type:

Atoms | None

Returns:

Atoms object with truncated octahedral structure, or None if generation fails (e.g., n_atoms != 24 or position generation doesn’t yield exactly 24 positions)
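The 24 vertices of a truncated octahedron are all permutations of (0, ±1, ±2). This gives a simple count check matching the exactly-24-positions condition above. The function name and scale parameter below are illustrative.

```python
import itertools

import numpy as np


def truncated_octahedron_vertices(scale=1.0):
    """The 24 vertices of a truncated octahedron; edge length is sqrt(2) * scale."""
    verts = set()
    # 4 sign choices x 6 permutations of three distinct values = 24 points
    for s1, s2 in itertools.product((-1, 1), repeat=2):
        verts.update(itertools.permutations((0, s1 * 1, s2 * 2)))
    return np.array(sorted(verts), dtype=float) * scale
```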

scgo.initialization.generate_template_structure(composition, n_atoms, template_type='auto', rng=None, connectivity_factor=1.4)

Generate a template structure of the specified type.

Parameters:
  • composition (list[str]) – List of element symbols

  • n_atoms (int) – Target number of atoms

  • template_type (str) – Type of template, one of:
      - "auto": automatically select the best template type
      - "icosahedron": icosahedral structure
      - "decahedron": decahedral structure
      - "octahedron": octahedral structure
      - "tetrahedron": tetrahedral structure
      - "cube": cubic structure
      - "cuboctahedron": cuboctahedral structure
      - "truncated_octahedron": truncated octahedral structure

  • rng (Generator | None) – Optional random number generator

Return type:

Atoms | None

Returns:

Atoms object with template structure, or None if generation fails

scgo.initialization.get_nearest_magic_number(n_atoms)

Find the nearest magic number to the given atom count.

Parameters:

n_atoms (int) – Number of atoms in the cluster

Return type:

int | None

Returns:

The nearest magic number, or None if no magic numbers are defined

scgo.initialization.is_near_magic_number(n_atoms, tolerance=2)

Check if the atom count is near a magic number.

Parameters:
  • n_atoms (int) – Number of atoms in the cluster

  • tolerance (int) – Maximum difference from magic number to be considered “near”

Return type:

bool

Returns:

True if n_atoms is within tolerance of any magic number
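For illustration, the classic closed-shell counts for Mackay icosahedra (shared by cuboctahedra) follow N(k) = (10k³ + 15k² + 11k + 3)/3. The helpers below mimic get_nearest_magic_number() and is_near_magic_number() on that list. Which magic numbers SCGO actually defines, and how it breaks ties, are assumptions here.

```python
def icosahedral_magic_numbers(max_shells):
    """Closed-shell atom counts N(k) = (10k^3 + 15k^2 + 11k + 3) / 3, k = 0..max_shells."""
    return [(10 * k**3 + 15 * k**2 + 11 * k + 3) // 3 for k in range(max_shells + 1)]


MAGIC = icosahedral_magic_numbers(6)  # [1, 13, 55, 147, 309, 561, 923]


def nearest_magic(n_atoms, magic=MAGIC):
    # min() keeps the first of equally close values, so ties go to the smaller number
    return min(magic, key=lambda m: abs(m - n_atoms)) if magic else None


def near_magic(n_atoms, tolerance=2, magic=MAGIC):
    return any(abs(m - n_atoms) <= tolerance for m in magic)


# nearest_magic(50) -> 55; near_magic(57) -> True; near_magic(30) -> False
```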