Cluster initialization
Gas-phase and deposition workflows build starting structures through
scgo.initialization. The GA population uses the same engine via
ClusterStartGenerator (smart mode by default).
Initialization modes
Pass mode to create_initial_cluster() or
init_mode on SurfaceSystemConfig for surface
deposition:
| Mode | Behaviour |
|---|---|
| smart | Metropolis allocation across templates, seed+growth, and random_spherical. Batch generation discovers strategies once, then assigns per-structure seeds for reproducible parallel runs. |
| seed+growth | Grow from low-energy candidates in prior database files (see previous_search_glob). |
| random_spherical | Iterative random placement with clash and connectivity checks; retries relax placement radii within user bounds. |
| template | Icosahedral / decahedral / octahedral templates when available for the target size. |
Atom ordering (multi-element GA)
Genetic-algorithm cut-and-splice crossover requires parents to share identical
per-index atomic numbers (a1.numbers == a2.numbers), not merely the same
composition.
SCGO therefore:
- Keeps the campaign composition list as the canonical symbol order (e.g. ["Ir", "O", "O", "O"], not alphabetical O-first).
- Reorders validated structures with reorder_cluster_to_composition() when validate_cluster(..., sort_atoms=True, composition=...) runs.
- Applies the same reordering when inserting gas-phase candidates into the GA database.
All structures in a batch for one composition therefore share the same
.numbers vector, which avoids stoichiometry pairing errors in multi-element
runs.
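The reordering step can be illustrated with a minimal sketch. The helper name reorder_indices_to_composition is hypothetical and this is not SCGO's reorder_cluster_to_composition(); it only shows one way to map a scrambled structure onto the canonical symbol order, assuming both contain the same atoms.

```python
from collections import defaultdict, deque


def reorder_indices_to_composition(symbols, composition):
    """Return indices into `symbols` whose symbols, in order, equal `composition`.

    Hypothetical helper for illustration only.
    """
    if sorted(symbols) != sorted(composition):
        raise ValueError("structure and composition differ in stoichiometry")
    # Queue the positions of each element in their original order.
    positions = defaultdict(deque)
    for index, symbol in enumerate(symbols):
        positions[symbol].append(index)
    # Walk the canonical list and take the next unused atom of each element.
    return [positions[symbol].popleft() for symbol in composition]


canonical = ["Ir", "O", "O", "O"]
scrambled = ["O", "O", "Ir", "O"]
order = reorder_indices_to_composition(scrambled, canonical)
reordered = [scrambled[i] for i in order]
```

Applying the same index list to positions and any per-atom arrays gives every structure in a batch the same per-index element sequence, which is the precondition cut-and-splice pairing relies on.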
Placement order and diversity
For random_spherical and seed growth, atoms are added one at a time. The
order is sampled on each attempt (mass-biased by default, exploratory otherwise);
see scgo.initialization.initialization_config.
- Mass-biased (default ~65% of attempts): heavier element groups are placed first (ASE atomic masses); order within each element group is shuffled. This favours metal-first growth for oxides and bimetallics without fixing the same sequence for every structure.
- Exploratory (~35%): legacy growth-order strategies (random shuffle, size-based, composition-aware, etc.) preserve batch diversity.
The bias probability is MASS_FIRST_PLACEMENT_PROB in
scgo.initialization.initialization_config (not exposed in GO presets).
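A rough sketch of the two-branch sampling described above follows. It is not SCGO's implementation: it uses the standard-library random.Random instead of a numpy Generator, a small hard-coded mass table instead of ASE's, and a plain shuffle as a stand-in for the legacy exploratory strategies. The function name and the mass_first_prob argument are hypothetical.

```python
import random

# Illustrative atomic masses (u); the text above says SCGO reads them from ASE.
MASSES = {"O": 15.999, "Ir": 192.217}


def sample_placement_order(composition, rng, mass_first_prob=0.65):
    """Return atom indices in the order they would be placed."""
    indices = list(range(len(composition)))
    # Shuffle first: this randomises the order inside each element group.
    rng.shuffle(indices)
    if rng.random() < mass_first_prob:
        # Mass-biased branch: a stable sort moves heavier elements to the
        # front while keeping the shuffled order within equal-mass groups.
        indices.sort(key=lambda i: -MASSES[composition[i]])
    # Otherwise (exploratory branch) the plain shuffle is returned as-is.
    return indices


order = sample_placement_order(["O", "Ir", "O", "O"], random.Random(0))
```

Because the branch is drawn from the same generator as the shuffle, one seed fixes both which branch is taken and the resulting order.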
Reproducibility
All placement randomness flows through a single numpy.random.Generator:
- Single structure: pass rng to create_initial_cluster() or set seed on run_go / run_go_campaign (converted to a generator at the API boundary).
- Batch / GA population: create_initial_cluster_batch derives an independent per-structure seed from the parent generator (batch_base_seed + i * 7919), so n_jobs=1 and parallel workers produce identical populations for the same parent seed.
- Campaigns: run_go_campaign draws a reproducible per-composition seed from the campaign generator; failed compositions are logged and skipped (see below) without aborting the rest of the scan.
Use the same seed everywhere it appears (seed=, go_params['seed'],
ts_params['seed']) when more than one is set.
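The per-structure seed rule quoted above (batch_base_seed + i * 7919) can be sketched as follows. How SCGO draws batch_base_seed from the parent generator is not shown in this chapter, so it is passed in directly here; build_structure is a hypothetical stand-in for a real structure build, and random.Random replaces the numpy Generator to keep the example dependency-free.

```python
import random

SEED_STRIDE = 7919  # stride quoted in the text above


def derive_structure_seeds(batch_base_seed, n_structures):
    """One seed per structure index, independent of execution order."""
    return [batch_base_seed + i * SEED_STRIDE for i in range(n_structures)]


def build_structure(seed):
    """Stand-in for one structure build: three pseudo-random coordinates."""
    rng = random.Random(seed)
    return tuple(round(rng.uniform(-1.0, 1.0), 6) for _ in range(3))


seeds = derive_structure_seeds(batch_base_seed=42, n_structures=4)
serial = [build_structure(s) for s in seeds]

# Simulate workers finishing out of order: results are keyed by structure
# index, so the assembled batch does not depend on completion order.
out_of_order = {i: build_structure(seeds[i]) for i in (3, 1, 0, 2)}
parallel_like = [out_of_order[i] for i in range(4)]
```

Since each structure owns its seed, the serial and reordered runs agree element by element, which is the property the batch API relies on for n_jobs=1 versus parallel workers.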
Connectivity and steric checks
- connectivity_factor (default 1.4 in GO presets) scales covalent radii for connectivity validation during initialization and after GA operators.
- Placement clash tables use BLMIN_RATIO_DEFAULT (0.7), aligned with GA blmin tables via build_blmin().
Difficult stoichiometries (e.g. O-rich oxides) may fail initialization; filter
composition scans or relax connectivity_factor / placement parameters rather
than expecting every binary grid point to succeed.
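To make the two scale factors concrete, the sketch below classifies a single pair distance against both thresholds. It assumes each threshold is the factor times the sum of the two covalent radii, uses illustrative radii rather than SCGO's table, and the function names are hypothetical.

```python
# Illustrative covalent radii in Angstrom; SCGO's own table may differ.
COVALENT_RADII = {"Pt": 1.36, "O": 0.66}


def pair_thresholds(sym_a, sym_b, connectivity_factor=1.4, blmin_ratio=0.7):
    """Return (clash_below, bonded_below) distances for one element pair."""
    radius_sum = COVALENT_RADII[sym_a] + COVALENT_RADII[sym_b]
    return blmin_ratio * radius_sum, connectivity_factor * radius_sum


def classify_distance(sym_a, sym_b, distance, **factors):
    """Label a pair distance as 'clash', 'bonded', or 'unbonded'."""
    clash_below, bonded_below = pair_thresholds(sym_a, sym_b, **factors)
    if distance < clash_below:
        return "clash"
    if distance <= bonded_below:
        return "bonded"
    return "unbonded"


label = classify_distance("Pt", "O", 2.0)
```

Raising connectivity_factor widens the bonded window, which is why relaxing it can rescue stoichiometries whose atoms otherwise fail to form a single connected cluster.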
Module reference
Cluster initialization package.
Builds starting structures for global optimization and surface deposition.
Main entry points:
- create_initial_cluster and create_initial_cluster_batch
- random_spherical and grow_from_seed
- combine_and_grow
- generate_template_structure
All randomness flows through numpy.random.Generator arguments.
See the initialization chapter in the project documentation for modes,
ordering, and reproducibility.
- scgo.initialization.create_initial_cluster(composition, rng, placement_radius_scaling=1.2, min_distance_factor=0.4, vacuum=10.0, previous_search_glob='**/*.db', mode='smart', connectivity_factor=1.4)
Create an initial cluster using several strategies.
This function provides the single entry point for building starting structures for global optimization. It is implemented as a wrapper around create_initial_cluster_batch() with n_structures=1 to ensure consistent behavior. In smart mode, a single call selects its strategy probabilistically, whereas batch calls allocate strategies deterministically.
Independent of the creation mode, successful returns obey the same basic invariants:
  - no hard clashes according to min_distance_factor and covalent radii
  - the cluster is connected under connectivity_factor
  - positions are reproducible for a given rng seed
- Parameters:
  - placement_radius_scaling (float) – scale factor for radii in random placement.
  - min_distance_factor (float) – scale factor for minimum distance checks; the placement loop relaxes it slightly if repeated attempts fail.
  - vacuum (float) – extra padding for the generated simulation cell.
  - previous_search_glob (str) – glob pattern to find database files.
  - mode (str) – Initialization strategy: smart (default Metropolis mix of templates, seed+growth, and random_spherical), seed+growth, random_spherical, or template.
  - connectivity_factor (float) – Factor to multiply sum of covalent radii for connectivity threshold. Defaults to CONNECTIVITY_FACTOR (1.4).
  - rng (Generator) – numpy Generator providing all randomness for this call.
- Return type:
Atoms
- Returns:
An ase.Atoms instance with the initial cluster. When composition is empty, returns an empty Atoms object.
- Raises:
  - TypeError – If composition is None or not a list/tuple of strings.
  - ValueError – If numeric parameters are invalid or a valid cluster satisfying the distance/connectivity constraints cannot be constructed.
Note
This function is implemented as a wrapper around create_initial_cluster_batch() to ensure consistent behavior. For generating multiple structures, use create_initial_cluster_batch() directly for better performance and deterministic strategy allocation.
- scgo.initialization.create_initial_cluster_batch(composition, n_structures, rng, placement_radius_scaling=1.2, min_distance_factor=0.4, vacuum=10.0, previous_search_glob='**/*.db', mode='smart', connectivity_factor=1.4, n_jobs=1)
Create multiple initial clusters with deterministic per-structure RNG.
For smart mode, uses Metropolis allocation across templates, seed+growth, and random_spherical. Each structure receives an independent seed derived from rng (batch_base_seed + index * 7919), so batch results are reproducible and identical for n_jobs=1 vs parallel workers when the parent rng state matches.
Validated structures are reordered to match composition for GA pairing.
- Return type:
list[Atoms]
- scgo.initialization.random_spherical(composition, cell_side, rng, placement_radius_scaling=1.2, min_distance_factor=0.4, connectivity_factor=1.4, max_connectivity_retries=10, blmin_ratio=0.7)
Place atoms randomly within a compact sphere, ensuring minimum distances.
Atoms are added iteratively with covalent-radii-based clash checks and connectivity enforcement. For each retry attempt the algorithm slightly relaxes the effective placement radius and distance thresholds within the user-specified bounds to improve the chance of finding a valid connected configuration. Placement order is sampled on each attempt (mass-biased by default, exploratory otherwise); see scgo.initialization.initialization_config.
When blmin_ratio is set (default: BLMIN_RATIO_DEFAULT), placement and final validation enforce the same steric floor used by GA operators (ratio_of_covalent_radii / build_blmin). Progressive placement relaxation never drops below that floor. Pass blmin_ratio=None to disable the GA floor and rely only on min_distance_factor.
- Parameters:
  - composition (list[str]) – List of element symbols for the atoms.
  - cell_side (float) – The side length of the cubic cell for the returned Atoms object.
  - placement_radius_scaling (float) – A scaling factor used to determine the initial spherical volume for atom placement. Larger values result in a larger initial volume.
  - min_distance_factor (float) – Factor to scale the sum of covalent radii for minimum allowed distance between atoms. A value of 1.0 means no overlap, while < 1.0 allows some overlap.
  - connectivity_factor (float) – Factor to multiply sum of covalent radii for connectivity threshold.
  - max_connectivity_retries (int) – Maximum number of retries if connectivity validation fails.
  - blmin_ratio (float|None) – GA-compatible steric floor (covalent-radius scale). None disables the extra floor beyond min_distance_factor.
  - rng (Generator) – Numpy Generator supplying all randomness for this call (placement order, coordinates, retries).
- Return type:
Atoms
- Returns:
An ase.Atoms instance with the randomly placed cluster.
- Raises:
  - ValueError – If all atoms cannot be placed within the given constraints after a maximum number of attempts, or if connectivity validation fails after all retries.
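The place-check-relax loop described for random_spherical can be sketched in a simplified form. This is not the SCGO algorithm: it uses a single uniform min_distance instead of per-pair covalent radii, omits the connectivity check and the blmin floor, relaxes only the placement radius, and uses random.Random rather than a numpy Generator. All names and defaults below are hypothetical.

```python
import math
import random


def random_point_in_sphere(radius, rng):
    """Uniform point inside a sphere via rejection from the bounding cube."""
    while True:
        p = tuple(rng.uniform(-radius, radius) for _ in range(3))
        if math.dist(p, (0.0, 0.0, 0.0)) <= radius:
            return p


def place_in_sphere(n_atoms, min_distance, radius, max_radius, rng,
                    tries_per_atom=200, growth=1.1):
    """Place points one at a time; restart with a larger radius on failure."""
    while radius <= max_radius:
        points = []
        for _ in range(n_atoms):
            for _ in range(tries_per_atom):
                candidate = random_point_in_sphere(radius, rng)
                # Clash check against every atom placed so far.
                if all(math.dist(candidate, p) >= min_distance for p in points):
                    points.append(candidate)
                    break
            else:
                break  # this atom could not be placed at the current radius
        if len(points) == n_atoms:
            return points
        radius *= growth  # relax the placement radius, bounded by max_radius
    raise ValueError("could not place all atoms within the given bounds")


points = place_in_sphere(6, 1.0, 1.5, 6.0, random.Random(1))
```

The upper bound on the radius plays the role of the user-specified bounds mentioned above: relaxation makes success more likely but cannot continue indefinitely, so an infeasible request ends in a ValueError.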
- scgo.initialization.grow_from_seed(seed_atoms, target_composition, placement_radius_scaling, cell_side, rng, min_distance_factor=0.4, connectivity_factor=1.4, blmin_ratio=0.7)
Try to grow a smaller candidate ase.Atoms to the target composition.
Growth is performed by repeatedly adding atoms to the existing seed using convex-hull-based placement (via _add_atoms_to_cluster_iteratively()), with covalent-radii-based clash checks and connectivity enforcement.
- Parameters:
  - seed_atoms (Atoms) – The seed ase.Atoms object to grow from.
  - target_composition (list[str]) – The target composition as a list of element symbols.
  - placement_radius_scaling (float) – A scaling factor to determine the placement shell radius.
  - min_distance_factor (float) – Factor to scale covalent radii for minimum distance checks.
  - cell_side (float) – The side length of the cubic cell for the new ase.Atoms object.
  - connectivity_factor (float) – Factor to multiply sum of covalent radii for connectivity threshold.
  - rng (Generator) – Numpy random number generator (a required argument in the signature above).
- Return type:
Atoms|None
- Returns:
A new ase.Atoms object of the target composition on success, or None on failure.
- scgo.initialization.combine_seeds(seeds, cell_side, rng, separation_scaling=1.0, connectivity_factor=1.4, min_distance_factor=0.4)
Combines multiple seed clusters into a single new structure using facet-to-facet placement.
- Return type:
Atoms|None
- scgo.initialization.combine_and_grow(seeds, target_composition, cell_side, rng, vdw_scaling=1.0, min_distance_factor=0.4, connectivity_factor=1.4)
Combines seeds and grows to target composition.
- Return type:
Atoms|None
- scgo.initialization.compute_cell_side(composition, vacuum=10.0)
Estimate a cubic cell side from atomic van-der-Waals volumes.
The estimate computes atomic volumes using ASE’s van-der-Waals radii, converts that to an effective spherical radius and returns a cubic side that contains the cluster plus the requested vacuum padding.
For elements where ASE’s vdw_radii is NaN (e.g., Co, Fe, Ru), uses interpolated values from neighboring elements (cached per element).
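One way to read the estimate is sketched below. The exact formula in compute_cell_side is not given here, so two details are assumptions: the per-atom sphere volumes are summed before converting to an effective radius, and the padding is added once to the sphere diameter. The radii table is illustrative and the function name is hypothetical.

```python
import math

# Illustrative van-der-Waals radii in Angstrom (assumed values for the example).
VDW_RADII = {"Pt": 1.75, "O": 1.52}


def estimate_cell_side(composition, vacuum=10.0):
    """Cubic side from summed vdW sphere volumes plus vacuum padding."""
    total_volume = sum(
        4.0 / 3.0 * math.pi * VDW_RADII[symbol] ** 3 for symbol in composition
    )
    # Radius of a single sphere holding the same total volume.
    effective_radius = (3.0 * total_volume / (4.0 * math.pi)) ** (1.0 / 3.0)
    # Assumption: padding is added once to the diameter.
    return 2.0 * effective_radius + vacuum


side = estimate_cell_side(["Pt"] * 8)
```

With eight identical atoms the effective radius is exactly twice the atomic radius, which makes the cube-root scaling of the cell with cluster size easy to check by hand.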
- scgo.initialization.is_cluster_connected(atoms, connectivity_factor=1.4, use_mic=False)
Check if all atoms in a cluster are connected within the specified distance threshold.
Uses a Union-Find algorithm with KDTree spatial indexing to efficiently determine if all atoms form a single connected component where edges exist between atoms within (r_i + r_j) * connectivity_factor.
This optimized version uses scipy.spatial.KDTree for efficient neighbor queries, providing O(n log n) performance instead of O(n²) for large clusters.
- Parameters:
- Return type:
- Returns:
True if all atoms are in one connected component, False otherwise.
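The connectivity criterion can be sketched with a small Union-Find. For brevity this version checks all pairs directly instead of using scipy.spatial.KDTree, so it is O(n²) and only illustrates the edge rule (r_i + r_j) * connectivity_factor; it also ignores periodic images (use_mic). The function name and the plain-list inputs are assumptions made for the example.

```python
import math


def is_connected(positions, radii, connectivity_factor=1.4):
    """True if all points form one component under the scaled-radius cutoff."""
    n = len(positions)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            cutoff = (radii[i] + radii[j]) * connectivity_factor
            if math.dist(positions[i], positions[j]) <= cutoff:
                parent[find(i)] = find(j)  # merge the two components

    return len({find(i) for i in range(n)}) <= 1


chain = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
connected = is_connected(chain, radii=[0.7, 0.7, 0.7])
```

A chain counts as connected even though its end atoms are not within the cutoff of each other, because connectivity is a property of the whole bond graph rather than of each pair.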
- scgo.initialization.validate_cluster(atoms, composition=None, min_distance_factor=None, connectivity_factor=1.4, check_clashes=True, check_connectivity=None, sort_atoms=True, raise_on_failure=False, source='', use_mic=False)
Unified cluster validation with comprehensive checks.
This function consolidates all validation logic used across the initialization module. It can check composition, clashes, connectivity, and optionally sort atoms by element.
- Parameters:
  - atoms (Atoms) – The Atoms object to validate
  - composition (list[str]|None) – Optional expected composition to verify exact match
  - min_distance_factor (float|None) – Factor for minimum distance checks. If None, uses MIN_DISTANCE_FACTOR_DEFAULT when check_clashes is True
  - connectivity_factor (float) – Factor for connectivity threshold
  - check_clashes (bool) – Whether to check for atomic clashes (default: True)
  - check_connectivity (bool|None) – Whether to check connectivity. If None, auto-detects based on atom count (>2 atoms)
  - sort_atoms (bool) – When True and composition is set, reorder atoms to match the composition list (required for GA pairing). When True without composition, fall back to alphabetical element sort.
  - raise_on_failure (bool) – Whether to raise ValueError on validation failure
  - source (str) – Context string for error messages (e.g., “template”, “seed+growth”)
- Return type:
- Returns:
Tuple of (validated_atoms, is_valid, error_message). If is_valid is True, error_message is empty. validated_atoms may be reordered if sort_atoms=True.
- Raises:
ValueError – If raise_on_failure=True and validation fails
- scgo.initialization.validate_cluster_structure(atoms, min_distance_factor, connectivity_factor, check_clashes=True, check_connectivity=True, use_mic=False)
Validate a cluster structure for clashes and connectivity.
This function provides a centralized validation that ensures all returned cluster structures meet the specified constraints. It checks for atomic clashes and connectivity using the same logic as the placement algorithms.
- Parameters:
  - atoms (Atoms) – The Atoms object to validate
  - min_distance_factor (float) – Factor to scale covalent radii for minimum distance checks
  - connectivity_factor (float) – Factor to multiply sum of covalent radii for connectivity threshold
  - check_clashes (bool) – Whether to check for atomic clashes (default: True)
  - check_connectivity (bool) – Whether to check connectivity (default: True)
- Return type:
- Returns:
Tuple of (is_valid, error_message). If is_valid is True, error_message is empty. If is_valid is False, error_message contains diagnostic information.
- class scgo.initialization.StructureDiagnostics(is_valid, has_clashes, is_disconnected, clash_details, n_components, closest_inter_component_distance, suggested_connectivity_factor, summary)
Bases: object
Container for comprehensive structure diagnostics.
- is_valid
True if structure has no clashes and is connected
- has_clashes
True if atomic clashes were detected
- is_disconnected
True if cluster has multiple disconnected components
- clash_details
List of clash description strings
- n_components
Number of disconnected components (1 if connected)
- closest_inter_component_distance
Distance between closest atoms in different components
- suggested_connectivity_factor
Connectivity factor needed to connect all components
- summary
Human-readable summary of all issues
- scgo.initialization.get_covalent_radius(symbol)
Return the covalent radius for symbol in Angstroms.
- Return type:
- scgo.initialization.get_vdw_radius(symbol)
Return the van-der-Waals radius for symbol in Angstroms.
- Return type:
- scgo.initialization.get_structure_diagnostics(atoms, min_distance_factor, connectivity_factor, use_mic=False)
Get comprehensive diagnostics for a cluster structure.
This function analyzes both clashes and connectivity issues and returns detailed diagnostic information useful for debugging initialization failures.
- Parameters:
- Return type:
- Returns:
StructureDiagnostics object containing detailed analysis results
- scgo.initialization.generate_icosahedron(composition, n_atoms, rng=None, connectivity_factor=1.4)
Generate an icosahedral cluster.
Uses ASE’s Icosahedron generator and adjusts atom count by adding/removing surface atoms if needed.
- Parameters:
- Return type:
Atoms|None
- Returns:
Atoms object with icosahedral structure, or None if generation fails
- scgo.initialization.generate_decahedron(composition, n_atoms, rng=None, connectivity_factor=1.4)
Generate a decahedral cluster.
Uses ASE’s Decahedron generator and adjusts atom count by adding/removing surface atoms if needed.
- Parameters:
- Return type:
Atoms|None
- Returns:
Atoms object with decahedral structure, or None if generation fails
- scgo.initialization.generate_octahedron(composition, n_atoms, rng=None, connectivity_factor=1.4)
Generate an octahedral cluster.
Uses ASE’s Octahedron generator and adjusts atom count by adding/removing surface atoms if needed.
- Parameters:
- Return type:
Atoms|None
- Returns:
Atoms object with octahedral structure, or None if generation fails
- scgo.initialization.generate_tetrahedron(composition, n_atoms, rng=None, connectivity_factor=1.4)
Generate a tetrahedral cluster with the specified number of atoms.
Creates a regular tetrahedron with atoms at vertices. Only supports 4 atoms (the vertices of a regular tetrahedron).
- Parameters:
- Return type:
Atoms|None
- Returns:
Atoms object with tetrahedral structure, or None if generation fails (e.g., n_atoms != 4)
- scgo.initialization.generate_cube(composition, n_atoms, rng=None, connectivity_factor=1.4)
Generate a cubic cluster with the specified number of atoms.
Creates n×n×n cubic structures. Only perfect-cube atom counts are supported (8, 27, 64, 125, etc.).
- Parameters:
- Return type:
Atoms|None
- Returns:
Atoms object with cubic structure, or None if generation fails (e.g., n_atoms is not a perfect cube)
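The perfect-cube restriction can be illustrated with a small sketch that returns None for unsupported sizes, mirroring the documented return behaviour. It is not the SCGO generator: it produces bare coordinates on a simple cubic grid, the spacing default is arbitrary, treating 8 as the smallest supported size is inferred from the list above, and both function names are hypothetical.

```python
def cube_edge_count(n_atoms):
    """Return k if n_atoms == k**3 with k >= 2, otherwise None."""
    if n_atoms < 8:
        return None
    # Round the floating-point cube root, then verify with exact integers.
    k = round(n_atoms ** (1.0 / 3.0))
    return k if k ** 3 == n_atoms else None


def simple_cubic_positions(n_atoms, spacing=2.5):
    """Centered k x k x k grid positions, or None if n_atoms is not a cube."""
    k = cube_edge_count(n_atoms)
    if k is None:
        return None
    offset = (k - 1) * spacing / 2.0  # shifts the grid so it is centered
    return [
        (x * spacing - offset, y * spacing - offset, z * spacing - offset)
        for x in range(k)
        for y in range(k)
        for z in range(k)
    ]


positions = simple_cubic_positions(27)
```

Verifying k ** 3 against n_atoms in integer arithmetic avoids accepting a size because of floating-point rounding in the cube root.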
- scgo.initialization.generate_cuboctahedron(composition, n_atoms, rng=None, connectivity_factor=1.4)
Generate a cuboctahedral cluster with the specified number of atoms.
Cuboctahedron has 12 vertices. For 13 atoms, adds a center atom.
- Parameters:
- Return type:
Atoms|None
- Returns:
Atoms object with cuboctahedral structure, or None if generation fails
- scgo.initialization.generate_truncated_octahedron(composition, n_atoms, rng=None, connectivity_factor=1.4)
Generate a truncated octahedral cluster with the specified number of atoms.
Truncated octahedron has 24 vertices (6 square faces, 8 hexagonal faces). Only supports 24 atoms (the vertices of a truncated octahedron).
- Parameters:
- Return type:
Atoms|None
- Returns:
Atoms object with truncated octahedral structure, or None if generation fails (e.g., n_atoms != 24 or position generation doesn’t yield exactly 24 positions)
- scgo.initialization.generate_template_structure(composition, n_atoms, template_type='auto', rng=None, connectivity_factor=1.4)
Generate a template structure of the specified type.
- Parameters:
  - n_atoms (int) – Target number of atoms
  - template_type (str) – Type of template. Can be:
    - “auto”: Automatically select best template type
    - “icosahedron”: Icosahedral structure
    - “decahedron”: Decahedral structure
    - “octahedron”: Octahedral structure
    - “tetrahedron”: Tetrahedral structure
    - “cube”: Cubic structure
    - “cuboctahedron”: Cuboctahedral structure
    - “truncated_octahedron”: Truncated octahedral structure
  - rng (Generator|None) – Optional random number generator
- Return type:
Atoms|None
- Returns:
Atoms object with template structure, or None if generation fails
- scgo.initialization.get_nearest_magic_number(n_atoms)
Find the nearest magic number to the given atom count.
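As an illustration of a nearest-magic-number lookup, the sketch below uses the closed-shell Mackay icosahedron sizes (13, 55, 147, 309, ...). Which magic numbers SCGO actually considers and how it breaks ties are not stated here, so both the list and the tie rule (prefer the smaller size) are assumptions, and the function names are hypothetical.

```python
def mackay_magic_numbers(max_shells=8):
    """Atom counts of closed-shell Mackay icosahedra with 1..max_shells shells."""
    return [
        (10 * k**3 + 15 * k**2 + 11 * k + 3) // 3
        for k in range(1, max_shells + 1)
    ]


def nearest_magic_number(n_atoms, magic_numbers=None):
    """Return the magic number closest to n_atoms."""
    magic = magic_numbers or mackay_magic_numbers()
    # min() keeps the first minimum, so ties resolve to the smaller size.
    return min(magic, key=lambda m: abs(m - n_atoms))


nearest = nearest_magic_number(50)
```

A lookup like this lets a template strategy start from a closed-shell structure and then add or remove surface atoms to reach the requested size.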