Installation

SCGO is published on PyPI and can also be installed from source for development.

Prerequisites

  • Python 3.12+

  • SQLite with JSON1 extension (required; use pysqlite3-binary for pip installs if your Python sqlite3 lacks JSON1)

  • CUDA (for GPU acceleration with MLIPs)

Conda (development from source)

For contributors or editable installs with the full MACE + dev toolchain:

git clone https://github.com/rlaplaza-lab/scgo.git
cd scgo
conda env create -f environment.yml
conda activate scgo

The conda environment installs SCGO in editable mode with [mace,dev] (MACE/TorchSim plus test and lint tooling). Note that vesin and vesin-torch conflict with the TorchSim stack used by SCGO and should not be installed.

Editable install from source

git clone https://github.com/rlaplaza-lab/scgo.git
cd scgo
pip install -e ".[mace]"   # or: pip install -e ".[uma]"

Development Installation

For development with tests and linting:

pip install -e ".[mace,dev]"  # or: pip install -e ".[uma,dev]"
pre-commit install

Dependency Notes

  • SCGO requires exactly one of the [mace] or [uma] extras for MLIP support

  • The MACE and UMA extras use incompatible dependency stacks

  • SQLite JSON1 extension is required for database operations (pysqlite3-binary recommended for pip installs)

  • Sella is optional for advanced optimization features and requires a C toolchain

  • SCGO allows scipy>=1.14,<3 to resolve cleanly with fairchem UMA dependencies

Parallel jobs and output directories

SCGO generates unique run folders (run_YYYYMMDD_HHMMSS_ffffff), so parallel jobs launched from the same parent output directory usually write to different *.db files. Log lines like Using cached results for: ... are normally in-process cache hits, not a lock by themselves.

SQLite can still serialize writes when two jobs touch the same database file (for example, reusing the same explicit run_id or output path), and shared filesystems may add contention for registry lock files. For large parallel campaigns, prefer one output directory per job (or job-local scratch, then copy results back).

For performance-sensitive GA/BH runs on local storage, optional DB/GA tuning knobs:

  • db_enable_expression_indexes=True — JSON expression indexes for metadata filtering/sorting paths.

  • ga_adaptive_retry_enabled=True (default) with ga_retry_floor_multiplier/ga_retry_ceiling_multiplier — bound retry budgets without hard-capping exploration.

  • ga_fast_prefilter_enabled=True (default) — low-cost clash rejection before full structural validation.

HPC and shared filesystems

When running on Slurm clusters or network filesystems (Lustre, GPFS, NFS):

SQLite

SCGO keeps WAL mode off by default (fewer -wal/-shm issues on shared filesystems). Prefer writing active *.db files under job-local scratch ($SLURM_TMPDIR or site-specific scratch) when you can, then copying results back to project storage.

Registry

Discovery may write .scgo_db_registry.json and .scgo_db_registry.lock (with flock on Linux) for fast database listing. When your run lives under a directory whose name ends in _searches, the index is kept at that parent only (not beside every trial_* folder). If your filesystem does not honor flock, use separate output directories per job or avoid parallel registry updates.

Logging

Batch-friendly defaults suppress noisy third-party loggers. For local debugging, set SCGO_LOCAL_DEV=1 or call configure_logging(..., hpc_mode=False).

Publishing releases

Maintainers publish to PyPI via the Publish to PyPI GitHub Actions workflow (workflow_dispatch). Configure trusted publishing on PyPI for the pypi and testpypi GitHub environments, then run the workflow with confirm set to publish.