Command Line Interface

STARLING ships with a collection of console scripts that cover ensemble generation, performance benchmarking, format conversion, and similarity search. This page summarises the most common commands and how they fit into an end-to-end workflow.

starling

Generate conformational ensembles directly from the shell. The CLI mirrors the starling.generate() signature and accepts sequences, FASTA/TSV files, or lists of sequences.

starling my_sequences.fasta \
    -c 200 \
    --ionic_strength 150 \
    --return_structures \
    --output_directory outputs

Key options:

  • -c / --conformations - number of conformers to sample (default 400)

  • -r / --return_structures - flag; when provided, STARLING writes 3D coordinates in addition to the distance maps, as a .pdb file (a single topology file) and a .xtc file (compressed trajectory format)

  • --ionic_strength - choose 20, 150, or 300 mM solvent environments

  • --steps - diffusion steps for the sampler (default 25)

  • --device - force CPU, CUDA (cuda:0), or Apple MPS

  • --num-cpus / --num-mds-init - control MDS reconstruction throughput

  • --outname - override the output prefix when providing a single sequence

Outputs are written to the requested directory and include .starling archives, plus PDB/XTC trajectories when --return_structures is set.
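When runs are launched from a script rather than typed by hand, it can help to assemble the command as an argument list. The sketch below is illustrative only: build_starling_command is a hypothetical helper name, not part of STARLING, and it uses only the flags documented in this section. It builds and prints the command without executing it.

```python
import shlex


def build_starling_command(
    input_path,
    conformations=200,
    ionic_strength=150,
    output_directory="outputs",
    return_structures=True,
):
    """Assemble the argv list for a `starling` run (hypothetical helper)."""
    # The documentation lists 20, 150, and 300 mM as the available environments.
    if ionic_strength not in (20, 150, 300):
        raise ValueError("ionic_strength must be 20, 150, or 300 (mM)")
    cmd = [
        "starling", str(input_path),
        "-c", str(conformations),
        "--ionic_strength", str(ionic_strength),
        "--output_directory", str(output_directory),
    ]
    if return_structures:
        cmd.append("--return_structures")
    return cmd


cmd = build_starling_command("my_sequences.fasta")
print(shlex.join(cmd))
# To execute for real (requires STARLING to be installed):
# import subprocess; subprocess.run(cmd, check=True)
```

Passing a list to subprocess.run avoids shell quoting problems when file names contain spaces.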

starling-benchmark

Profile model throughput under different diffusion steps, conformer counts, and hardware options.

starling-benchmark --device cuda:0 --batch-size 64 --steps 30 --single-run 500

The command records runtime and radius-of-gyration measurements to CSV files for later analysis.

Conversion utilities

All converters operate on .starling archives created by the generator.

  • starling2pdb - Convert a STARLING archive into a multi-model PDB trajectory. Pass --remove-errors to drop physically impossible frames first (see below).

  • starling2xtc - Export a topology PDB paired with an XTC trajectory (reconstructs coordinates if necessary). Pass --remove-errors to drop physically impossible frames first (see below).

  • starling2numpy - Dump raw distance maps to a NumPy .npy array for custom analyses.

  • starling2sequence - Print the amino-acid sequence associated with an archive.

  • starling2info - Display metadata such as version, conformer count, and default weights.

  • starling2starling / numpy2starling / xtc2starling - Regenerate archives from alternative representations.

By default outputs are written next to the source file; pass -o to choose a new directory or filename prefix.
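A directory of archives can be converted in one pass by planning one converter call per file. The following is a minimal sketch, not a STARLING utility: plan_xtc_conversions is a hypothetical helper, and it assumes that -o accepts a path used as a filename prefix, as described above. It only builds the commands; running them is left to the caller.

```python
from pathlib import Path


def plan_xtc_conversions(archive_dir, out_dir, remove_errors=False):
    """Return one `starling2xtc` argv list per .starling archive in archive_dir."""
    commands = []
    for archive in sorted(Path(archive_dir).glob("*.starling")):
        # Assumption: -o takes a filename prefix, here <out_dir>/<archive name>.
        target = Path(out_dir) / archive.stem
        cmd = ["starling2xtc", str(archive), "-o", str(target)]
        if remove_errors:
            cmd.append("--remove-errors")
        commands.append(cmd)
    return commands


# Example use (requires STARLING to be installed for the actual conversion):
# import subprocess
# for cmd in plan_xtc_conversions("outputs", "converted"):
#     subprocess.run(cmd, check=True)
```

Sorting the archives keeps the order of conversions reproducible between runs.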

Removing erroneous frames

starling2pdb and starling2xtc accept a --remove-errors flag. When set, the reconstructed 3D trajectory is scanned for frames containing physically impossible inter-residue distances (a pair of residues separated by |i - j| positions in the sequence cannot be further apart than |i - j| bond lengths), and any such frames are removed before the trajectory is written to disk. This is helpful when a particular sequence is badly behaved and the SMACOF reconstruction occasionally produces unphysical geometry.

starling2xtc my_ensemble.starling -o cleaned.xtc --remove-errors

The equivalent distance-map check (operating on the raw STARLING distance maps rather than the reconstructed coordinates) is available via starling2starling --error-check --remove-errors and the starling.structure.ensemble.Ensemble.check_for_errors() method.
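The criterion itself is simple to state in code. The sketch below illustrates the idea on plain nested lists and is not STARLING's implementation: the function names are hypothetical, and the 3.8 angstrom bond length (a typical spacing between consecutive residues in a one-bead-per-residue model) and the tolerance parameter are assumptions, since the threshold STARLING actually uses is not given here.

```python
def has_impossible_distance(dist_map, bond_length=3.8, tolerance=0.0):
    """True if any residue pair (i, j) is further apart than |i - j| bonds allow."""
    n = len(dist_map)
    for i in range(n):
        for j in range(i + 1, n):
            # A fully extended chain spans at most (j - i) bond lengths.
            if dist_map[i][j] > (j - i) * bond_length + tolerance:
                return True
    return False


def remove_bad_frames(frames, **kwargs):
    """Keep only the frames whose distance maps pass the check."""
    return [f for f in frames if not has_impossible_distance(f, **kwargs)]


# Two toy 3-residue frames (distances in angstroms).
ok_frame = [[0.0, 3.8, 6.0],
            [3.8, 0.0, 3.8],
            [6.0, 3.8, 0.0]]
bad_frame = [[0.0, 3.8, 9.0],   # residues 0 and 2: 9.0 exceeds 2 * 3.8 = 7.6
             [3.8, 0.0, 3.8],
             [9.0, 3.8, 0.0]]

kept = remove_bad_frames([ok_frame, bad_frame])
print(len(kept))  # 1: only ok_frame survives
```

For real ensembles a vectorised array implementation would be preferable; the double loop is kept here for readability.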

Search tooling

STARLING bundles a FAISS-based similarity search stack to explore large sequence collections.

  • starling-pretokenize - preprocess a FASTA into shard-wise token files that power fast index construction.

  • starling-search build - create a FAISS index from a pretokenized corpus.

  • starling-search query - embed query sequences with the STARLING encoder and retrieve the nearest neighbours with optional reranking.

See Similarity Search for a complete walkthrough of building and querying indexes as well as the Python API.

Tips

  • Use --info or --version with starling to inspect default model paths without launching a generation run.

  • All CLI commands respect the CUDA_VISIBLE_DEVICES environment variable; set it before launching a command on multi-GPU systems.

  • Each converter lazily reconstructs structures the first time they are needed and caches the trajectory in the .starling archive for subsequent use.
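The CUDA_VISIBLE_DEVICES tip can also be applied per process when runs are launched from Python. This is a small sketch under stated assumptions: env_for_gpu is a hypothetical helper, and the command uses only the --device flag documented above. Note that once a single GPU is exposed, CUDA renumbers it, so the child process addresses it as cuda:0 regardless of its physical index.

```python
import os


def env_for_gpu(gpu_index, base_env=None):
    """Copy the environment and expose only one physical GPU to a child process."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


env = env_for_gpu(1)  # expose physical GPU 1 only
# The single visible GPU is renumbered, so the child sees it as cuda:0.
cmd = ["starling", "my_sequences.fasta", "--device", "cuda:0"]
# To execute for real (requires STARLING and a CUDA-capable machine):
# import subprocess; subprocess.run(cmd, env=env, check=True)
```

Copying the environment rather than modifying os.environ keeps the setting local to the launched process.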