starling.inference.generation.sequence_encoder_backend

sequence_encoder_backend(sequence_dict, device, batch_size, ionic_strength, aggregate=True, output_directory=None, model_manager=<starling.inference.model_loading.ModelManager object>, encoder_path=None, ddpm_path=None, pretokenized: bool = False, bucket: bool = False, bucket_size: int = 32, free_cuda_cache: bool = False, return_on_cpu: bool = True)

Generate embeddings for sequences and optionally save them to disk.

Parameters:

sequence_dict (dict) – Dictionary of sequence names to sequences
device (str) – Device to use for computation
batch_size (int) – Batch size for processing
ionic_strength (float) – Ionic strength (in mM) used to condition the model
output_directory (str, optional) – If provided, embeddings are saved to this directory, one file per sequence, named <name>.pt after the sequence name
model_manager (ModelManager) – Model manager instance
encoder_path (str, optional) – Custom encoder path
ddpm_path (str, optional) – Custom diffusion model path
pretokenized (bool, default False) – If True, the values of sequence_dict are assumed to already be iterable collections of integer token ids (lists, tuples, or torch tensors), and tokenization is skipped.
bucket (bool, default False) – If True, sequences are grouped into coarse length buckets (multiples of bucket_size) to reduce padding waste. Beneficial when the length distribution is very broad.
bucket_size (int, default 32) – Length resolution for bucketing when bucket=True. Sequences whose lengths L fall into the same bucket (L // bucket_size) are batched together.
free_cuda_cache (bool, default False) – If True and running on CUDA, calls torch.cuda.empty_cache() after each batch.
return_on_cpu (bool, default True) – If True, embeddings are transferred to CPU before being returned or saved. If False, embeddings remain on the original device (e.g., GPU), which can be useful when performing downstream tensor operations on the same device.
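The effect of bucket and bucket_size on padding can be illustrated with a minimal sketch. This is not the library's implementation; the helper names bucket_batches and padded_positions, and the example lengths, are hypothetical and only reproduce the L // bucket_size grouping rule described above.

```python
from collections import defaultdict


def bucket_batches(lengths, batch_size, bucket_size=32):
    """Group names by L // bucket_size, then split each bucket into batches.

    `lengths` maps a sequence name to its length. Hypothetical helper,
    written only to illustrate the bucketing rule from the docs.
    """
    buckets = defaultdict(list)
    for name, length in lengths.items():
        buckets[length // bucket_size].append(name)

    batches = []
    for key in sorted(buckets):
        names = buckets[key]
        for start in range(0, len(names), batch_size):
            batches.append(names[start:start + batch_size])
    return batches


def padded_positions(batches, lengths):
    """Total positions processed if each batch is padded to its longest member."""
    return sum(len(batch) * max(lengths[n] for n in batch) for batch in batches)


# Two short and two long sequences (made-up lengths).
lengths = {"a": 10, "b": 300, "c": 12, "d": 310}

# Batching in input order mixes short and long sequences in each batch.
naive = [["a", "b"], ["c", "d"]]
bucketed = bucket_batches(lengths, batch_size=2)

naive_cost = padded_positions(naive, lengths)
bucketed_cost = padded_positions(bucketed, lengths)
```

In this toy case, bucketing places the two short sequences in one batch and the two long ones in another, so far fewer padded positions are processed. With a narrow length distribution most sequences share a bucket and the grouping changes little.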

Returns:

If output_directory is None, returns a dictionary mapping each sequence name to its embedding tensor of shape (L_i, D). Otherwise returns None, and the embeddings are written to disk as <name>.pt.

Return type:

dict or None
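A call using the documented signature might look like the sketch below. The import path follows the module name at the top of this page; the wrapper name embed_in_memory, the sequence names, the sequences, and the values chosen for batch_size and ionic_strength are illustrative assumptions, and the wrapper is defined here without being executed.

```python
def embed_in_memory(sequence_dict, device="cpu"):
    """Hypothetical wrapper: return embeddings as a dict instead of saving them.

    The import is done inside the function so that defining this sketch
    does not require the starling package to be installed.
    """
    from starling.inference.generation import sequence_encoder_backend

    return sequence_encoder_backend(
        sequence_dict,
        device=device,
        batch_size=2,            # assumed value, chosen for illustration
        ionic_strength=150,      # in mM; assumed value
        output_directory=None,   # None -> a dict is returned, nothing is written
        bucket=True,             # group by length to reduce padding
        bucket_size=32,
        return_on_cpu=True,      # embeddings moved to CPU before returning
    )


# Made-up example input: sequence names mapped to amino-acid sequences.
example_sequences = {
    "seq_a": "MSGSGSEEEKK",
    "seq_b": "MKKDDSPQQAAGG",
}

# Usage (requires starling and its model weights):
#   embeddings = embed_in_memory(example_sequences)
#   for name, tensor in embeddings.items():
#       print(name, tensor.shape)  # expected shape per the docs: (L_i, D)
```

Passing a directory as output_directory instead would, per the description above, make the call return None and write one <name>.pt file per sequence.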