Working with Ensembles
The Ensemble class is a core component of STARLING that represents multiple conformations of a protein chain. This guide covers how to load, analyze, and manipulate conformational ensembles.
See also
- Guided Sampling with Constraints, for generating ensembles that satisfy experimental restraints.
- Ensemble Generation, for details on controlling the diffusion sampler.
Loading Ensembles
Ensembles can be loaded from STARLING format files:
from starling.structure.ensemble import load_ensemble
# Load an ensemble from a file
ensemble = load_ensemble("example_ensemble.starling")
# Optionally ignore 3D structures for faster loading
ensemble = load_ensemble("example_ensemble.starling", ignore_structures=True)
# Get basic ensemble information
print(ensemble) # Shows sequence length, ensemble size, and structure status
print(len(ensemble)) # Number of conformations
Structural Analysis
Calculating Ensemble Properties
The Ensemble class provides methods to compute common biophysical properties. Each returns one value per conformation by default, or the ensemble average when return_mean=True:
# Get radius of gyration for all conformations
rg_values = ensemble.radius_of_gyration()
# Get mean radius of gyration
mean_rg = ensemble.radius_of_gyration(return_mean=True)
# Mean end-to-end distance (omit return_mean for one value per conformation)
end_to_end = ensemble.end_to_end_distance(return_mean=True)
# Mean hydrodynamic radius from two estimators: "nygaard" (empirical conversion from Rg) and "kr" (Kirkwood-Riseman)
rh_nygaard = ensemble.hydrodynamic_radius(mode="nygaard", return_mean=True)
rh_kr = ensemble.hydrodynamic_radius(mode="kr", return_mean=True)
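To show what these quantities measure, the sketch below computes them directly from a NumPy coordinate array. It is not STARLING's implementation: the (n_conformations, n_residues, 3) layout, the one-bead-per-residue model, and the helper names are assumptions for illustration. The hydrodynamic radius uses the classic Kirkwood double sum, which may differ in detail from the "kr" mode; the empirical "nygaard" conversion is not reproduced.

```python
import numpy as np

# Assumed layout: coords has shape (n_conformations, n_residues, 3), in Å,
# with one bead per residue. Helper names are hypothetical, not STARLING API.

def radius_of_gyration(coords):
    """Rg per conformation: RMS distance of the beads from their centroid."""
    centered = coords - coords.mean(axis=1, keepdims=True)
    return np.sqrt((centered ** 2).sum(axis=-1).mean(axis=1))

def end_to_end_distance(coords):
    """Distance between the first and last bead of each conformation."""
    return np.linalg.norm(coords[:, -1] - coords[:, 0], axis=-1)

def kirkwood_riseman_rh(coords):
    """Kirkwood double sum per conformation: 1/Rh = (1/N^2) * sum_{i != j} 1/r_ij."""
    n = coords.shape[1]
    r = np.linalg.norm(coords[:, :, None, :] - coords[:, None, :, :], axis=-1)
    off_diagonal = ~np.eye(n, dtype=bool)
    inv_sum = (1.0 / r[:, off_diagonal]).sum(axis=1)
    return n ** 2 / inv_sum

# Toy ensemble: a single straight three-bead chain with 3.8 Å spacing
coords = np.array([[[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0]]])
rg = radius_of_gyration(coords)
ree = end_to_end_distance(coords)
rh = kirkwood_riseman_rh(coords)
print(rg.mean(), ree.mean(), rh.mean())
```

Kirkwood's formula is often applied to the ensemble average of 1/r_ij before inverting, so averaging per-conformation Rh values, as done here, gives a similar but not identical ensemble estimate.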
Distance and Contact Analysis
Access distance maps and contact information:
# Get distance between specific residues (zero-indexed)
distances = ensemble.rij(0, 10) # Per-conformation distance between the 1st and 11th residues
mean_distance = ensemble.rij(0, 10, return_mean=True)
# Get distance maps for all conformations
distance_maps = ensemble.distance_maps()
# Get mean distance map
mean_distance_map = ensemble.distance_maps(return_mean=True)
# Calculate contact maps (residue pairs within 11 Å of each other)
contact_maps = ensemble.contact_map()
# Get mean contact frequency
mean_contacts = ensemble.contact_map(return_mean=True)
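The distance and contact quantities can also be sketched in plain NumPy. This is an illustration under assumptions, not the library's code: coordinates are taken as a hypothetical (n_conformations, n_residues, 3) array in Å, a contact is counted when a pair's distance is at or below the 11 Å cutoff, and the diagonal is not excluded. Whether STARLING uses a strict or inclusive cutoff, or masks the diagonal, is not stated here.

```python
import numpy as np

# Hypothetical helpers, not STARLING API. coords: (n_conformations, n_residues, 3) in Å.

def distance_maps(coords):
    """Pairwise residue distances for every conformation: (n_conf, n_res, n_res)."""
    return np.linalg.norm(coords[:, :, None, :] - coords[:, None, :, :], axis=-1)

def contact_maps(dmaps, cutoff=11.0):
    """Boolean contact maps: True where a residue pair is within `cutoff` Å."""
    return dmaps <= cutoff

# Two toy conformations of a three-bead chain: extended and folded back
coords = np.array([
    [[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [20.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [5.0, 0.0, 0.0]],
])
dmaps = distance_maps(coords)
r02 = dmaps[:, 0, 2]                              # analogue of rij(0, 2): one value per conformation
mean_dmap = dmaps.mean(axis=0)                    # mean distance map
contact_freq = contact_maps(dmaps).mean(axis=0)   # fraction of conformations in contact
print(r02, mean_dmap[0, 2], contact_freq[0, 2])
```

Averaging the boolean maps over conformations is what turns per-structure contacts into a contact frequency between 0 and 1.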
Working with 3D Structures
Accessing and Generating Coordinates
STARLING can reconstruct 3D structures from an ensemble's distance maps using multidimensional scaling (MDS):
# Check if the ensemble already has 3D structures
has_structures = ensemble.has_structures
# Access the trajectory (generates 3D structures if needed)
trajectory = ensemble.trajectory
# Explicitly build structures with custom parameters
ensemble.build_ensemble_trajectory(
num_cpus_mds=4, # Number of CPUs for structure generation
num_mds_init=4, # Number of MDS initializations
device="cuda", # Use GPU acceleration if available
force_recompute=True # Rebuild structures even if they exist
)
# Save trajectory to files
ensemble.save_trajectory("my_structures", pdb_trajectory=True) # Save as multi-model PDB
ensemble.save_trajectory("my_structures") # Default: PDB topology plus XTC trajectory
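The parameter names num_cpus_mds and num_mds_init suggest an iterative MDS run from several starting points, in the style of scikit-learn's MDS(n_init=..., n_jobs=...). The sketch below swaps in the simpler classical (Torgerson) MDS to show the core idea of recovering coordinates from a distance map; it is not STARLING's structure builder, and the function names are hypothetical. For a noise-free Euclidean distance map, classical MDS reproduces the distances exactly.

```python
import numpy as np

def classical_mds(dmap, dim=3):
    """Embed points in `dim` dimensions from a distance matrix (classical/Torgerson MDS)."""
    n = dmap.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dmap ** 2) @ centering  # double-centered Gram matrix
    evals, evecs = np.linalg.eigh(gram)                 # eigenvalues in ascending order
    top = np.argsort(evals)[::-1][:dim]                 # indices of the `dim` largest
    scale = np.sqrt(np.clip(evals[top], 0.0, None))     # clip tiny negatives from round-off
    return evecs[:, top] * scale

def pairwise(x):
    """All pairwise Euclidean distances between the rows of x."""
    return np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)

# Round trip: random 3D "structure" -> distance map -> reconstructed coordinates
rng = np.random.default_rng(0)
true_xyz = rng.normal(scale=10.0, size=(30, 3))
dmap = pairwise(true_xyz)
rebuilt = classical_mds(dmap)
print(np.abs(pairwise(rebuilt) - dmap).max())
```

Because distances are unchanged by rotation, translation, and reflection, the rebuilt coordinates match the original only up to those operations; in particular, a distance map alone cannot distinguish a structure from its mirror image. With generated (noisy) distance maps, iterative stress minimization works directly on the mismatch with the given distances, which is where multiple initializations become useful.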
Ensemble Reweighting with BME
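Bayesian/Maximum Entropy (BME) reweighting adjusts the weight of each conformation so that ensemble-averaged observables match experimental data, while keeping the weights as close as possible to the original ones: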
from starling.structure.bme import ExperimentalObservable
import numpy as np
# Define experimental observables
obs1 = ExperimentalObservable(value=25.0, uncertainty=2.0,
constraint="lower", name="Rg")
obs2 = ExperimentalObservable(value=30.0, uncertainty=3.0,
constraint="upper", name="End-to-end distance")
# Calculate ensemble values for these observables
rg_values = ensemble.radius_of_gyration()
ete_values = ensemble.end_to_end_distance()
calculated = np.column_stack([rg_values, ete_values]) # shape (n_conformations, 2); columns follow the order of observables
# Perform BME reweighting
result = ensemble.reweight_bme(
observables=[obs1, obs2],
calculated_values=calculated,
theta=0.5 # Confidence in the unweighted ensemble: larger keeps weights closer to the original, smaller fits the data more tightly
)
# Use BME-reweighted values in calculations
weighted_rg = ensemble.radius_of_gyration(use_bme_weights=True, return_mean=True)
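To make the role of theta concrete, here is a minimal NumPy sketch of BME reweighting in its standard maximum-entropy dual form: weights are w_j ∝ exp(-Σ_i λ_i F_ij) relative to a uniform prior, and the multipliers λ minimize Γ(λ) = log Z(λ) + Σ_i λ_i F_i,exp + (θ/2) Σ_i λ_i² σ_i². This is not STARLING's reweight_bme: the function name is hypothetical, only equality restraints are handled (the "lower"/"upper" constraint types are ignored), and the optimizer is a simple damped Newton method.

```python
import numpy as np

def bme_reweight(calc, exp_vals, sigmas, theta, max_iter=100, tol=1e-10):
    """Hypothetical BME sketch (equality restraints only, uniform prior weights).

    calc:     (n_conformations, n_observables) per-conformation observable values
    exp_vals: (n_observables,) experimental averages
    sigmas:   (n_observables,) experimental uncertainties
    theta:    trust in the prior; larger -> weights stay closer to uniform
    """
    calc = np.asarray(calc, dtype=float)
    exp_vals = np.asarray(exp_vals, dtype=float)
    var = np.asarray(sigmas, dtype=float) ** 2
    n_conf, n_obs = calc.shape

    def log_terms(lam):
        # Log of unnormalized weights: log(1/n) - lambda . F_j
        return -np.log(n_conf) - calc @ lam

    def weights(lam):
        a = log_terms(lam)
        w = np.exp(a - a.max())  # shift for numerical stability
        return w / w.sum()

    def gamma(lam):
        # Dual objective: log Z + lambda . F_exp + (theta / 2) * sum(lambda^2 * sigma^2)
        a = log_terms(lam)
        log_z = a.max() + np.log(np.exp(a - a.max()).sum())
        return log_z + lam @ exp_vals + 0.5 * theta * np.sum(lam ** 2 * var)

    lam = np.zeros(n_obs)
    for _ in range(max_iter):
        w = weights(lam)
        avg = w @ calc
        grad = exp_vals - avg + theta * lam * var
        if np.linalg.norm(grad) < tol:
            break
        centered = calc - avg
        # Hessian: weighted covariance of the observables plus the theta * sigma^2 term
        hess = (centered * w[:, None]).T @ centered + np.diag(theta * var)
        step = np.linalg.solve(hess, grad)
        t, g0 = 1.0, gamma(lam)
        while gamma(lam - t * step) > g0 and t > 1e-8:  # backtracking line search
            t *= 0.5
        lam = lam - t * step
    return weights(lam)

# Toy ensemble: four conformations with Rg = 10, 20, 30, 40 Å (uniform mean 25 Å)
rg = np.array([[10.0], [20.0], [30.0], [40.0]])
w_tight = bme_reweight(rg, exp_vals=[30.0], sigmas=[1.0], theta=0.01)
w_loose = bme_reweight(rg, exp_vals=[30.0], sigmas=[1.0], theta=1e6)
print(w_tight @ rg[:, 0], w_loose @ rg[:, 0])
```

The two calls show the trade-off theta controls: with a small theta the reweighted mean Rg moves almost all the way to the 30 Å target, while a very large theta leaves the weights essentially uniform and the mean near 25 Å.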
Saving Ensembles
Save ensembles in STARLING format:
# Basic save
ensemble.save("my_ensemble")
# Save with compression and reduced precision for smaller file size
ensemble.save("my_ensemble_compressed", compress=True, reduce_precision=True)
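The size/precision trade-off behind compress and reduce_precision can be illustrated generically. The sketch below does not use STARLING's file format; it assumes, for illustration only, that reduced precision means storing distances as 16-bit floats, and it measures the saved size with NumPy's savez and savez_compressed alongside the worst-case rounding error. The helper name and the random stand-in data are hypothetical.

```python
import io
import numpy as np

def saved_bytes(arr, compress):
    """Size in bytes of `arr` written to an in-memory .npz file."""
    buf = io.BytesIO()
    saver = np.savez_compressed if compress else np.savez
    saver(buf, dmaps=arr)
    return len(buf.getvalue())

# Stand-in data: distance maps for 100 random 50-residue "conformations" (Å)
rng = np.random.default_rng(0)
coords = rng.normal(scale=10.0, size=(100, 50, 3))
dmaps = np.linalg.norm(coords[:, :, None, :] - coords[:, None, :, :], axis=-1)

full_size = saved_bytes(dmaps.astype(np.float32), compress=False)
small_size = saved_bytes(dmaps.astype(np.float16), compress=True)
max_err = np.abs(dmaps.astype(np.float16).astype(np.float64) - dmaps).max()
print(full_size, small_size, max_err)
```

Rounding to float16 loses at most half a unit in the last place, i.e. at most 2^-11 of the stored value, which for distances below 64 Å means errors no larger than about 0.016 Å.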