starling.frontend.ensemble_generation.handle_input
- handle_input(user_input, invalid_sequence_action='convert', output_name=None, seq_index_start=1)
Dynamically handle the input from the user. This returns a dictionary of sequences keyed by name. The names come from the user’s input file or the user’s input dictionary; otherwise the sequences are numbered in the order they were passed in, with seq_index_start as the starting index.
- Parameters:
user_input (str, list, or dict) –
This can be one of a few different options:
str: A .fasta file
str: A seq.in file formatted as a .tsv with name\tseq (name and sequence separated by a tab)
str: A .tsv file formatted as name\tseq. Same as seq.in except a different file extension. Borna used a seq.in in his tutorial, so I’m rolling with it.
str: A sequence as a string
list: A list of sequences
dict: A dict of sequences (name: seq)
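As an illustration of the name\tseq layout used by the seq.in and .tsv options, here is a minimal sketch of reading such a file into a name: seq dictionary. The helper name read_name_seq_tsv and the example sequences are hypothetical, and this is not STARLING’s own parser; it only shows the file format described above.

```python
import csv
import io


def read_name_seq_tsv(handle):
    """Parse lines of 'name<TAB>seq' into a {name: seq} dict.

    Hypothetical helper for illustration; not part of STARLING.
    """
    sequences = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:  # skip blank lines
            continue
        if len(row) != 2:
            raise ValueError(f"expected 2 tab-separated fields, got {len(row)}: {row!r}")
        name, seq = row[0].strip(), row[1].strip()
        sequences[name] = seq
    return sequences


# An in-memory stand-in for a seq.in / .tsv file.
example = io.StringIO("protein_a\tMKVLAAGIVG\nprotein_b\tGSGSGSPEEK\n")
parsed = read_name_seq_tsv(example)
```

With a real file, the same helper could be given an open file handle instead of the StringIO object.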
invalid_sequence_action (str) –
This can be one of 3 options:
fail - invalid sequences cause parsing to fail and throw an exception
remove - invalid sequences are removed
convert - invalid sequences are converted
Default is ‘convert’. Only these 3 options are allowed because STARLING cannot handle non-canonical residues, so we don’t want to use the protfasta.read_fasta() options that allow this to happen.
output_name (str) – If provided and if a single amino acid sequence is passed in, this will be the key in the output dictionary. If None, the key will be ‘sequence_<index>’. If a dictionary or list or path to a FASTA file is passed, this is ignored. Default is None.
seq_index_start (int) – If we need to number sequences in the output dictionary, this is the starting index. This is only needed if a sequence as a string is passed in or if a list of sequences is passed in.
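The way output_name and seq_index_start determine the output keys can be sketched as follows. This is a simplified stand-in, not the library’s implementation: the function name name_sequences is hypothetical, file paths are not handled, and it assumes that list entries use the same ‘sequence_<index>’ pattern as a single unnamed sequence.

```python
def name_sequences(user_input, output_name=None, seq_index_start=1):
    """Build a {name: seq} dict following the key rules described above.

    Hypothetical sketch; file-path inputs are deliberately not handled.
    """
    if isinstance(user_input, dict):
        # Names are already provided; output_name is ignored.
        return dict(user_input)
    if isinstance(user_input, str):
        # Single sequence: use output_name if given, else a numbered key.
        key = output_name if output_name is not None else f"sequence_{seq_index_start}"
        return {key: user_input}
    if isinstance(user_input, list):
        # Assumption: list entries are numbered in order from seq_index_start.
        return {
            f"sequence_{i}": seq
            for i, seq in enumerate(user_input, start=seq_index_start)
        }
    raise TypeError(f"unsupported input type: {type(user_input).__name__}")


single = name_sequences("MKVLAAGIVG", output_name="my_protein")
numbered = name_sequences(["MKVLAAGIVG", "GSGSGSPEEK"], seq_index_start=5)
```

Here single is keyed by my_protein, while numbered uses keys starting at sequence_5.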
- Returns:
A dictionary of sequences (name: seq)
- Return type: