starling.frontend.ensemble_generation.handle_input

handle_input(user_input, invalid_sequence_action='convert', output_name=None, seq_index_start=1)

Dynamically handle input from the user and return a dictionary of sequences. If the input is a file or a dictionary, the names it provides are used as keys. If it is a single sequence string or a list of sequences, the sequences are numbered in the order they were passed in, starting from seq_index_start.
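The dispatch on input type described above can be sketched as follows. This is a minimal illustration, not STARLING's implementation: `classify_input` is a hypothetical helper, and deciding between a file path and a raw sequence purely by file extension (`.fasta`/`.fa`, `.in`, `.tsv`) is an assumption. The real function may, for example, check whether the path exists instead.

```python
from pathlib import Path

# Hypothetical extension sets; the real function may detect files differently.
FASTA_SUFFIXES = {".fasta", ".fa"}
TSV_SUFFIXES = {".in", ".tsv"}


def classify_input(user_input):
    """Return a label for which input form user_input appears to be (sketch)."""
    if isinstance(user_input, dict):
        return "dict"  # names already supplied as keys
    if isinstance(user_input, list):
        return "list"  # unnamed sequences, to be numbered
    if isinstance(user_input, str):
        suffix = Path(user_input).suffix.lower()
        if suffix in FASTA_SUFFIXES:
            return "fasta"
        if suffix in TSV_SUFFIXES:
            return "tsv"  # seq.in or .tsv with name<TAB>seq lines
        return "sequence"  # assume a bare amino acid sequence
    raise TypeError(f"unsupported input type: {type(user_input).__name__}")
```

For example, `classify_input("seq.in")` returns `"tsv"` and `classify_input("MKVLAAGG")` returns `"sequence"`.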

Parameters:

user_input (str, list, or dict) –
This can be one of the following:
- str: path to a .fasta file
- str: path to a seq.in file, formatted as a TSV with one name\tseq entry per line
- str: path to a .tsv file formatted as name\tseq. This is identical to seq.in except for the file extension (the seq.in name follows Borna’s tutorial).
- str: A sequence as a string
- list: A list of sequences
- dict: A dict of sequences (name: seq)
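For the seq.in and .tsv forms, one way to read a name\tseq file into a name: seq dictionary is sketched below. `read_tsv_sequences` is a hypothetical helper rather than STARLING's parser; it assumes exactly one tab per non-blank line, skips blank lines, and lets a repeated name overwrite an earlier one.

```python
def read_tsv_sequences(path):
    """Parse a seq.in-style file where each non-blank line is name<TAB>seq (sketch)."""
    sequences = {}
    with open(path) as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            fields = line.split("\t")
            if len(fields) != 2:
                raise ValueError(
                    f"line {line_number}: expected name<TAB>seq, got {line!r}"
                )
            name, seq = fields
            # Assumption: a repeated name silently replaces the earlier entry.
            sequences[name] = seq
    return sequences
```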
invalid_sequence_action (str) –
This can be one of 3 options:
- fail - invalid sequences cause parsing to fail and raise an exception
- remove - invalid sequences are removed
- convert - non-canonical residues in invalid sequences are converted to canonical amino acids
Default is ‘convert’. Only these 3 options are allowed because STARLING cannot handle non-canonical residues, so the protfasta.read_fasta() options that would let such residues through are deliberately excluded.
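The three policies can be illustrated with the sketch below. It is not protfasta's or STARLING's code: `apply_invalid_action` is hypothetical, `CONVERSIONS` is a small illustrative mapping rather than protfasta's actual conversion table, and raising when a sequence is still invalid after conversion is an assumption about how ‘convert’ treats leftovers.

```python
# The 20 canonical amino acids (uppercase one-letter codes).
CANONICAL = set("ACDEFGHIKLMNPQRSTVWY")
# Illustrative subset only, NOT protfasta's real conversion table.
CONVERSIONS = {"U": "C", "B": "N", "Z": "Q"}
VALID_ACTIONS = ("fail", "remove", "convert")


def apply_invalid_action(sequences, action="convert"):
    """Apply a fail/remove/convert policy to a name: seq dict (sketch)."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"invalid_sequence_action must be one of {VALID_ACTIONS}")
    cleaned = {}
    for name, seq in sequences.items():
        if set(seq) <= CANONICAL:
            cleaned[name] = seq  # already valid, keep unchanged
            continue
        if action == "fail":
            raise ValueError(f"sequence {name!r} contains non-canonical residues")
        if action == "remove":
            continue  # drop the whole sequence
        # action == "convert": map known non-canonical residues, then re-check.
        converted = "".join(CONVERSIONS.get(res, res) for res in seq)
        if not set(converted) <= CANONICAL:
            # Assumption: residues with no conversion are treated as an error.
            raise ValueError(f"sequence {name!r} is still invalid after conversion")
        cleaned[name] = converted
    return cleaned
```

Here `apply_invalid_action({"ok": "MKV", "sec": "MUK"}, "remove")` keeps only `"ok"`, while `"convert"` rewrites `"MUK"` to `"MCK"`.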
output_name (str) – If provided and a single amino acid sequence string is passed in, this is used as the key in the output dictionary. If None, the key will be ‘sequence_<index>’, where <index> is seq_index_start. It is ignored if a dictionary, a list, or a path to an input file is passed. Default is None.
seq_index_start (int) – Starting index used when sequences must be numbered in the output dictionary, that is, when a single sequence string or a list of sequences is passed in. Default is 1.
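The naming rules for output_name and seq_index_start can be sketched as follows. `name_sequences` is a hypothetical helper, and giving list entries the same ‘sequence_<index>’ key format as a single string is an assumption; the parameter descriptions only state that format for the single-sequence case.

```python
def name_sequences(user_input, output_name=None, seq_index_start=1):
    """Assign dictionary keys to sequences (hypothetical sketch)."""
    if isinstance(user_input, dict):
        # Names are already supplied, so output_name is ignored.
        return dict(user_input)
    if isinstance(user_input, str):
        # A single sequence: use output_name if given, else a numbered key.
        key = output_name if output_name is not None else f"sequence_{seq_index_start}"
        return {key: user_input}
    if isinstance(user_input, list):
        # Assumption: list entries use the same sequence_<index> key format.
        # output_name is ignored for lists.
        return {
            f"sequence_{index}": seq
            for index, seq in enumerate(user_input, start=seq_index_start)
        }
    raise TypeError(f"unsupported input type: {type(user_input).__name__}")
```

With `seq_index_start=0`, a list `["MK", "AG"]` would become `{"sequence_0": "MK", "sequence_1": "AG"}` under this sketch.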

Returns:

A dictionary of sequences (name: seq)

Return type:

dict