starling.frontend.ensemble_generation.handle_input

handle_input(user_input, invalid_sequence_action='convert', output_name=None, seq_index_start=1)

Dynamically handle the input from the user. Returns a dictionary of sequences keyed by name. The names come from the user’s input file or the user’s input dictionary of sequences; otherwise the sequences are numbered in the order they were passed in, with seq_index_start as the starting index.

Parameters:
  • user_input (str, list, or dict) –

    This can be one of a few different options:

    • str: A .fasta file

    • str: A seq.in file formatted as a .tsv with name\tseq

    • str: A .tsv file formatted as name\tseq. Same as seq.in except a different file extension. Borna used a seq.in in his tutorial, so I’m rolling with it.

    • str: A sequence as a string

    • list: A list of sequences

    • dict: A dict of sequences (name: seq)

  • invalid_sequence_action (str) –

    This can be one of 3 options:

    • fail - invalid sequences cause parsing to fail and raise an exception

    • remove - invalid sequences are removed

    • convert - invalid sequences are converted so that they contain only canonical residues

    Default is ‘convert’. Only these 3 options are allowed because STARLING cannot handle non-canonical residues, so we don’t want to use the protfasta.read_fasta() options that allow this to happen.

  • output_name (str) – If provided and if a single amino acid sequence is passed in, this will be the key in the output dictionary. If None, the key will be ‘sequence_<index>’. If a dictionary or list or path to a FASTA file is passed, this is ignored. Default is None.

  • seq_index_start (int) – If we need to number sequences in the output dictionary, this is the starting index. This is only needed if a single sequence is passed in as a string or if a list of sequences is passed in. Default is 1.

Returns:

A dictionary of sequences (name: seq)

Return type:

dict
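
The key-naming rules described above can be illustrated with a minimal sketch. This is not the STARLING implementation: sketch_handle_input is a hypothetical stand-in that covers only in-memory inputs (a single sequence string, a list, or a dict) and assumes any string is a sequence, not a file path. File parsing and invalid_sequence_action handling are deliberately left out, and the example sequences are made up.

```python
def sketch_handle_input(user_input, output_name=None, seq_index_start=1):
    """Hypothetical stand-in that mirrors only the documented key-naming rules."""
    if isinstance(user_input, dict):
        # Names come from the caller's dictionary; output_name is ignored.
        return dict(user_input)
    if isinstance(user_input, str):
        # Assumption: the string is a single sequence, not a path to a file.
        if output_name is not None:
            return {output_name: user_input}
        return {f"sequence_{seq_index_start}": user_input}
    if isinstance(user_input, list):
        # Sequences are numbered in the order given; output_name is ignored.
        return {
            f"sequence_{index}": seq
            for index, seq in enumerate(user_input, start=seq_index_start)
        }
    raise TypeError("user_input must be a str, list, or dict")


# A single sequence with an explicit name.
single = sketch_handle_input("MKVLAAGE", output_name="my_idr")

# A list of sequences numbered from a custom starting index.
numbered = sketch_handle_input(["MKVLAAGE", "GSGSGSGS"], seq_index_start=5)

# A dictionary passes through with its own names.
named = sketch_handle_input({"protA": "MKVLAAGE"}, output_name="ignored")
```

In the sketch, single has the key my_idr, numbered has the keys sequence_5 and sequence_6, and named keeps the key protA. The real function is imported from starling.frontend.ensemble_generation and additionally accepts file paths and applies invalid_sequence_action.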