SeqUtils
Library for sequence processing
Loads contigs from a list of FASTA files.
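A minimal sketch of FASTA loading in plain Python. The summary does not specify the return type or the parser, so this assumes every record is returned as an `(id, sequence)` pair, with the ID taken as the first word of the header line and the sequence upper-cased. The name `read_fasta_records` is hypothetical.

```python
from pathlib import Path
from typing import Iterable


def read_fasta_records(fasta_paths: Iterable[str]) -> list[tuple[str, str]]:
    """Hypothetical loader: collect (id, sequence) pairs from FASTA files."""
    records: list[tuple[str, str]] = []
    for path in fasta_paths:
        header = None
        chunks: list[str] = []
        for raw in Path(path).read_text().splitlines():
            line = raw.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # Flush the previous record before starting a new one.
                if header is not None:
                    records.append((header, "".join(chunks).upper()))
                fields = line[1:].split()
                header = fields[0] if fields else ""
                chunks = []
            else:
                chunks.append(line)  # sequence lines may be wrapped
        if header is not None:
            records.append((header, "".join(chunks).upper()))
    return records
```

Reading each file whole keeps the sketch short; very large assemblies would be better streamed line by line.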
Splits a sequence end to end into disjoint (non-overlapping) segments.
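A minimal sketch of end-to-end segmentation, assuming segments are fixed-length slices of at most `max_length` characters and that a short trailing piece may be discarded through a hypothetical `min_length` threshold. The function name is also hypothetical. Concatenating the kept segments reproduces the input, or a prefix of it if the tail was dropped.

```python
def segment_contiguous(sequence: str, max_length: int, min_length: int = 0) -> list[str]:
    """Hypothetical helper: consecutive, non-overlapping slices of `sequence`."""
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    # Step through the sequence in strides of max_length, so slices never overlap.
    segments = [sequence[i:i + max_length] for i in range(0, len(sequence), max_length)]
    # Only the final slice can be shorter than max_length; drop it if below min_length.
    return [s for s in segments if len(s) >= min_length]
```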
Randomly segments the input sequences.
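One way to segment randomly, sketched under stated assumptions: each segment's source sequence is drawn with probability proportional to its length, the segment length is drawn uniformly from `[min_length, max_length]`, and the start is drawn uniformly from the valid positions. Unlike the end-to-end method, these segments may overlap. The function name and all parameters are hypothetical; a seeded `random.Random` makes the output reproducible.

```python
import random
from typing import Optional, Sequence


def segment_random(sequences: Sequence[str], n_segments: int, min_length: int,
                   max_length: int, seed: Optional[int] = None) -> list[str]:
    """Hypothetical random segmenter; returned segments may overlap."""
    if not 0 < min_length <= max_length:
        raise ValueError("require 0 < min_length <= max_length")
    rng = random.Random(seed)
    # Only sequences long enough to hold a minimum-length segment are eligible.
    eligible = [s for s in sequences if len(s) >= min_length]
    if not eligible:
        return []
    weights = [len(s) for s in eligible]  # longer sequences are sampled more often
    segments = []
    for _ in range(n_segments):
        seq = rng.choices(eligible, weights=weights)[0]
        length = rng.randint(min_length, min(max_length, len(seq)))
        start = rng.randint(0, len(seq) - length)
        segments.append(seq[start:start + length])
    return segments
```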
Segments sequences according to the provided parameters.
Tokenizes a single segment using Local Context Aware (LCA) tokenization.
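The summary does not define LCA tokenization. The sketch below assumes it means splitting a segment into overlapping k-mers taken every `shift` bases, so that neighboring tokens share `k - shift` bases of context. Both that reading and the name `kmer_tokenize` are assumptions, not the library's confirmed behavior.

```python
def kmer_tokenize(segment: str, k: int, shift: int = 1) -> list[str]:
    """Hypothetical overlapping k-mer split with a fixed shift (stride)."""
    if k <= 0 or shift <= 0:
        raise ValueError("k and shift must be positive")
    # Windows start at 0, shift, 2*shift, ...; trailing bases that cannot
    # fill a whole k-mer are ignored.
    return [segment[i:i + k] for i in range(0, len(segment) - k + 1, shift)]
```

With `shift = 1` every base starts a token; with `shift = k` the tokens no longer overlap.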
Tokenizes or vectorizes a list of k-merized segments into a list of token vectors.
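Converting k-merized segments to token vectors can be sketched as a vocabulary lookup, assuming a `dict` that maps each k-mer to an integer ID and a fallback `unk_id` for k-mers missing from the vocabulary (for example, ones containing `N`). The function name and both parameters are hypothetical.

```python
from typing import Sequence


def vectorize_kmerized(segments: Sequence[Sequence[str]], vocab: dict[str, int],
                       unk_id: int) -> list[list[int]]:
    """Hypothetical: map every k-mer of every segment to its vocabulary ID."""
    # Unknown k-mers fall back to unk_id instead of raising KeyError.
    return [[vocab.get(kmer, unk_id) for kmer in segment] for segment in segments]
```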
Tokenizes a batch of segments and associates them with their provided IDs.
Tokenizes segments with their associated IDs in parallel.
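Batching and parallel execution can be sketched with `concurrent.futures`, assuming each input is an `(id, segment)` pair and that tokenization is an overlapping k-mer split followed by a vocabulary lookup. The helper names and the `batch_size` and `max_workers` parameters are hypothetical. `ThreadPoolExecutor` keeps the sketch portable; because pure-Python tokenization is CPU-bound and holds the GIL, a real speedup would need `ProcessPoolExecutor` with a module-level worker function (and, where processes are spawned, an `if __name__ == "__main__":` guard).

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from typing import Sequence


def tokenize_batch_with_ids(batch: Sequence[tuple[str, str]], k: int, shift: int,
                            vocab: dict[str, int], unk_id: int) -> list[tuple[str, list[int]]]:
    """Tokenize each (segment_id, segment) pair, keeping the ID attached."""
    results = []
    for seg_id, segment in batch:
        kmers = [segment[i:i + k] for i in range(0, len(segment) - k + 1, shift)]
        results.append((seg_id, [vocab.get(kmer, unk_id) for kmer in kmers]))
    return results


def parallel_tokenize_with_ids(pairs: Sequence[tuple[str, str]], k: int, shift: int,
                               vocab: dict[str, int], unk_id: int,
                               batch_size: int = 1000,
                               max_workers: int = 4) -> list[tuple[str, list[int]]]:
    """Split `pairs` into batches and tokenize the batches concurrently."""
    batches = [pairs[i:i + batch_size] for i in range(0, len(pairs), batch_size)]
    worker = partial(tokenize_batch_with_ids, k=k, shift=shift, vocab=vocab, unk_id=unk_id)
    output: list[tuple[str, list[int]]] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so IDs stay aligned.
        for batch_result in pool.map(worker, batches):
            output.extend(batch_result)
    return output
```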
Creates a rectangular NumPy array from tokenized segment data, suitable as input to a language model (LM).
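A minimal sketch of building the rectangular array, assuming the tokenized data is a list of variable-length token-ID lists: rows are right-padded with a hypothetical `pad_id` (and optionally truncated to a hypothetical `max_length`) so every row has the same width. The function name is also hypothetical; the sketch relies only on NumPy.

```python
from typing import Optional, Sequence

import numpy as np


def to_rectangular_array(token_vectors: Sequence[Sequence[int]], pad_id: int = 0,
                         max_length: Optional[int] = None) -> np.ndarray:
    """Hypothetical: right-pad (and optionally truncate) rows to a common width."""
    width = max((len(v) for v in token_vectors), default=0)
    if max_length is not None:
        width = min(width, max_length)
    # Start from an array filled with the padding ID, then copy each row in.
    arr = np.full((len(token_vectors), width), pad_id, dtype=np.int64)
    for row, vec in enumerate(token_vectors):
        vec = list(vec)[:width]
        arr[row, :len(vec)] = vec
    return arr
```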
Formats a sequence as overlapping k-mers for pretty printing.
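Pretty printing with overlapping k-mers can be sketched by writing the sequence once and then each k-mer on its own line, indented to its start position so the overlaps line up under the sequence. The layout, the function name, and the `shift` parameter are assumptions for illustration.

```python
def format_overlapping_kmers(sequence: str, k: int, shift: int = 1) -> str:
    """Hypothetical: show the sequence, then each k-mer aligned beneath it."""
    lines = [sequence]
    for start in range(0, len(sequence) - k + 1, shift):
        # Indent by the k-mer's start offset so it sits under its bases.
        lines.append(" " * start + sequence[start:start + k])
    return "\n".join(lines)


print(format_overlapping_kmers("ACGTAC", 3))
```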
Generates all possible k-mers from a given alphabet.
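Enumerating every k-mer is the Cartesian product of the alphabet with itself `k` times, which `itertools.product` provides directly; the function name below is hypothetical. The result has `len(alphabet) ** k` entries (64 for `ACGT` with k = 3), in the order the characters appear in `alphabet`.

```python
from itertools import product


def all_kmers(alphabet: str, k: int) -> list[str]:
    """Hypothetical: every string of length k over `alphabet`."""
    # product(..., repeat=k) yields k-tuples; join each into a string.
    return ["".join(p) for p in product(alphabet, repeat=k)]
```

Because the count grows as `len(alphabet) ** k`, full enumeration suits small k; mapping each k-mer to its index is one way to build a fixed vocabulary for the token-vector step.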
Saves a NumPy array and an optional pandas DataFrame to an HDF5 file.