prokbert.sequtils.segment_sequences_random
- prokbert.sequtils.segment_sequences_random(sequences, params)
Randomly segment the input sequences.
This function takes a list of sequences or a DataFrame containing sequences. If a DataFrame is provided, it’s assumed to be preprocessed, where the “sequence” column stores the sequences to be segmented, and “sequence_id” serves as a valid primary key.
Because segments are sampled at random, the actual coverage may differ from the expected one. The function returns a list of dictionaries, one per segment, as described under Returns.
- Parameters
- Returns
A list of dictionaries, one per segment. Each dictionary contains the segment's sequence, its start position, its end position, and the associated sequence ID. Note that segment IDs are not generated in this function, so any segment ID has to be assigned separately.
- Return type
list of dict
- Notes
The actual number of segments may differ from the expected number due to the random sampling nature and the presence of sequences shorter than the segment size.
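The effect described in this note can be illustrated with a minimal, standalone sketch. It is not the prokbert implementation: the function name `random_segments_sketch`, the arguments `segment_length` and `coverage`, and the dictionary keys are all hypothetical and chosen only for illustration. It assumes fixed-length segments and an expected segment count derived from the requested coverage.

```python
import random


def random_segments_sketch(sequences, segment_length, coverage, seed=None):
    """Illustrative random segmentation; NOT the prokbert implementation.

    All argument names and dictionary keys here are assumptions.
    """
    rng = random.Random(seed)

    # Sequences shorter than the segment size cannot yield a segment,
    # so they are skipped. This is one reason the result can deviate
    # from what the caller expects.
    eligible = [(i, s) for i, s in enumerate(sequences) if len(s) >= segment_length]
    if not eligible:
        return []

    # Expected number of segments for the requested coverage,
    # computed over the eligible sequences only.
    total_length = sum(len(s) for _, s in eligible)
    n_segments = int(coverage * total_length / segment_length)

    # Longer sequences are sampled proportionally more often.
    weights = [len(s) for _, s in eligible]

    segments = []
    for _ in range(n_segments):
        seq_id, seq = rng.choices(eligible, weights=weights, k=1)[0]
        start = rng.randint(0, len(seq) - segment_length)  # inclusive bounds
        end = start + segment_length
        segments.append(
            {
                "sequence_id": seq_id,      # hypothetical key names
                "segment_start": start,
                "segment_end": end,
                "segment": seq[start:end],
            }
        )
    return segments


example = random_segments_sketch(["ACGT" * 10, "AC"], segment_length=10, coverage=2.0, seed=0)
```

In this sketch the two-base sequence never contributes a segment, and because start positions are drawn independently, segments can overlap, so the fraction of bases actually covered will generally differ from the requested coverage.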