prokbert.sequtils.segment_sequence_contiguous
- prokbert.sequtils.segment_sequence_contiguous(sequence: str, params: Dict[str, Any], sequence_id: Optional[Any] = nan) → List[Dict[str, Any]]
Splits a sequence into consecutive, non-overlapping segments that together run from the start of the sequence to its end.
Segments shorter than the minimum length given in params are discarded. The function returns a list of the remaining segments along with their positions in the original sequence.
- Parameters
sequence (str) – The input nucleotide sequence to be segmented.
params (Dict[str, Any]) – Dictionary containing the segmentation parameters. Must include ‘min_length’ and ‘max_length’ keys specifying the minimum and maximum lengths of the segments, respectively. Can contain other parameters.
sequence_id (Optional[Any]) – An identifier for the sequence, optional. Defaults to NaN.
- Returns
A list of dictionaries, each representing a segment. Each dictionary contains the segment’s sequence, start position, end position, and sequence ID.
- Return type
List[Dict[str, Any]]
- Example:
>>> params = {'min_length': 0, 'max_length': 100}
>>> segment_sequence_contiguous('ATCGATCGA', params)
[{'segment': 'ATCGATCGA', 'segment_start': 0, 'segment_end': 9, 'sequence_id': np.nan}]
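The example above uses a sequence shorter than max_length, so only one segment comes back. To illustrate the splitting and discarding behaviour described here, the following is a minimal standalone sketch, not the prokbert implementation. The function name contiguous_segments_sketch is hypothetical, the dictionary keys are copied from the example output, and it assumes that a segment whose length equals min_length is kept (the documentation only says that smaller segments are discarded).

```python
import math
from typing import Any, Dict, List


def contiguous_segments_sketch(
    sequence: str,
    params: Dict[str, Any],
    sequence_id: Any = math.nan,
) -> List[Dict[str, Any]]:
    """Hypothetical re-creation of the documented behaviour (not prokbert code)."""
    min_length = params["min_length"]
    max_length = params["max_length"]

    segments = []
    # Walk the sequence in steps of max_length so segments never overlap.
    for start in range(0, len(sequence), max_length):
        end = min(start + max_length, len(sequence))
        # Assumption: only segments strictly shorter than min_length are dropped.
        if end - start < min_length:
            continue
        segments.append(
            {
                "segment": sequence[start:end],
                "segment_start": start,
                "segment_end": end,
                "sequence_id": sequence_id,
            }
        )
    return segments


segments = contiguous_segments_sketch(
    "ATCGATCGA", {"min_length": 3, "max_length": 4}, sequence_id="seq1"
)
for seg in segments:
    print(seg)
```

With these illustrative parameters the 9-base sequence yields two 4-base segments (positions 0 to 4 and 4 to 8); the trailing single base is shorter than min_length and is dropped.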