prokbert.sequtils.segment_sequence_contiguous

prokbert.sequtils.segment_sequence_contiguous(sequence, params, sequence_id=nan)

Split a sequence into contiguous, non-overlapping segments laid end to end.

Segments shorter than the minimum length given in params are discarded. The function returns a list of segments along with their positions in the original sequence.

Parameters
  • sequence (str) – The input nucleotide sequence to be segmented.

  • params (dict) – Dictionary containing the segmentation parameters. Must have ‘min_length’ and ‘max_length’ keys specifying the minimum and maximum lengths of the segments, respectively.

  • sequence_id (numeric, optional) – An identifier for the sequence. Defaults to NaN.

Returns

A list of dictionaries, one per segment. Each dictionary contains the segment’s sequence, its start and end positions in the original sequence, and the sequence ID.

Return type

list of dict
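
Example

The behaviour described above can be illustrated with a minimal standalone sketch. It is not the prokbert source. The function name segment_contiguous_sketch and the dictionary keys (segment, segment_start, segment_end, sequence_id) are hypothetical, chosen only for illustration. The sketch also assumes 0-based start positions, exclusive end positions, windows of max_length characters, and that a segment exactly min_length long is kept. The real function may differ on any of these points.

```python
import math


def segment_contiguous_sketch(sequence, params, sequence_id=math.nan):
    """Illustrative sketch of contiguous segmentation (not the library code)."""
    min_length = params["min_length"]
    max_length = params["max_length"]
    segments = []
    # Walk the sequence in non-overlapping windows of max_length characters.
    for start in range(0, len(sequence), max_length):
        end = min(start + max_length, len(sequence))
        # Discard windows shorter than min_length (in practice only the last one).
        if end - start >= min_length:
            segments.append(
                {
                    "segment": sequence[start:end],  # hypothetical key name
                    "segment_start": start,          # assumed 0-based, inclusive
                    "segment_end": end,              # assumed exclusive
                    "sequence_id": sequence_id,
                }
            )
    return segments


# A 10-nucleotide sequence cut into windows of 4; the trailing 2-mer is dropped.
example = segment_contiguous_sketch(
    "ACGTACGTAC", {"min_length": 3, "max_length": 4}, sequence_id=1
)
for seg in example:
    print(seg)
```

With these parameters the sketch keeps the windows covering positions 0 to 4 and 4 to 8 and discards the final two-nucleotide remainder, because it is shorter than min_length. Lowering min_length to 2 would keep that remainder as a third segment.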