Configuration Utils

BaseConfig

prokbert.config_utils.BaseConfig()

Base class for managing and validating configurations.

prokbert.config_utils.BaseConfig.cast_to_expected_type(...)

Cast the given value to the expected type.

prokbert.config_utils.BaseConfig.get_parameter(...)

Retrieve the default value of a specified parameter.

prokbert.config_utils.BaseConfig.validate_type(...)

Validate the type of a given value against the expected type.

prokbert.config_utils.BaseConfig.validate_value(...)

Validate the value of a parameter against its constraints.

prokbert.config_utils.BaseConfig.validate(...)

Validate both the type and value of a parameter.

prokbert.config_utils.BaseConfig.describe(...)

Retrieve the description of a parameter.

class prokbert.config_utils.BaseConfig

Base class for managing and validating configurations.

cast_to_expected_type(parameter_class: str, parameter_name: str, value: any) → any

Cast the given value to the expected type.

Parameters
  • parameter_class (str) – The class/category of the parameter.

  • parameter_name (str) – The name of the parameter.

  • value (any) – The value to cast.

Returns

The value cast to the expected type.

Return type

any

Raises

ValueError – If casting fails.
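
As a rough illustration of the casting step, the sketch below assumes that each parameter stores the name of its expected type alongside its default. The `PARAMETERS` table, the `TYPE_CASTERS` mapping and the stand-alone function are hypothetical and are not taken from prokbert; only the behaviour described above (return the cast value, raise ValueError on failure) is mirrored.

```python
# Hypothetical parameter table; the real prokbert layout may differ.
PARAMETERS = {
    "segmentation": {
        "min_length": {"type": "integer", "default": 0},
        "coverage": {"type": "float", "default": 1.0},
    }
}

# Assumed mapping from type names to Python constructors.
TYPE_CASTERS = {"integer": int, "float": float, "string": str}


def cast_to_expected_type(parameter_class, parameter_name, value):
    """Cast value to the type declared for the parameter, or raise ValueError."""
    expected = PARAMETERS[parameter_class][parameter_name]["type"]
    try:
        return TYPE_CASTERS[expected](value)
    except (TypeError, ValueError) as exc:
        # Normalise every casting failure to ValueError, as documented above.
        raise ValueError(
            f"Cannot cast {value!r} to {expected} for "
            f"{parameter_class}.{parameter_name}"
        ) from exc
```

With this table, a string such as "10" passed for `min_length` comes back as an integer, while a value such as "abc" raises ValueError.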

describe(parameter_class: str, parameter_name: str) → str

Retrieve the description of a parameter.

Parameters
  • parameter_class (str) – The class/category of the parameter.

  • parameter_name (str) – The name of the parameter.

Returns

Description of the parameter.

Return type

str

get_parameter(parameter_class: str, parameter_name: str) → any

Retrieve the default value of a specified parameter.

Parameters
  • parameter_class (str) – The class/category of the parameter (e.g., ‘segmentation’).

  • parameter_name (str) – The name of the parameter.

Returns

Default value of the parameter, cast to the expected type.

Return type

any

validate(parameter_class: str, parameter_name: str, value: any)

Validate both the type and value of a parameter.

Parameters
  • parameter_class (str) – The class/category of the parameter.

  • parameter_name (str) – The name of the parameter.

  • value (any) – The value to be validated.

Raises
  • TypeError – If the value is not of the expected type.

  • ValueError – If the value does not meet the parameter’s constraints.
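
The two-stage check described above (type first, then constraints) can be sketched as follows. The `SPEC` dictionary, its `options`/`min`/`max` keys and the stand-alone functions are assumptions made for illustration; the actual constraint format used by prokbert is not shown in this reference.

```python
# Hypothetical parameter specification; names and layout are assumptions.
SPEC = {
    "segmentation": {
        "type": {"type": str, "options": ["contiguous", "random"]},
        "min_length": {"type": int, "min": 0, "max": 4096},
    }
}


def validate_type(parameter_class, parameter_name, value):
    """Return True if value is an instance of the declared type."""
    return isinstance(value, SPEC[parameter_class][parameter_name]["type"])


def validate_value(parameter_class, parameter_name, value):
    """Return True if value satisfies the declared constraints."""
    spec = SPEC[parameter_class][parameter_name]
    if "options" in spec and value not in spec["options"]:
        return False
    if "min" in spec and value < spec["min"]:
        return False
    if "max" in spec and value > spec["max"]:
        return False
    return True


def validate(parameter_class, parameter_name, value):
    """Raise TypeError or ValueError when the corresponding check fails."""
    if not validate_type(parameter_class, parameter_name, value):
        raise TypeError(f"{parameter_name}: unexpected type {type(value).__name__}")
    if not validate_value(parameter_class, parameter_name, value):
        raise ValueError(f"{parameter_name}: {value!r} violates its constraints")
```

Checking the type before the value means the range comparison never runs on a value of the wrong type, so a string passed for `min_length` fails with TypeError rather than with a comparison error.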

validate_type(parameter_class: str, parameter_name: str, value: any) → bool

Validate the type of a given value against the expected type.

Parameters
  • parameter_class (str) – The class/category of the parameter.

  • parameter_name (str) – The name of the parameter.

  • value (any) – The value to be validated.

Returns

True if the value is of the expected type, otherwise False.

Return type

bool

validate_value(parameter_class: str, parameter_name: str, value: any) → bool

Validate the value of a parameter against its constraints.

Parameters
  • parameter_class (str) – The class/category of the parameter.

  • parameter_name (str) – The name of the parameter.

  • value (any) – The value to be validated.

Returns

True if the value meets the constraints, otherwise False.

Return type

bool

SeqConfig

prokbert.config_utils.SeqConfig()

Class to manage and validate sequence processing configurations.

prokbert.config_utils.SeqConfig._get_default_sequence_processing_config_file()

Retrieve the default sequence processing configuration file.

prokbert.config_utils.SeqConfig.get_and_set_segmentation_parameters([...])

Retrieve and validate the provided parameters for segmentation.

prokbert.config_utils.SeqConfig.get_and_set_tokenization_parameters([...])

prokbert.config_utils.SeqConfig.get_and_set_computational_parameters([...])

Reading and validating the computational parameters

prokbert.config_utils.SeqConfig.get_maximum_segment_length_from_token_count_from_params()

Calculating the maximum length of the segment from the token count

prokbert.config_utils.SeqConfig.get_maximum_segment_length_from_token_count(...)

Calculates the maximum sequence length that a given number of tokens can cover

prokbert.config_utils.SeqConfig.get_maximum_token_count_from_max_length(...)

Calculates the maximum number of tokens for a given maximum segment length

class prokbert.config_utils.SeqConfig

Bases: BaseConfig

Class to manage and validate sequence processing configurations.

get_and_set_computational_parameters(parameters: dict = {}) → dict

Reading and validating the computational parameters

get_and_set_segmentation_parameters(parameters: dict = {}) → dict

Retrieve and validate the provided parameters for segmentation.

Parameters

parameters (dict) – A dictionary of parameters to be validated.

Returns

A dictionary of validated segmentation parameters.

Return type

dict

Raises

ValueError – If an invalid segmentation parameter is provided.
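
A minimal sketch of the retrieve-and-validate pattern described above, assuming that user-supplied values override a set of defaults and that unknown names are rejected. `DEFAULT_SEGMENTATION` and `get_and_set_parameters` are hypothetical names with placeholder values; the real defaults come from the library's configuration file.

```python
import copy

# Hypothetical defaults; the real values come from the library's config file.
DEFAULT_SEGMENTATION = {"type": "contiguous", "min_length": 0, "max_length": 512}


def get_and_set_parameters(defaults, parameters=None):
    """Return a copy of defaults overridden by parameters, rejecting unknown keys."""
    merged = copy.deepcopy(defaults)
    for name, value in (parameters or {}).items():
        if name not in defaults:
            raise ValueError(f"Invalid parameter: {name}")
        merged[name] = value
    return merged
```

The sketch takes `None` rather than `{}` as its default argument, a common Python idiom that avoids sharing one dictionary between calls, and it copies the defaults so that overrides never modify them.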

static get_maximum_segment_length_from_token_count(max_token_counts, shift, kmer)

Calculates the maximum sequence length that a given number of tokens can cover

get_maximum_segment_length_from_token_count_from_params()

Calculating the maximum length of the segment from the token count

static get_maximum_token_count_from_max_length(max_segment_length, shift, kmer)

Calculates the maximum number of tokens for a given maximum segment length

get_maximum_token_count_from_max_length_from_params()

Calculating the maximum token count from the maximum segment length
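
The relationship between token count and segment length can be illustrated with the generic arithmetic of overlapping k-mers: n k-mers of width `kmer`, each starting `shift` positions after the previous one, span (n - 1) * shift + kmer nucleotides. The helper names below are hypothetical, and the sketch counts k-mer tokens only; the library methods may additionally reserve positions for special tokens, so their results can differ from these values.

```python
def segment_length_from_token_count(token_count, shift, kmer):
    """Nucleotides covered by token_count k-mers spaced shift apart."""
    if token_count < 1:
        return 0
    return (token_count - 1) * shift + kmer


def token_count_from_segment_length(segment_length, shift, kmer):
    """Number of complete k-mers of width kmer, spaced shift apart, in a segment."""
    if segment_length < kmer:
        return 0
    return (segment_length - kmer) // shift + 1
```

For example, with `kmer=6` and `shift=2`, four tokens span 12 nucleotides, and a 13-nucleotide segment still yields four complete k-mers because the last nucleotide cannot start a new one.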

ProkBERTConfig

prokbert.config_utils.ProkBERTConfig()

Class to manage and validate pretraining configurations.

prokbert.config_utils.ProkBERTConfig._get_default_pretrain_config_file()

Retrieve the default pretraining configuration file.

prokbert.config_utils.ProkBERTConfig.get_set_parameters(...)

Retrieve and validate the provided parameters for a given parameter class.

prokbert.config_utils.ProkBERTConfig.get_and_set_model_parameters([...])

Setting the model parameters

prokbert.config_utils.ProkBERTConfig.get_and_set_dataset_parameters([...])

Setting the dataset parameters

prokbert.config_utils.ProkBERTConfig.get_and_set_pretraining_parameters([...])

Setting the pretraining parameters

prokbert.config_utils.ProkBERTConfig.get_and_set_datacollator_parameters([...])

Setting the data collator parameters

prokbert.config_utils.ProkBERTConfig.get_and_set_segmentation_parameters([...])

prokbert.config_utils.ProkBERTConfig.get_and_set_tokenization_parameters([...])

prokbert.config_utils.ProkBERTConfig.get_and_set_computation_params([...])

class prokbert.config_utils.ProkBERTConfig

Bases: BaseConfig

Class to manage and validate pretraining configurations.

get_and_set_datacollator_parameters(parameters: dict = {}) → dict

Setting the data collator parameters

get_and_set_dataset_parameters(parameters: dict = {}) → dict

Setting the dataset parameters

get_and_set_model_parameters(parameters: dict = {}) → dict

Setting the model parameters

get_and_set_pretraining_parameters(parameters: dict = {}) → dict

Setting the pretraining parameters

get_set_parameters(parameter_class: str, parameters: dict = {}) → dict

Retrieve and validate the provided parameters for a given parameter class.

Parameters
  • parameter_class (str) – The class/category of the parameter (e.g., ‘data_collator’).

  • parameters (dict) – A dictionary of parameters to be validated.

Returns

A dictionary of validated parameters.

Return type

dict

Raises

ValueError – If an invalid parameter is provided.