prokbert.sequtils.save_to_hdf

prokbert.sequtils.save_to_hdf(X: ndarray, hdf_file_path: str, database: Optional[DataFrame] = None, compression: bool = False, pd_chunksize: int = 10000000) → None

Save a numpy array and an optional pandas DataFrame to an HDF5 file.

Parameters
  • X (np.ndarray) – 2D numpy array to be saved.

  • hdf_file_path (str) – Path to the HDF5 file.

  • database (Optional[pd.DataFrame]) – Optional pandas DataFrame to be saved alongside the array. Defaults to None.

  • compression (bool) – Whether to apply compression. Defaults to False.

  • pd_chunksize (int) – Number of rows per chunk for saving the DataFrame. Defaults to 10,000,000.

Raises
  • ValueError – If the provided numpy array is not 2D.

  • OSError – If creating the directory structure or removing an existing HDF5 file fails.
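
Because a non-2D array raises ValueError, it can help to check the array's shape before calling the function. The following is a minimal sketch of such a check, not part of prokbert: `ensure_2d` is a hypothetical helper, and treating a 1D array as a single row is an assumption that may not suit every dataset.

```python
import numpy as np


def ensure_2d(X):
    """Hypothetical helper (not part of prokbert): return X as a 2D array.

    A 1D input is treated as a single row (an assumption made for this
    sketch); any other non-2D input raises ValueError.
    """
    X = np.asarray(X)
    if X.ndim == 1:
        # Reshape a flat vector of length n into shape (1, n).
        return X.reshape(1, -1)
    if X.ndim != 2:
        raise ValueError(f"expected a 2D array, got {X.ndim}D")
    return X
```

The result of `ensure_2d(array)` can then be passed as `X` to `save_to_hdf`.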

Example:

>>> import numpy as np
>>> import pandas as pd
>>> from prokbert.sequtils import save_to_hdf
>>> array = np.random.random((100, 100))
>>> df = pd.DataFrame({'A': range(1, 101), 'B': range(101, 201)})
>>> save_to_hdf(array, "sample.hdf5", database=df, compression=True)
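
The `pd_chunksize` parameter sets how many DataFrame rows go into each chunk. How the function splits the DataFrame internally is not documented here; the sketch below only illustrates row-wise chunking under the assumption that chunks are consecutive, non-overlapping row ranges. `iter_row_chunks` is a hypothetical helper, not a prokbert function.

```python
def iter_row_chunks(n_rows, chunksize):
    """Hypothetical helper: yield (start, stop) row ranges of at most
    `chunksize` rows that together cover rows 0..n_rows-1."""
    if chunksize <= 0:
        raise ValueError("chunksize must be a positive integer")
    for start in range(0, n_rows, chunksize):
        # The last chunk may be shorter than chunksize.
        yield start, min(start + chunksize, n_rows)
```

Under this assumption, a DataFrame with 25 rows and a chunk size of 10 would be covered by the ranges (0, 10), (10, 20), and (20, 25), and each range could be selected with `df.iloc[start:stop]`.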