prokbert.sequtils.save_to_hdf
- prokbert.sequtils.save_to_hdf(X: ndarray, hdf_file_path: str, database: Optional[DataFrame] = None, compression: bool = False, pd_chunksize: int = 10000000) → None
Save a numpy array and an optional pandas DataFrame to an HDF5 file.
- Parameters
X (np.ndarray) – 2D numpy array to be saved.
hdf_file_path (str) – Path to the HDF5 file.
database (pd.DataFrame) – Pandas DataFrame to be saved. Defaults to None.
compression (bool) – Whether to apply compression. Defaults to False.
pd_chunksize (int) – Number of rows per chunk for saving the DataFrame. Defaults to 10,000,000.
- Raises
ValueError – If the provided numpy array is not 2D.
OSError – If there’s an error creating the directory structure or removing an existing HDF5 file.
Example:
>>> import numpy as np
>>> import pandas as pd
>>> from prokbert.sequtils import save_to_hdf
>>> array = np.random.random((100, 100))
>>> df = pd.DataFrame({'A': range(1, 101), 'B': range(101, 201)})
>>> save_to_hdf(array, "sample.hdf5", database=df, compression=True)
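The conditions listed under Raises suggest that some checks run before anything is written. The sketch below illustrates one way such checks could look. It is not the library's implementation: the helper name prepare_hdf_target is hypothetical, and the exact order of the steps and the error message are assumptions made for illustration.

```python
from pathlib import Path

import numpy as np


def prepare_hdf_target(X: np.ndarray, hdf_file_path: str) -> Path:
    """Hypothetical helper mirroring the documented Raises conditions."""
    # Reject anything that is not a 2D array (documented ValueError).
    if X.ndim != 2:
        raise ValueError(f"Expected a 2D array, got {X.ndim} dimension(s).")

    target = Path(hdf_file_path)
    # Create any missing parent directories; a failure here is an OSError.
    target.parent.mkdir(parents=True, exist_ok=True)
    # Remove an existing file so the write starts from a clean state
    # (assumed behavior; a failure here is also an OSError).
    if target.exists():
        target.unlink()
    return target
```

Under these assumptions, passing a 1D array such as np.zeros(5) fails with ValueError before the file system is touched, while a 2D array returns a path whose parent directory exists and which no longer points to an old file.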