preprocessors
- class pyraug.data.preprocessors.DataProcessor(data_normalization_type='min_max_scaling')
This is a basic class which preprocesses the data. It takes messy data, detects potential nan values and bad types, and converts the data to a type handled by the VAE models (i.e. torch.Tensor). Moreover, if the data samples do not all have the same shape, they are resized to the minimal shape.
- static has_nan(data)
Detects potential nan in the input data
- Parameters
data (torch.Tensor) – The data to be checked
- Returns
True if data contains nan
- Return type
(bool)
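A check of this kind can be sketched with plain PyTorch. The helper name `contains_nan` is hypothetical and is not part of pyraug; the sketch only assumes that the input is already a floating-point torch.Tensor.

```python
import torch


def contains_nan(data: torch.Tensor) -> bool:
    """Return True if any element of `data` is NaN (hypothetical helper)."""
    # torch.isnan builds an element-wise boolean mask; .any() reduces it
    # to a single 0-dim tensor, which bool() converts to a Python bool.
    return bool(torch.isnan(data).any())


clean = torch.tensor([[0.0, 0.5], [1.0, 0.25]])
dirty = torch.tensor([[0.0, float("nan")], [1.0, 0.25]])

print(contains_nan(clean))  # False
print(contains_nan(dirty))  # True
```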
- normalize_data(data)
This function normalizes the input data so that all the values are between 0 and 1
- Parameters
data (torch.Tensor) – The data to normalize
- Returns
The normalized data
- Return type
(torch.Tensor)
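As an illustration of min-max scaling to the [0, 1] range, here is a minimal sketch. It assumes the minimum and maximum are taken over the whole tensor; whether pyraug scales globally or per sample is not stated here. The name `min_max_scale` is hypothetical.

```python
import torch


def min_max_scale(data: torch.Tensor) -> torch.Tensor:
    """Rescale `data` linearly so its values lie in [0, 1] (hypothetical helper)."""
    lo, hi = data.min(), data.max()
    if hi == lo:
        # Constant input: avoid dividing by zero and map everything to 0.
        return torch.zeros_like(data)
    return (data - lo) / (hi - lo)


raw = torch.tensor([[2.0, 4.0], [6.0, 10.0]])
scaled = min_max_scale(raw)
print(scaled)
```

With this input the smallest value (2.0) maps to 0 and the largest (10.0) maps to 1.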
- process_data(data)
This function checks the data type, detects potential nan in the input data and preprocesses the data so it can be handled by the models.
- Parameters
data (Union[np.ndarray, torch.Tensor, List[np.ndarray]]) –
The data that needs to be checked. Expected:
  - list of np.ndarray or torch.Tensor of shape: n_channels x [optional depth] x [optional height] x width
  - np.ndarray of shape num_data x n_channels x [optional depth] x [optional height] x width
  - torch.Tensor of shape num_data x n_channels x [optional depth] x [optional height] x width
- Returns
The data that has been cleaned
- Return type
clean_data (torch.Tensor)
Warning
If you set normalized_data to False, for example because you applied your own preprocessing, you must ensure that your data lies between 0 and 1, or an exception will be raised.
Note
If you provide input data with different shapes (e.g. images of shape (3, 20, 30) and (3, 15, 60)), the data is reshaped to the minimal shape, i.e. (3, 15, 30).
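The minimal shape in the note can be read as the element-wise minimum over the shapes of all samples. The sketch below reproduces the (3, 15, 30) example under that reading; `minimal_shape` is a hypothetical helper, not a pyraug function.

```python
from typing import Sequence, Tuple


def minimal_shape(shapes: Sequence[Tuple[int, ...]]) -> Tuple[int, ...]:
    """Element-wise minimum over a collection of shapes (hypothetical helper)."""
    if not shapes:
        raise ValueError("at least one shape is required")
    if len({len(shape) for shape in shapes}) != 1:
        raise ValueError("all shapes must have the same number of dimensions")
    # zip(*shapes) groups the sizes dimension by dimension.
    return tuple(min(dims) for dims in zip(*shapes))


print(minimal_shape([(3, 20, 30), (3, 15, 60)]))  # (3, 15, 30)
```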
- static reshape_data(data_list, target_shape)
This function takes input data and reshapes it to a target shape
- Parameters
data (torch.Tensor) – The data to reshape. Expected shape: mini_batch x n_channels x [optional depth] x [optional height] x width
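How the resizing is performed is not specified here. One possible approach, shown purely as an assumption, is interpolation over the spatial dimensions with `torch.nn.functional.interpolate`. The helper `resize_batch` and its `spatial_shape` argument are hypothetical and do not mirror the signature of `reshape_data`.

```python
import torch
import torch.nn.functional as F


def resize_batch(data: torch.Tensor, spatial_shape: tuple) -> torch.Tensor:
    """Resize the spatial dimensions of a batch (hypothetical helper).

    `data` is expected as mini_batch x n_channels x [depth] x [height] x width;
    `spatial_shape` lists the target sizes of the dimensions after n_channels.
    """
    # Nearest-neighbour interpolation works for 1, 2 or 3 spatial dimensions
    # and leaves the batch and channel dimensions untouched.
    return F.interpolate(data, size=spatial_shape, mode="nearest")


batch = torch.rand(4, 3, 20, 30)
resized = resize_batch(batch, (15, 30))
print(resized.shape)  # torch.Size([4, 3, 15, 30])
```

Nearest-neighbour mode is chosen here only because it accepts any of the supported input ranks; other interpolation modes or plain cropping would be equally valid readings of "reshape".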
- static to_dataset(data, labels=None)
This method converts a set of torch.Tensor to a BaseDataset
- Parameters
data (torch.Tensor) – The set of data as a big torch.Tensor
labels (torch.Tensor) – The target labels as a big torch.Tensor
- Returns
The resulting dataset
- Return type