preprocessors

class pyraug.data.preprocessors.DataProcessor(data_normalization_type='min_max_scaling')

A basic class that preprocesses the data. It takes messy data, detects potential NaN values and bad types, and converts the data to a type handled by the VAE models (i.e. torch.Tensor). If the samples do not all have the same shape, they are resized to the minimal shape.

static has_nan(data)

Detects potential NaN values in the input data

Parameters: data (torch.Tensor) – The data to be checked
Returns: True if data contains nan
Return type: (bool)
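
A minimal sketch of such a check, assuming it boils down to an element-wise `torch.isnan` test. The helper name `contains_nan` is hypothetical and is not part of pyraug:

```python
import torch


def contains_nan(data: torch.Tensor) -> bool:
    """Return True if at least one entry of `data` is NaN (hypothetical helper)."""
    # torch.isnan builds an element-wise boolean mask; .any() reduces it to one flag
    return bool(torch.isnan(data).any())


clean = torch.tensor([[0.1, 0.5], [0.9, 0.3]])
dirty = torch.tensor([[0.1, float("nan")], [0.9, 0.3]])

print(contains_nan(clean))
print(contains_nan(dirty))
```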

normalize_data(data)

This function normalizes the input data so that all the values are between 0 and 1

Parameters: data (torch.Tensor) – The data to normalize

Returns: The normalized data
Return type: (torch.Tensor)
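
To illustrate what scaling to [0, 1] can look like, here is a sketch of global min-max scaling. It assumes a single minimum and maximum are taken over the whole tensor, which may differ from what the library does; `min_max_scale` and its `eps` argument are hypothetical names:

```python
import torch


def min_max_scale(data: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    """Rescale `data` to [0, 1] using its global minimum and maximum (hypothetical helper)."""
    data = data.float()
    lo, hi = data.min(), data.max()
    # eps guards against a division by zero when all values are identical
    return (data - lo) / torch.clamp(hi - lo, min=eps)


x = torch.tensor([[2.0, 4.0], [6.0, 10.0]])
scaled = min_max_scale(x)
print(scaled)
```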

process_data(data)

This function checks the data type, detects NaN values in the input data and preprocesses the data so that it can be handled by the models.

Parameters

data (Union[np.ndarray, torch.Tensor, List[np.ndarray]]) –

The data that needs to be checked. Expected:

- list of np.ndarray or torch.Tensor of shape: n_channels x [optional depth] x [optional height] x width
- np.ndarray of shape num_data x n_channels x [optional depth] x [optional height] x width
- torch.Tensor of shape num_data x n_channels x [optional depth] x [optional height] x width

Returns

The data that has been cleaned

Return type

clean_data (torch.Tensor)

Warning

If you set normalized_data to False, for example because you applied your own preprocessing, you must ensure that your data lies between 0 and 1 or an exception will be raised.

Note

If you provide input data that has different shapes (e.g. images of shape (3, 20, 30) and (3, 15, 60)), the data is reshaped to the minimum shape, i.e. a shape of (3, 15, 30).
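
The minimum shape in the note can be reproduced with an axis-by-axis minimum over the sample shapes. This is only a sketch of the arithmetic, not the library's code; `minimal_shape` is a hypothetical helper:

```python
import numpy as np


def minimal_shape(arrays):
    """Element-wise minimum over the shapes of `arrays` (hypothetical helper)."""
    shapes = [a.shape for a in arrays]
    # zip(*shapes) groups the sizes axis by axis, so every array
    # is assumed to have the same number of dimensions
    return tuple(min(sizes) for sizes in zip(*shapes))


images = [np.zeros((3, 20, 30)), np.zeros((3, 15, 60))]
target = minimal_shape(images)
print(target)  # (3, 15, 30)
```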

static reshape_data(data_list, target_shape)

This function takes input data and reshapes it to a target shape

Parameters: data (torch.Tensor) – The data to reshape. Expected shape: mini_batch x n_channels x [optional depth] x [optional height] x width
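
One possible way to bring samples of different sizes to a common shape is interpolation. The sketch below assumes 2D images, bilinear interpolation, and a `target_shape` that holds only the spatial sizes (height, width). The source does not state the resizing strategy actually used, and `resize_all` is a hypothetical helper:

```python
import torch
import torch.nn.functional as F


def resize_all(data_list, target_shape):
    """Resize each (n_channels, height, width) tensor to `target_shape` and stack them."""
    resized = []
    for sample in data_list:
        # F.interpolate expects a leading batch axis: (1, n_channels, height, width)
        batch = sample.float().unsqueeze(0)
        out = F.interpolate(
            batch, size=tuple(target_shape), mode="bilinear", align_corners=False
        )
        resized.append(out)
    # Concatenate along the batch axis: (num_data, n_channels, height, width)
    return torch.cat(resized, dim=0)


samples = [torch.rand(3, 20, 30), torch.rand(3, 15, 60)]
batch = resize_all(samples, target_shape=(15, 30))
print(batch.shape)
```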

static to_dataset(data, labels=None)

This method converts a set of torch.Tensor to a BaseDataset

Parameters

data (torch.Tensor) – The set of data as a big torch.Tensor
labels (torch.Tensor) – The target labels as a big torch.Tensor

Returns

The resulting dataset

Return type

(BaseDataset)
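
To show the general idea of wrapping tensors in a dataset, here is a sketch built on `torch.utils.data.Dataset`. `TensorPairDataset` is a hypothetical stand-in, not pyraug's BaseDataset. The dictionary keys and the placeholder label of -1 used when no labels are given are assumptions made for illustration:

```python
import torch
from torch.utils.data import Dataset


class TensorPairDataset(Dataset):
    """Hypothetical stand-in for BaseDataset: pairs each sample with its label."""

    def __init__(self, data, labels=None):
        # Assumption: when no labels are given, use placeholder labels of -1
        if labels is None:
            labels = -torch.ones(data.shape[0])
        self.data = data
        self.labels = labels

    def __len__(self):
        return self.data.shape[0]

    def __getitem__(self, index):
        # The key names "data" and "target" are illustrative choices
        return {"data": self.data[index], "target": self.labels[index]}


data = torch.rand(4, 3)
labels = torch.tensor([0, 1, 0, 1])
dataset = TensorPairDataset(data, labels)
unlabeled = TensorPairDataset(data)
print(len(dataset), dataset[1]["target"])
```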

static to_tensor(data)

Converts numpy arrays to torch.Tensor format

Parameters: data (np.ndarray) – The data to be converted
Returns: The transformed data
Return type: (torch.Tensor)
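
A conversion of this kind can be sketched with `torch.from_numpy`. The cast to float32 is an assumption made here because it is a common input dtype for neural networks; `array_to_tensor` is a hypothetical helper:

```python
import numpy as np
import torch


def array_to_tensor(data: np.ndarray) -> torch.Tensor:
    """Convert a NumPy array to a float32 torch.Tensor (hypothetical helper)."""
    # torch.from_numpy shares memory with the array; .float() returns a
    # float32 copy whenever the original dtype is not already float32
    return torch.from_numpy(data).float()


arr = np.arange(6, dtype=np.float64).reshape(2, 3)
tensor = array_to_tensor(arr)
print(tensor.dtype, tensor.shape)
```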