preprocessors

class pyraug.data.preprocessors.DataProcessor(data_normalization_type='min_max_scaling')[source]

This is a basic class which preprocesses the data. It takes messy data, detects potential NaN values and bad types, and converts the data to a type handled by the VAE models (i.e. torch.Tensor). Moreover, if the samples do not all have the same shape, they are reshaped, i.e. resized to the minimal shape.

static has_nan(data)[source]

Detects potential NaN values in the input data

Parameters

data (torch.Tensor) – The data to be checked

Returns

True if data contains at least one NaN value

Return type

(bool)
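A minimal sketch of such a check, assuming the data is already a floating-point torch.Tensor. It is a standalone illustration built on torch.isnan, not pyraug's source:

```python
import torch


def has_nan(data: torch.Tensor) -> bool:
    """Return True if at least one element of `data` is NaN."""
    # torch.isnan gives an element-wise boolean mask; .any() reduces it to a
    # single boolean tensor and .item() turns that into a Python bool.
    return bool(torch.isnan(data).any().item())


clean = torch.tensor([[0.0, 1.0], [2.0, 3.0]])
dirty = torch.tensor([[0.0, float("nan")], [2.0, 3.0]])
print(has_nan(clean))  # False
print(has_nan(dirty))  # True
```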

normalize_data(data)[source]

This function normalizes the input data so that all the values are between 0 and 1

Parameters

data (torch.Tensor) – The data to normalize

Returns

The normalized data

Return type

(torch.Tensor)
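The default data_normalization_type='min_max_scaling' maps values into [0, 1]. A minimal sketch of min-max scaling, assuming each sample is scaled independently (a global min/max over the whole tensor would be an equally valid reading). The function name min_max_scale and the eps guard are hypothetical:

```python
import torch


def min_max_scale(data: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    """Scale every sample of `data` (num_data x ...) independently to [0, 1]."""
    data = data.float()
    flat = data.reshape(data.shape[0], -1)  # one row per sample
    mins = flat.min(dim=1, keepdim=True).values
    maxs = flat.max(dim=1, keepdim=True).values
    # Clamping the range by eps avoids dividing by zero: a constant sample
    # has flat - mins == 0 everywhere, so it is mapped to all zeros.
    scaled = (flat - mins) / (maxs - mins).clamp(min=eps)
    return scaled.reshape(data.shape)


batch = torch.tensor([[[0.0, 5.0, 10.0]], [[2.0, 2.0, 2.0]]])  # 2 x 1 x 3
print(min_max_scale(batch))
```

Scaling per sample keeps a single sample with an unusually large range from compressing the values of all the others.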

process_data(data)[source]

This function checks the data type, detects potential NaN values in the input data and preprocesses the data so it can be handled by the models.

Parameters

data (Union[np.ndarray, torch.Tensor, List[np.ndarray]]) –

The data that needs to be checked. Expected:

  • a list of np.ndarray or torch.Tensor, each of shape n_channels x [optional depth] x [optional height] x width

  • an np.ndarray of shape num_data x n_channels x [optional depth] x [optional height] x width

  • a torch.Tensor of shape num_data x n_channels x [optional depth] x [optional height] x width
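All three accepted input forms reduce to a sequence of per-sample arrays. A minimal sketch of that first step, assuming every sample is converted to a float32 torch.Tensor (the helper name to_sample_list is hypothetical):

```python
from typing import List, Union

import numpy as np
import torch


def to_sample_list(
    data: Union[np.ndarray, torch.Tensor, List[np.ndarray]]
) -> List[torch.Tensor]:
    """Turn any of the accepted input forms into a list of per-sample tensors."""
    if not isinstance(data, (list, np.ndarray, torch.Tensor)):
        raise TypeError(f"Unsupported data type: {type(data).__name__}")
    # Iterating over a batched array/tensor yields its samples along the first
    # axis; iterating over a list yields its items. Either way each item has
    # shape n_channels x [optional depth] x [optional height] x width.
    return [torch.as_tensor(sample).float() for sample in data]


samples = to_sample_list([np.zeros((3, 20, 30)), np.zeros((3, 15, 60))])
print([tuple(s.shape) for s in samples])  # [(3, 20, 30), (3, 15, 60)]
```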

Returns

The data that has been cleaned

Return type

(torch.Tensor)

Warning

If you set normalized_data to False, for example because you applied your own preprocessing, you must ensure that your data lies between 0 and 1, or an exception will be raised.

Note

If you provide input data with different shapes (e.g. images of shape (3, 20, 30) and (3, 15, 60)), the data is reshaped to the minimum size along each dimension, i.e. (3, 15, 30).
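The target shape described in the note is the element-wise minimum of the sample shapes. A minimal pure-Python sketch (the helper name minimal_shape is hypothetical):

```python
from typing import Sequence, Tuple


def minimal_shape(shapes: Sequence[Tuple[int, ...]]) -> Tuple[int, ...]:
    """Element-wise minimum over a collection of shapes of equal rank."""
    if len({len(s) for s in shapes}) != 1:
        raise ValueError("All samples must have the same number of dimensions")
    # zip(*shapes) groups the sizes of each dimension together.
    return tuple(min(sizes) for sizes in zip(*shapes))


print(minimal_shape([(3, 20, 30), (3, 15, 60)]))  # (3, 15, 30)
```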

static reshape_data(data_list, target_shape)[source]

This function takes input data and reshapes it to a target shape.

Parameters

data (torch.Tensor) – The data to reshape. Expected shape: mini_batch x n_channels x [optional depth] x [optional height] x width
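The class description says mismatched samples are "resized", but this page does not say how. A minimal sketch, assuming nearest-neighbour resizing with torch.nn.functional.interpolate over every dimension after the channel axis (cropping would be another option). The helper name resize_samples is hypothetical:

```python
from typing import List, Sequence

import torch
import torch.nn.functional as F


def resize_samples(samples: List[torch.Tensor], target_shape: Sequence[int]) -> torch.Tensor:
    """Resize each sample to `target_shape` and stack them into one batch.

    Each sample has shape n_channels x [optional depth] x [optional height] x width,
    and target_shape follows the same layout with the same number of channels.
    """
    resized = []
    for sample in samples:
        if sample.shape[0] != target_shape[0]:
            raise ValueError("Number of channels must match target_shape[0]")
        # F.interpolate expects a leading batch dimension: add one, resize the
        # spatial dimensions (everything after channels), then remove it again.
        out = F.interpolate(
            sample.unsqueeze(0).float(), size=tuple(target_shape[1:]), mode="nearest"
        )
        resized.append(out.squeeze(0))
    return torch.stack(resized)


batch = resize_samples([torch.rand(3, 20, 30), torch.rand(3, 15, 60)], (3, 15, 30))
print(batch.shape)  # torch.Size([2, 3, 15, 30])
```

Nearest-neighbour mode works for 1D, 2D and 3D spatial inputs alike, which matches the optional depth and height dimensions above.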

static to_dataset(data, labels=None)[source]

This method converts a set of torch.Tensor to a BaseDataset

Parameters
  • data (torch.Tensor) – The data as a single torch.Tensor

  • labels (torch.Tensor) – The target labels as a single torch.Tensor. Defaults to None

Returns

The resulting dataset

Return type

(BaseDataset)
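BaseDataset itself is not documented on this page. The sketch below is a hypothetical stand-in built on torch.utils.data.Dataset that pairs each sample with its label; filling missing labels with zeros is an assumption, not documented behaviour:

```python
from typing import Optional, Tuple

import torch
from torch.utils.data import Dataset


class TensorPairDataset(Dataset):
    """Hypothetical stand-in for BaseDataset: pairs each sample with its label."""

    def __init__(self, data: torch.Tensor, labels: Optional[torch.Tensor] = None):
        if labels is None:
            # Assumption: unlabeled data gets a placeholder label of 0.
            labels = torch.zeros(data.shape[0])
        if labels.shape[0] != data.shape[0]:
            raise ValueError("data and labels must have the same first dimension")
        self.data = data
        self.labels = labels

    def __len__(self) -> int:
        return self.data.shape[0]

    def __getitem__(self, index: int) -> Tuple[torch.Tensor, torch.Tensor]:
        return self.data[index], self.labels[index]


dataset = TensorPairDataset(torch.rand(5, 1, 4, 4))
print(len(dataset))  # 5
```

Because __getitem__ returns a (sample, label) pair, such a dataset can be passed directly to torch.utils.data.DataLoader.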

static to_tensor(data)[source]

Converts numpy arrays to torch.Tensor format

Parameters

data (np.ndarray) – The data to be converted

Returns

The transformed data

Return type

(torch.Tensor)
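A minimal conversion sketch using torch.from_numpy, assuming the models expect float32 input. The name numpy_to_tensor is hypothetical, chosen to avoid confusion with the method above:

```python
import numpy as np
import torch


def numpy_to_tensor(data: np.ndarray) -> torch.Tensor:
    # torch.from_numpy shares memory with the array (no copy); .float() then
    # makes a float32 copy unless the array is already float32.
    return torch.from_numpy(data).float()


arr = np.arange(6, dtype=np.float64).reshape(2, 3)
t = numpy_to_tensor(arr)
print(t.dtype, tuple(t.shape))  # torch.float32 (2, 3)
```

For float32 input no copy is made, so later in-place changes to the array are visible in the tensor. Call .clone() on the result if that is not wanted.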