RHVAE

This module implements the Riemannian Hamiltonian VAE proposed in (https://arxiv.org/pdf/2010.11518.pdf).

This module contains:
class pyraug.models.RHVAE(model_config, encoder=None, decoder=None, metric=None)[source]

This is an implementation of the Riemannian Hamiltonian VAE model proposed in (https://arxiv.org/pdf/2010.11518.pdf). This model provides a way to learn the Riemannian latent structure of a given data set through a parametrized Riemannian metric of the following form: \(\mathbf{G}^{-1}(z) = \sum \limits _{i=1}^N L_{\psi_i} L_{\psi_i}^{\top} \exp \Big(-\frac{\lVert z - c_i \rVert_2^2}{T^2} \Big) + \lambda I_d\)

and to generate new data. Here the \(L_{\psi_i}\) are the matrices output by the metric network, the \(c_i\) are the centroids, \(T\) is the temperature and \(\lambda\) is the regularization factor. The model is particularly well suited to high-dimensional data with a small number of samples, and it proved relevant for data augmentation, as shown in (https://arxiv.org/pdf/2105.00026.pdf).
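To make the metric formula concrete, here is a minimal pure-Python sketch that evaluates \(\mathbf{G}^{-1}(z)\) for a handful of centroids. The function name `inverse_metric` and the nested-list representation are assumptions made for illustration; this is not the code pyraug uses internally, which works on batched tensors. The default values of `temperature` and `regularization` are taken from the RHVAEConfig signature.

```python
import math


def inverse_metric(z, centroids, l_matrices, temperature=1.5, regularization=0.01):
    """Evaluate G^{-1}(z) = sum_i L_i L_i^T exp(-||z - c_i||^2 / T^2) + lambda * I_d.

    z          : list of d floats (a latent point)
    centroids  : list of N points c_i, each a list of d floats
    l_matrices : list of N matrices L_i, each a d x d nested list
    """
    d = len(z)
    # Start from the regularization term lambda * I_d.
    g_inv = [[regularization if i == j else 0.0 for j in range(d)] for i in range(d)]
    for c, L in zip(centroids, l_matrices):
        sq_dist = sum((zi - ci) ** 2 for zi, ci in zip(z, c))
        weight = math.exp(-sq_dist / temperature ** 2)
        for i in range(d):
            for j in range(d):
                # (L L^T)[i][j] = sum_k L[i][k] * L[j][k]
                g_inv[i][j] += weight * sum(L[i][k] * L[j][k] for k in range(d))
    return g_inv
```

Close to a centroid the corresponding \(L_{\psi_i} L_{\psi_i}^{\top}\) term dominates; far from every centroid the exponential weights vanish and the result tends to \(\lambda I_d\).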

Parameters

model_config (RHVAEConfig) – A model configuration setting the main parameters of the model

Note

For high-dimensional data we advise you to provide your own network architectures. With the provided MLP you may end up with a MemoryError.

forward(inputs)[source]

The input data is first encoded. The reparametrization trick is used to produce a sample \(z_0\) from the approximate posterior \(q_{\phi}(z|x)\). Then the Riemannian Hamiltonian equations are solved using the generalized leapfrog integrator. In the meantime, the input data \(x\) is fed to the metric network, which outputs the matrices \(L_{\psi}\). The metric is then computed and used by the integrator.

Parameters

inputs (Dict[str, torch.Tensor]) – The training data with labels

Returns

An instance of ModelOutput containing all the relevant parameters

Return type

output (ModelOutput)
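The sampling of \(z_0\) in forward can be illustrated with a small sketch of the reparametrization trick. It assumes the encoder returns a mean and a log-variance per latent dimension, which is the usual VAE convention but is not stated on this page; `reparametrize` is a hypothetical helper written with plain Python lists, not a pyraug function.

```python
import math
import random


def reparametrize(mu, log_var, rng):
    """Draw z_0 = mu + sigma * eps with eps ~ N(0, I) and sigma = exp(0.5 * log_var).

    Writing the sample this way keeps it a deterministic function of
    (mu, log_var) given the noise eps, which is what lets gradients
    flow back to the encoder in a real implementation.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


# Example: a 2-dimensional latent sample with unit variance.
z0 = reparametrize([0.5, -1.0], [0.0, 0.0], random.Random(0))
```

In the model, this \(z_0\) is the starting point that the leapfrog integrator then moves through the latent space.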

likelihood(x, sample_size=10)[source]

Estimates the log-likelihood \(\log p(x)\) of the model using importance sampling with \(q_{\phi}(z|x)\) as the proposal distribution.
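The importance sampling estimate can be sketched on a toy model where the answer is known in closed form. The example below is an assumption-laden illustration, not the RHVAE computation: it uses a one-dimensional model with prior \(p(z) = \mathcal{N}(0, 1)\) and likelihood \(p(x|z) = \mathcal{N}(z, 1)\), so that \(p(x) = \mathcal{N}(0, 2)\), and a Gaussian proposal standing in for \(q_{\phi}(z|x)\). All function names are hypothetical.

```python
import math
import random


def log_normal(x, mean, var):
    """Log-density of a univariate Gaussian N(mean, var) at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def log_likelihood_is(x, q_mean, q_var, sample_size=10, rng=None):
    """Estimate log p(x) as log( (1/K) * sum_k p(x, z_k) / q(z_k | x) ), z_k ~ q."""
    rng = rng or random.Random(0)
    log_weights = []
    for _ in range(sample_size):
        z = rng.gauss(q_mean, math.sqrt(q_var))
        log_joint = log_normal(z, 0.0, 1.0) + log_normal(x, z, 1.0)
        log_weights.append(log_joint - log_normal(z, q_mean, q_var))
    # log-sum-exp for numerical stability
    m = max(log_weights)
    return m + math.log(sum(math.exp(w - m) for w in log_weights)) - math.log(sample_size)
```

When the proposal equals the true posterior, here \(\mathcal{N}(x/2, 1/2)\), every importance weight equals \(p(x)\) and the estimate is exact for any sample_size. The further the proposal is from the posterior, the more samples are needed for a reliable estimate.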

classmethod load_from_folder(dir_path)[source]

Class method to be used to load the model from a specific folder

Parameters

dir_path (str) – The path to the folder where the model was saved.

Note

This function requires the folder to contain:

  • a model_config.json and a model.pt if no custom architectures were provided

  • a model_config.json, a model.pt and an encoder.pkl (resp. decoder.pkl and/or metric.pkl) if a custom encoder (resp. decoder and/or metric) was provided
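Before calling load_from_folder it can be handy to check that the folder has the expected layout. The helper below is a hypothetical sketch built only from the file names listed in the note above; `missing_files` is not part of pyraug, and the library may perform its own checks.

```python
import os


def missing_files(dir_path, custom_parts=()):
    """Return the expected files that are absent from dir_path.

    custom_parts lists the custom networks that were used at training
    time, e.g. ("encoder", "metric"); each one implies a <name>.pkl file.
    """
    required = ["model_config.json", "model.pt"]
    required += [f"{part}.pkl" for part in custom_parts]
    return [name for name in required if not os.path.isfile(os.path.join(dir_path, name))]
```

An empty list means every expected file is present; otherwise the returned names say what is missing from the folder.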

save(dir_path)[source]

Method to save the model at a specific location

Parameters

dir_path (str) – The path where the model should be saved. If the path does not exist, a folder will be created at the provided location.

set_metric(metric)[source]

This method is called to set the metric network that outputs the matrices \(L_{\psi_i}\) of the metric

Parameters

metric (BaseMetric) – The metric module to set on the model.

update()[source]

Method that allows model update during the training.

If needed, this method must be implemented in a child class.

By default, it does nothing.

update_metric()[source]

As soon as the model has seen all the data points (i.e. at the end of one loop over the data), the final metric function is updated using the \(\mu(x_i)\) as centroids.

class pyraug.models.rhvae.RHVAEConfig(input_dim=None, latent_dim=10, uses_default_encoder=True, uses_default_decoder=True, n_lf=3, eps_lf=0.001, beta_zero=0.3, temperature=1.5, regularization=0.01, uses_default_metric=True)[source]

Riemannian Hamiltonian Auto Encoder config class

Parameters
  • latent_dim (int) – The latent dimension used for the latent space. Default: 10

  • n_lf (int) – The number of leapfrog steps to use in the integrator. Default: 3

  • eps_lf (float) – The leapfrog stepsize. Default: 1e-3

  • beta_zero (float) – The tempering factor in the Riemannian Hamiltonian Monte Carlo Sampler. Default: 0.3

  • temperature (float) – The metric temperature \(T\). Default: 1.5

  • regularization (float) – The metric regularization factor \(\lambda\). Default: 0.01

  • uses_default_metric (bool) – Whether it uses a custom or default metric architecture. This is updated automatically.

classmethod from_dict(config_dict)

Creates a BaseConfig instance from a dictionary

Parameters

config_dict (dict) – The Python dictionary containing all the parameters

Returns

The created instance

Return type

BaseConfig

classmethod from_json_file(json_path)

Creates a BaseConfig instance from a JSON config file

Parameters

json_path (str) – The path to the json file containing all the parameters

Returns

The created instance

Return type

BaseConfig

save_json(dir_path, filename)

Saves a .json file from the dataclass

Parameters
  • dir_path (str) – path to the folder

  • filename (str) – the name of the file

to_dict()

Transforms the object into a Python dictionary

Returns

The dictionary containing all the parameters

Return type

(dict)

to_json_string()

Transforms object into a JSON string

Returns

The JSON str containing all the parameters

Return type

(str)
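The round trip between a config object, a dictionary and a JSON string can be sketched with a plain dataclass. `ToyRHVAEConfig` below is a hypothetical stand-in that copies some field names and defaults from the RHVAEConfig signature above; it is not pyraug's BaseConfig, whose methods may validate or handle fields differently.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ToyRHVAEConfig:
    # Field names and defaults copied from the RHVAEConfig signature.
    input_dim: Optional[int] = None
    latent_dim: int = 10
    n_lf: int = 3
    eps_lf: float = 0.001
    beta_zero: float = 0.3
    temperature: float = 1.5
    regularization: float = 0.01

    @classmethod
    def from_dict(cls, config_dict):
        """Build a config from a dictionary of parameters."""
        return cls(**config_dict)

    def to_dict(self):
        """Return all parameters as a plain dictionary."""
        return asdict(self)

    def to_json_string(self):
        """Serialize all parameters to a JSON string."""
        return json.dumps(self.to_dict())


config = ToyRHVAEConfig(input_dim=784, latent_dim=2)
restored = ToyRHVAEConfig.from_dict(json.loads(config.to_json_string()))
```

Here `restored` compares equal to `config`, which is the property that makes it possible to store a configuration next to a saved model and rebuild it later.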

class pyraug.models.rhvae.RHVAESamplerConfig(output_dir=None, batch_size=50, samples_per_save=500, no_cuda=False, mcmc_steps_nbr=100, n_lf=15, eps_lf=0.03, beta_zero=1.0)[source]

HMCSampler config class containing the main parameters of the sampler.

Parameters
  • num_samples (int) – The number of samples to generate. Default: 1

  • batch_size (int) – The number of samples per batch. Batching is used to speed up generation and avoid memory overflows. Default: 50

  • mcmc_steps_nbr (int) – The number of MCMC steps to use in the latent space HMC sampler. Default: 100

  • n_lf (int) – The number of leapfrog steps to use in the integrator of the HMC sampler. Default: 15

  • eps_lf (float) – The leapfrog stepsize in the integrator of the HMC sampler. Default: 3e-2

  • random_start (bool) – Initialization of the latent space sampler. If False, the sampler starts the Markov chain on the metric centroids. If True, a random start is applied. Default: False

class pyraug.models.rhvae.RHVAESampler(model, sampler_config=None)[source]

Hamiltonian Monte Carlo Sampler class. This is an implementation of the Hamiltonian/Hybrid Monte Carlo sampler (https://en.wikipedia.org/wiki/Hamiltonian_Monte_Carlo)

Parameters
  • model (RHVAE) – The VAE model to sample from

  • sampler_config (RHVAESamplerConfig) – An RHVAESamplerConfig instance containing the main parameters of the sampler. If None, a pre-defined configuration is used. Default: None
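For readers unfamiliar with HMC, the roles of n_lf and eps_lf can be seen in a minimal sketch of one HMC transition. This is plain Euclidean HMC on a one-dimensional standard normal target, chosen for brevity; it is not the sampler implemented by RHVAESampler, which targets a density defined through the learned metric. The function names are hypothetical, and the defaults mirror the RHVAESamplerConfig signature.

```python
import math
import random


def grad_u(z):
    """Gradient of the potential U(z) = z^2 / 2 (standard normal target)."""
    return z


def hamiltonian(z, p):
    """Total energy: potential U(z) plus kinetic energy p^2 / 2."""
    return 0.5 * z * z + 0.5 * p * p


def leapfrog(z, p, n_lf=15, eps_lf=0.03):
    """Integrate Hamilton's equations with n_lf leapfrog steps of size eps_lf."""
    p -= 0.5 * eps_lf * grad_u(z)          # initial half step on the momentum
    for step in range(n_lf):
        z += eps_lf * p                    # full step on the position
        if step < n_lf - 1:
            p -= eps_lf * grad_u(z)        # full step on the momentum
    p -= 0.5 * eps_lf * grad_u(z)          # final half step on the momentum
    return z, p


def hmc_step(z, rng, n_lf=15, eps_lf=0.03):
    """One HMC transition: resample momentum, integrate, accept or reject."""
    p = rng.gauss(0.0, 1.0)
    z_new, p_new = leapfrog(z, p, n_lf, eps_lf)
    log_alpha = hamiltonian(z, p) - hamiltonian(z_new, p_new)
    if rng.random() < math.exp(min(0.0, log_alpha)):
        return z_new
    return z
```

A larger eps_lf or n_lf moves the chain further per transition, at the price of a larger integration error and therefore more rejections; the number of MCMC steps controls how many such transitions are chained.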

sample(samples_number)[source]

HMC sampling with a RHVAE.

The data is saved in the output_dir (folder passed in the BaseSamplerConfig instance) in a folder named generation_YYYY-MM-DD_hh-mm-ss. If output_dir is None, a folder named dummy_output_dir is created in this folder.

Parameters

samples_number (int) – The number of samples to generate
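The naming scheme of the output folder can be reproduced with the standard library, for instance to locate the most recent generation afterwards. The sketch below assumes that hh is a 24-hour value and that the generation folder is placed inside dummy_output_dir when output_dir is None; `generation_dir` is a hypothetical helper, not a pyraug function.

```python
import os
from datetime import datetime


def generation_dir(output_dir=None, now=None):
    """Build a path of the form <output_dir>/generation_YYYY-MM-DD_hh-mm-ss."""
    base = output_dir if output_dir is not None else "dummy_output_dir"
    stamp = (now or datetime.now()).strftime("%Y-%m-%d_%H-%M-%S")
    return os.path.join(base, f"generation_{stamp}")
```

Because the timestamp fields go from most to least significant and are zero-padded, sorting the folder names alphabetically also sorts them chronologically.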