RHVAE

This module implements the Riemannian Hamiltonian VAE proposed in (https://arxiv.org/pdf/2010.11518.pdf).

This module contains:

the RHVAE class, which implements the model.
the RHVAESampler class, which samples from the latent space of such a model, as proposed in (https://arxiv.org/pdf/2105.00026.pdf).

class pyraug.models.RHVAE(model_config, encoder=None, decoder=None, metric=None)

This is an implementation of the Riemannian Hamiltonian VAE model proposed in (https://arxiv.org/pdf/2010.11518.pdf). The model learns the Riemannian latent structure of a given data set through a parametrized Riemannian metric of the following form: \(\mathbf{G}^{-1}(z) = \sum \limits _{i=1}^N L_{\psi_i} L_{\psi_i}^{\top} \exp \Big(-\frac{\lVert z - c_i \rVert_2^2}{T^2} \Big) + \lambda I_d\)

It can also be used to generate new data. It is particularly well suited to high-dimensional data with a small number of samples, and has proved relevant for data augmentation, as shown in (https://arxiv.org/pdf/2105.00026.pdf).
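The metric formula above can be evaluated directly once the centroids \(c_i\) and the matrices \(L_{\psi_i}\) are known. The following is a minimal pure-Python sketch, not pyraug's implementation: the function name `inverse_metric` and its argument layout are assumptions made for illustration, and the default values of `temperature` and `regularization` are taken from the RHVAEConfig signature.

```python
import math


def inverse_metric(z, centroids, L_matrices, temperature=1.5, regularization=0.01):
    """Evaluate G^{-1}(z) = sum_i L_i L_i^T exp(-||z - c_i||^2 / T^2) + lambda * I_d.

    z          : list of d floats (a latent point)
    centroids  : list of N points c_i, each a list of d floats
    L_matrices : list of N (d x d) matrices L_i given as nested lists
    """
    d = len(z)
    # Start from the regularization term lambda * I_d.
    g_inv = [[regularization if r == c else 0.0 for c in range(d)] for r in range(d)]
    for c_i, L in zip(centroids, L_matrices):
        sq_dist = sum((z_k - c_k) ** 2 for z_k, c_k in zip(z, c_i))
        weight = math.exp(-sq_dist / temperature ** 2)
        for r in range(d):
            for c in range(d):
                # (L L^T)[r][c] = sum_k L[r][k] * L[c][k]
                g_inv[r][c] += weight * sum(L[r][k] * L[c][k] for k in range(d))
    return g_inv


# Two hypothetical centroids in a 2-dimensional latent space.
centroids = [[0.0, 0.0], [2.0, 0.0]]
L_matrices = [[[1.0, 0.0], [0.5, 1.0]], [[2.0, 0.0], [0.0, 1.0]]]
g_inv_at_centroid = inverse_metric([0.0, 0.0], centroids, L_matrices)
g_inv_far_away = inverse_metric([100.0, 100.0], centroids, L_matrices)
```

Far from every centroid the exponential weights vanish and \(\mathbf{G}^{-1}(z)\) reduces to \(\lambda I_d\), which is why the regularization term keeps the metric well defined everywhere.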

Parameters: model_config (RHVAEConfig) – A model configuration setting the main parameters of the model

Note

For high-dimensional data we advise you to provide your own network architectures. With the provided MLP you may end up with a MemoryError.

forward(inputs)

The input data is first encoded. The reparametrization trick is used to produce a sample \(z_0\) from the approximate posterior \(q_{\phi}(z|x)\). The Riemannian Hamiltonian equations are then solved using the generalized leapfrog integrator. In the meantime, the input data \(x\) is fed to the metric network, which outputs the matrices \(L_{\psi}\). The metric is computed from these matrices and used by the integrator.

Parameters: inputs (Dict[str, torch.Tensor]) – The training data with labels
Returns: An instance of ModelOutput containing all the relevant parameters
Return type: output (ModelOutput)
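The reparametrization step mentioned above can be sketched in a few lines. This is an illustration only: it assumes the encoder outputs a mean and a log-variance per latent dimension (a common convention, not confirmed by this page), and the function name `reparametrize` is hypothetical.

```python
import math
import random


def reparametrize(mu, log_var, rng=random):
    """Draw z_0 = mu + sigma * eps with eps ~ N(0, I) and sigma = exp(0.5 * log_var)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


rng = random.Random(0)  # seeded generator so the draw is reproducible
z0 = reparametrize([0.0, 1.0], [0.0, -2.0], rng=rng)
```

Because the randomness is isolated in `eps`, the sample is a deterministic function of `mu` and `log_var`, which is what lets gradients flow back to the encoder in a tensor-based implementation.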

likelihood(x, sample_size=10)

Estimate the likelihood of the model \(\log(p(x))\) using importance sampling on \(q_{\phi}(z|x)\).
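To make the importance-sampling estimate concrete, here is a sketch on a toy one-dimensional model rather than on an RHVAE: the prior is \(p(z) = \mathcal{N}(0, 1)\) and the decoder is \(p(x|z) = \mathcal{N}(z, 1)\), so that \(p(x) = \mathcal{N}(0, 2)\) is known in closed form. The toy model and the helper names `log_normal` and `log_likelihood_is` are assumptions chosen for illustration; only the `sample_size` default comes from the signature above.

```python
import math
import random


def log_normal(v, mean, std):
    """Log-density of a univariate Gaussian N(mean, std^2) evaluated at v."""
    return -0.5 * math.log(2 * math.pi) - math.log(std) - 0.5 * ((v - mean) / std) ** 2


def log_likelihood_is(x, q_mean, q_std, sample_size=10, rng=random):
    """Estimate log p(x) = log E_q[p(x, z) / q(z|x)] from sample_size draws of q."""
    log_weights = []
    for _ in range(sample_size):
        z = rng.gauss(q_mean, q_std)
        log_joint = log_normal(z, 0.0, 1.0) + log_normal(x, z, 1.0)  # log p(z) + log p(x|z)
        log_weights.append(log_joint - log_normal(z, q_mean, q_std))
    # log-mean-exp, shifted by the maximum for numerical stability
    m = max(log_weights)
    return m + math.log(sum(math.exp(w - m) for w in log_weights) / sample_size)


x = 1.0
exact = log_normal(x, 0.0, math.sqrt(2.0))  # closed-form log p(x) for the toy model
# Proposal equal to the true posterior N(x / 2, 1 / 2): every weight equals p(x).
estimate_exact_q = log_likelihood_is(x, 0.5, math.sqrt(0.5), rng=random.Random(0))
# Mismatched proposal (the prior): the estimate is noisy and needs more samples.
estimate_prior_q = log_likelihood_is(x, 0.0, 1.0, sample_size=20000, rng=random.Random(0))
```

The closer \(q_{\phi}(z|x)\) is to the true posterior, the lower the variance of the estimate, so a small `sample_size` is only reliable when the approximate posterior is a good fit.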

classmethod load_from_folder(dir_path)

Class method to be used to load the model from a specific folder

Parameters: dir_path (str) – The path where the model was saved.

Note

This function requires the folder to contain:

a model_config.json and a model.pt if no custom architectures were provided
a model_config.json, a model.pt and an encoder.pkl (resp. decoder.pkl and/or metric.pkl) if a custom encoder (resp. decoder and/or metric) was provided
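The file requirements in the note above can be checked before attempting to load a model. The helper below is hypothetical and not part of pyraug; it only encodes the file names listed in the note.

```python
from pathlib import Path


def missing_files(dir_path, custom_encoder=False, custom_decoder=False, custom_metric=False):
    """Return the required file names (per the note above) that are absent from dir_path."""
    required = ["model_config.json", "model.pt"]
    if custom_encoder:
        required.append("encoder.pkl")
    if custom_decoder:
        required.append("decoder.pkl")
    if custom_metric:
        required.append("metric.pkl")
    folder = Path(dir_path)
    return [name for name in required if not (folder / name).is_file()]
```

An empty list means that every expected file is present; for example, `missing_files("my_model", custom_metric=True)` also looks for metric.pkl in the hypothetical folder `my_model`.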

save(dir_path)

Method to save the model at a specific location

Parameters: dir_path (str) – The path where the model should be saved. If the path does not exist, a folder will be created at the provided location.

set_metric(metric)

This method is called to set the metric network that outputs the \(L_{\psi_i}\) of the metric matrices

Parameters: metric (BaseMetric) – The metric module to set on the model.

update()

Method that allows model update during the training.

If needed, this method must be implemented in a child class.

By default, it does nothing.

update_metric()

As soon as the model has seen all the data points (i.e. at the end of one pass over the data), the final metric function is updated using \(\mu(x_i)\) as centroids.

class pyraug.models.rhvae.RHVAEConfig(input_dim=None, latent_dim=10, uses_default_encoder=True, uses_default_decoder=True, n_lf=3, eps_lf=0.001, beta_zero=0.3, temperature=1.5, regularization=0.01, uses_default_metric=True)

Riemannian Hamiltonian Auto Encoder config class

Parameters

latent_dim (int) – The latent dimension used for the latent space. Default: 10
n_lf (int) – The number of leapfrog steps to use in the integrator. Default: 3
eps_lf (float) – The leapfrog stepsize. Default: 1e-3
beta_zero (float) – The tempering factor in the Riemannian Hamiltonian Monte Carlo Sampler. Default: 0.3
temperature (float) – The metric temperature \(T\). Default: 1.5
regularization (float) – The metric regularization factor \(\lambda\). Default: 0.01
uses_default_metric (bool) – Whether it uses a custom or default metric architecture. This is updated automatically.

classmethod from_dict(config_dict)

Creates a BaseConfig instance from a dictionary

Parameters: config_dict (dict) – The Python dictionary containing all the parameters
Returns: The created instance
Return type: BaseConfig

classmethod from_json_file(json_path)

Creates a BaseConfig instance from a JSON config file

Parameters: json_path (str) – The path to the json file containing all the parameters
Returns: The created instance
Return type: BaseConfig

save_json(dir_path, filename)

Saves a .json file from the dataclass

Parameters

dir_path (str) – path to the folder
filename (str) – the name of the file

to_dict()

Transforms the object into a Python dictionary

Returns: The dictionary containing all the parameters
Return type: (dict)

to_json_string()

Transforms the object into a JSON string

Returns: The JSON str containing all the parameters
Return type: (str)
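The serialization methods listed above follow a familiar dataclass pattern. The sketch below uses a stand-in class, `ToyConfig`, to show how such a round trip can work; it is not pyraug's BaseConfig, it only borrows three field names and defaults from the RHVAEConfig signature, and the choice to append `.json` to `filename` is an assumption that this page does not confirm.

```python
import json
import os
from dataclasses import asdict, dataclass


@dataclass
class ToyConfig:
    """Stand-in config with the same method names as documented above."""

    latent_dim: int = 10
    n_lf: int = 3
    eps_lf: float = 0.001

    @classmethod
    def from_dict(cls, config_dict):
        return cls(**config_dict)

    @classmethod
    def from_json_file(cls, json_path):
        with open(json_path) as fp:
            return cls.from_dict(json.load(fp))

    def to_dict(self):
        return asdict(self)

    def to_json_string(self):
        return json.dumps(self.to_dict())

    def save_json(self, dir_path, filename):
        # Assumption: the extension is appended here; the real method may differ.
        with open(os.path.join(dir_path, f"{filename}.json"), "w") as fp:
            fp.write(self.to_json_string())


config = ToyConfig(latent_dim=2)
restored = ToyConfig.from_dict(json.loads(config.to_json_string()))
```

Keeping the configuration as plain JSON makes a saved model folder easy to inspect and edit by hand.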

class pyraug.models.rhvae.RHVAESamplerConfig(output_dir=None, batch_size=50, samples_per_save=500, no_cuda=False, mcmc_steps_nbr=100, n_lf=15, eps_lf=0.03, beta_zero=1.0)

RHVAESampler config class containing the main parameters of the sampler.

Parameters

num_samples (int) – The number of samples to generate. Default: 1
batch_size (int) – The number of samples per batch. Batching is used to speed up generation and avoid memory overflows. Default: 50
mcmc_steps_nbr (int) – The number of MCMC steps to use in the latent space HMC sampler. Default: 100
n_lf (int) – The number of leapfrog steps to use in the integrator of the HMC sampler. Default: 15
eps_lf (float) – The leapfrog stepsize in the integrator of the HMC sampler. Default: 3e-2
random_start (bool) – Initialization of the latent space sampler. If False, the sampler starts the Markov chain on the metric centroids. If True, a random start is applied. Default: False
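The role of `batch_size` can be illustrated with a short sketch. How the sampler actually splits the work is not stated on this page; the helper `batch_sizes` below is hypothetical and simply shows one natural way to cut a requested number of samples into batches.

```python
def batch_sizes(samples_number, batch_size=50):
    """Split samples_number into full batches plus one smaller final batch if needed."""
    full, rest = divmod(samples_number, batch_size)
    return [batch_size] * full + ([rest] if rest else [])


sizes = batch_sizes(120)  # two full batches of 50 and a final batch of 20
```

Generating in batches bounds the memory needed at any one time, at the cost of running the Markov chains once per batch.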

class pyraug.models.rhvae.RHVAESampler(model, sampler_config=None)

Hamiltonian Monte Carlo Sampler class. This is an implementation of the Hamiltonian/Hybrid Monte Carlo sampler (https://en.wikipedia.org/wiki/Hamiltonian_Monte_Carlo)

Parameters

model (RHVAE) – The VAE model to sample from
sampler_config (RHVAESamplerConfig) – An RHVAESamplerConfig instance containing the main parameters of the sampler. If None, a pre-defined configuration is used. Default: None
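To show what `n_lf` and `eps_lf` control, here is a sketch of the standard (Euclidean) leapfrog integrator used in plain HMC. It is deliberately simpler than what an RHVAE sampler needs: it ignores the Riemannian metric and targets a one-dimensional standard normal, and the names `leapfrog` and `hamiltonian` are assumptions made for illustration.

```python
def leapfrog(z, rho, grad_potential, n_lf=15, eps_lf=0.03):
    """Integrate Hamilton's equations for H(z, rho) = U(z) + rho^2 / 2 with n_lf steps."""
    for _ in range(n_lf):
        rho = rho - 0.5 * eps_lf * grad_potential(z)  # half step on the momentum
        z = z + eps_lf * rho                          # full step on the position
        rho = rho - 0.5 * eps_lf * grad_potential(z)  # half step on the momentum
    return z, rho


def hamiltonian(z, rho):
    """Total energy for the toy target U(z) = z^2 / 2 (standard normal)."""
    return 0.5 * z ** 2 + 0.5 * rho ** 2


z_start, rho_start = 1.0, 0.5
z_end, rho_end = leapfrog(z_start, rho_start, grad_potential=lambda z: z)
energy_error = abs(hamiltonian(z_end, rho_end) - hamiltonian(z_start, rho_start))
```

A smaller `eps_lf` reduces the energy error, and therefore the rejection rate of the Metropolis step in HMC, while a larger `n_lf` lets each proposal travel further at a higher cost per MCMC step.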

sample(samples_number)

HMC sampling with a RHVAE.

The data is saved in the output_dir (folder passed in the BaseSamplerConfig instance) in a folder named generation_YYYY-MM-DD_hh-mm-ss. If output_dir is None, a folder named dummy_output_dir is created in this folder.

Parameters: samples_number (int) – The number of samples to generate
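The folder naming pattern generation_YYYY-MM-DD_hh-mm-ss described above corresponds to a simple timestamp format. The helper `generation_dir` below is hypothetical and only shows how such a path can be built with the standard library.

```python
import os
from datetime import datetime


def generation_dir(output_dir, now=None):
    """Build a path of the form <output_dir>/generation_YYYY-MM-DD_hh-mm-ss."""
    now = now or datetime.now()
    return os.path.join(output_dir, now.strftime("generation_%Y-%m-%d_%H-%M-%S"))


example_path = generation_dir("my_output_dir", datetime(2021, 5, 3, 14, 7, 9))
```

Timestamped folders keep successive generation runs from overwriting each other, provided that two runs do not start within the same second.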