RHVAE
This module is the implementation of the Riemannian Hamiltonian VAE proposed in (https://arxiv.org/pdf/2010.11518.pdf).
This module contains:
- the RHVAE class, which is the implementation of the model.
- the RHVAESampler class, which allows sampling from the latent space of such a model, as proposed in (https://arxiv.org/pdf/2105.00026.pdf).
- class pyraug.models.RHVAE(model_config, encoder=None, decoder=None, metric=None)
This is an implementation of the Riemannian Hamiltonian VAE model proposed in (https://arxiv.org/pdf/2010.11518.pdf). This model provides a way to learn the Riemannian latent structure of a given data set through a parametrized Riemannian metric of the following form: \(\mathbf{G}^{-1}(z) = \sum \limits _{i=1}^N L_{\psi_i} L_{\psi_i}^{\top} \exp \Big(-\frac{\lVert z - c_i \rVert_2^2}{T^2} \Big) + \lambda I_d\)
and to generate new data. It is particularly well suited to high-dimensional data with a small number of samples, and it has proved relevant for data augmentation, as shown in (https://arxiv.org/pdf/2105.00026.pdf).
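As a rough illustration of the metric formula above, the following sketch evaluates \(\mathbf{G}^{-1}(z)\) with plain Python lists. The function name `inverse_metric` and the toy values chosen for the centroids \(c_i\) and the matrices \(L_{\psi_i}\) are assumptions made for this example only. In the model the \(L_{\psi_i}\) come from the metric network, and this is not the pyraug implementation.

```python
import math

def inverse_metric(z, centroids, L_matrices, temperature=1.5, regularization=0.01):
    """Evaluate G^{-1}(z) = sum_i L_i L_i^T exp(-||z - c_i||^2 / T^2) + lambda * I."""
    d = len(z)
    # Start from the regularization term lambda * I_d.
    G_inv = [[regularization if r == c else 0.0 for c in range(d)] for r in range(d)]
    for c_i, L in zip(centroids, L_matrices):
        sq_dist = sum((zk - ck) ** 2 for zk, ck in zip(z, c_i))
        weight = math.exp(-sq_dist / temperature ** 2)
        # Add weight * (L L^T), computed entry by entry.
        for r in range(d):
            for c in range(d):
                LLt = sum(L[r][k] * L[c][k] for k in range(d))
                G_inv[r][c] += weight * LLt
    return G_inv

# Toy values (hypothetical): two centroids in a 2-D latent space.
centroids = [[0.0, 0.0], [1.0, 1.0]]
L_matrices = [[[1.0, 0.0], [0.5, 1.0]], [[2.0, 0.0], [0.0, 2.0]]]
G_inv = inverse_metric([0.0, 0.0], centroids, L_matrices)
```

Each centroid contributes most strongly near its own location, and the \(\lambda I_d\) term keeps the matrix well conditioned far away from all centroids.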
- Parameters
model_config (RHVAEConfig) – A model configuration setting the main parameters of the model
Note
For high-dimensional data, we advise you to provide your own network architectures. With the provided MLP you may end up with a MemoryError.
- forward(inputs)
The input data is first encoded. The reparametrization trick is used to produce a sample \(z_0\) from the approximate posterior \(q_{\phi}(z|x)\). Then the Riemannian Hamiltonian equations are solved using the generalized leapfrog integrator. In parallel, the input data \(x\) is fed to the metric network, which outputs the matrices \(L_{\psi}\). The metric is then computed and used by the integrator.
- Parameters
inputs (Dict[str, torch.Tensor]) – The training data with labels
- Returns
An instance of ModelOutput containing all the relevant parameters
- Return type
output (ModelOutput)
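The first step of the forward pass, drawing \(z_0\) from the approximate posterior with the reparametrization trick, can be sketched as follows. The sketch assumes that the encoder returns a mean and a log-variance per latent dimension, a common convention that the description above does not spell out. `reparametrize` is a hypothetical helper, and the leapfrog integration and the metric computation are deliberately left out.

```python
import math
import random

def reparametrize(mu, log_var, rng):
    """Draw z_0 = mu + sigma * eps with eps ~ N(0, I) and sigma = exp(log_var / 2)."""
    eps = [rng.gauss(0.0, 1.0) for _ in mu]
    z0 = [m + math.exp(0.5 * lv) * e for m, lv, e in zip(mu, log_var, eps)]
    return z0, eps

rng = random.Random(0)  # fixed seed so the sketch is reproducible
mu = [0.5, -1.0]        # toy encoder mean (assumed values)
log_var = [0.0, -2.0]   # toy encoder log-variance (assumed values)
z0, eps = reparametrize(mu, log_var, rng)
```

Because the randomness is isolated in `eps`, the sample is a deterministic function of the encoder outputs, which is what lets gradients flow back to the encoder during training.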
- likelihood(x, sample_size=10)
Estimates the log-likelihood \(\log(p(x))\) of the model using importance sampling, with \(q_{\phi}(z|x)\) as the proposal distribution.
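To make the importance-sampling estimate concrete, here is a minimal sketch on a toy one-dimensional model rather than on an RHVAE. The toy model (\(z \sim \mathcal{N}(0, 1)\), \(x|z \sim \mathcal{N}(z, 1)\)), the proposal and the function names are all assumptions chosen so that the result can be checked against a closed form. The method above may differ in its details.

```python
import math
import random

def log_normal(value, mean, var):
    """Log-density of a 1-D Gaussian N(mean, var) evaluated at value."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (value - mean) ** 2 / var)

def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))

def estimate_log_likelihood(x, sample_size=10, seed=0):
    """Importance-sampling estimate of log p(x) for the toy model.

    Toy model (assumed): z ~ N(0, 1), x | z ~ N(z, 1), proposal q(z|x) = N(x/2, 1/2).
    """
    rng = random.Random(seed)
    log_weights = []
    for _ in range(sample_size):
        z = rng.gauss(x / 2.0, math.sqrt(0.5))
        # log w = log p(z) + log p(x|z) - log q(z|x)
        log_w = log_normal(z, 0.0, 1.0) + log_normal(x, z, 1.0) - log_normal(z, x / 2.0, 0.5)
        log_weights.append(log_w)
    return log_mean_exp(log_weights)

estimate = estimate_log_likelihood(1.3)
exact = log_normal(1.3, 0.0, 2.0)  # the marginal of this toy model is N(0, 2)
```

In this toy case the proposal is the exact posterior, so every weight equals \(p(x)\) and the estimate matches the closed form up to rounding. With an approximate posterior the estimate is noisy and generally improves as `sample_size` grows.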
- classmethod load_from_folder(dir_path)
Class method used to load the model from a specific folder
- Parameters
dir_path (str) – The path where the model was saved.
Note
This function requires the folder to contain:
- a model_config.json and a model.pt if no custom architectures were provided
- a model_config.json, a model.pt and an encoder.pkl (resp. decoder.pkl or/and metric.pkl) if a custom encoder (resp. decoder or/and metric) was provided
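The file requirements listed in the note can be checked before loading with a small helper such as the one below. `missing_files` is a hypothetical function written for this example, not part of pyraug; only the file names are taken from the note above.

```python
import os
import tempfile

def missing_files(dir_path, custom_encoder=False, custom_decoder=False, custom_metric=False):
    """Return the expected file names that are absent from dir_path."""
    expected = ["model_config.json", "model.pt"]
    if custom_encoder:
        expected.append("encoder.pkl")
    if custom_decoder:
        expected.append("decoder.pkl")
    if custom_metric:
        expected.append("metric.pkl")
    return [name for name in expected if not os.path.isfile(os.path.join(dir_path, name))]

# Demonstration on a temporary folder holding empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("model_config.json", "model.pt"):
        open(os.path.join(tmp, name), "w").close()
    default_missing = missing_files(tmp)
    custom_missing = missing_files(tmp, custom_encoder=True)
```

An empty list means the folder has everything the note asks for in the chosen configuration.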
- save(dir_path)
Method to save the model at a specific location
- Parameters
dir_path (str) – The path where the model should be saved. If the path does not exist, a folder will be created at the provided location.
- set_metric(metric)
This method is called to set the metric network that outputs the matrices \(L_{\psi_i}\) of the metric
- Parameters
metric (BaseMetric) – The metric module to set on the model.
- class pyraug.models.rhvae.RHVAEConfig(input_dim=None, latent_dim=10, uses_default_encoder=True, uses_default_decoder=True, n_lf=3, eps_lf=0.001, beta_zero=0.3, temperature=1.5, regularization=0.01, uses_default_metric=True)
Riemannian Hamiltonian Auto Encoder config class
- Parameters
latent_dim (int) – The latent dimension used for the latent space. Default: 10
n_lf (int) – The number of leapfrog steps to use in the integrator. Default: 3
eps_lf (float) – The leapfrog stepsize. Default: 1e-3
beta_zero (float) – The tempering factor in the Riemannian Hamiltonian Monte Carlo Sampler. Default: 0.3
temperature (float) – The metric temperature \(T\). Default: 1.5
regularization (float) – The metric regularization factor \(\lambda\). Default: 0.01
uses_default_metric (bool) – Whether it uses a custom or default metric architecture. This is updated automatically.
- classmethod from_dict(config_dict)
Creates a BaseConfig instance from a dictionary
- Parameters
config_dict (dict) – The Python dictionary containing all the parameters
- Returns
The created instance
- Return type
BaseConfig
- classmethod from_json_file(json_path)
Creates a BaseConfig instance from a JSON config file
- Parameters
json_path (str) – The path to the JSON file containing all the parameters
- Returns
The created instance
- Return type
BaseConfig
- save_json(dir_path, filename)
Saves a .json file from the dataclass
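The three configuration helpers above follow a common dataclass round-trip pattern, sketched here with a stand-in class. `ToyConfig` is hypothetical and only mirrors a few fields and defaults from the RHVAEConfig signature. In particular, whether `filename` is expected with or without the `.json` extension is an assumption of this sketch and may not match pyraug's behaviour.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass

@dataclass
class ToyConfig:
    """Hypothetical stand-in for a config dataclass; not the pyraug class."""
    latent_dim: int = 10
    n_lf: int = 3
    eps_lf: float = 0.001

    @classmethod
    def from_dict(cls, config_dict):
        # Unknown keys raise a TypeError; missing keys fall back to the defaults.
        return cls(**config_dict)

    @classmethod
    def from_json_file(cls, json_path):
        with open(json_path) as f:
            return cls.from_dict(json.load(f))

    def save_json(self, dir_path, filename):
        # Assumption: filename is given without extension and ".json" is appended.
        path = os.path.join(dir_path, filename + ".json")
        with open(path, "w") as f:
            json.dump(asdict(self), f)
        return path

with tempfile.TemporaryDirectory() as tmp:
    saved_path = ToyConfig.from_dict({"latent_dim": 2, "n_lf": 5}).save_json(tmp, "model_config")
    reloaded = ToyConfig.from_json_file(saved_path)
```

Saving and reloading through JSON keeps a configuration readable and editable by hand, at the cost of restricting fields to JSON-serializable values.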
- class pyraug.models.rhvae.RHVAESamplerConfig(output_dir=None, batch_size=50, samples_per_save=500, no_cuda=False, mcmc_steps_nbr=100, n_lf=15, eps_lf=0.03, beta_zero=1.0)
HMCSampler config class containing the main parameters of the sampler.
- Parameters
num_samples (int) – The number of samples to generate. Default: 1
batch_size (int) – The number of samples per batch. Batching is used to speed up generation and avoid memory overflows. Default: 50
mcmc_steps_nbr (int) – The number of MCMC steps to use in the latent space HMC sampler. Default: 100
n_lf (int) – The number of leapfrog steps to use in the integrator of the HMC sampler. Default: 15
eps_lf (float) – The leapfrog stepsize in the integrator of the HMC sampler. Default: 3e-2
random_start (bool) – Initialization of the latent space sampler. If False, the sampler starts the Markov chain on the metric centroids. If True, a random start is applied. Default: False
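As an illustration of how batch_size relates to the requested number of samples, the sketch below splits a total into consecutive batches. `batch_sizes` is a hypothetical helper written for this example. How pyraug actually schedules its batches is not described here.

```python
def batch_sizes(num_samples, batch_size=50):
    """Split num_samples into consecutive batches of at most batch_size."""
    if num_samples < 0 or batch_size <= 0:
        raise ValueError("num_samples must be >= 0 and batch_size must be > 0")
    full, rest = divmod(num_samples, batch_size)
    # All full batches first, then one smaller batch for the remainder, if any.
    return [batch_size] * full + ([rest] if rest else [])

sizes = batch_sizes(120, batch_size=50)
```

Generating in batches bounds the peak memory needed at any one time, which is the motivation given for the batch_size parameter.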
- class pyraug.models.rhvae.RHVAESampler(model, sampler_config=None)
Hamiltonian Monte Carlo Sampler class. This is an implementation of the Hamiltonian/Hybrid Monte Carlo sampler (https://en.wikipedia.org/wiki/Hamiltonian_Monte_Carlo)
- Parameters
model (RHVAE) – The VAE model to sample from
sampler_config (RHVAESamplerConfig) – An RHVAESamplerConfig instance containing the main parameters of the sampler. If None, a pre-defined configuration is used. Default: None
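For readers unfamiliar with Hamiltonian Monte Carlo, here is a minimal sketch of the basic algorithm on a one-dimensional standard normal target. It is a plain Euclidean HMC with an identity mass matrix, not the Riemannian sampler of this class: the toy target, the function names and the absence of any metric or tempering are simplifications assumed for illustration. The default values mirror those of the sampler configuration signature.

```python
import math
import random

def hmc_step(z, log_density, grad_log_density, n_lf, eps_lf, rng):
    """One HMC transition: sample a momentum, run leapfrog, accept or reject."""
    p = rng.gauss(0.0, 1.0)
    h_start = -log_density(z) + 0.5 * p * p
    z_new, p_new = z, p
    # Leapfrog: half step on the momentum, alternating full steps, final half step.
    p_new += 0.5 * eps_lf * grad_log_density(z_new)
    for step in range(n_lf):
        z_new += eps_lf * p_new
        if step < n_lf - 1:
            p_new += eps_lf * grad_log_density(z_new)
    p_new += 0.5 * eps_lf * grad_log_density(z_new)
    h_end = -log_density(z_new) + 0.5 * p_new * p_new
    # Metropolis correction on the change in the Hamiltonian.
    if rng.random() < math.exp(min(0.0, h_start - h_end)):
        return z_new
    return z

def run_chain(z0, mcmc_steps_nbr=100, n_lf=15, eps_lf=0.03, seed=0):
    """Run a short chain on a toy standard normal target (assumed for the example)."""
    rng = random.Random(seed)
    log_density = lambda z: -0.5 * z * z
    grad_log_density = lambda z: -z
    z = z0
    chain = []
    for _ in range(mcmc_steps_nbr):
        z = hmc_step(z, log_density, grad_log_density, n_lf, eps_lf, rng)
        chain.append(z)
    return chain

chain = run_chain(z0=3.0)
```

The step size and the number of leapfrog steps together set how far each proposal travels: small values give high acceptance but slow exploration.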
- sample(samples_number)
HMC sampling with a RHVAE.
The data is saved in the output_dir (folder passed in the BaseSamplerConfig instance) in a folder named generation_YYYY-MM-DD_hh-mm-ss. If output_dir is None, a folder named dummy_output_dir is created in this folder.
- Parameters
samples_number (int) – The number of samples to generate
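The folder naming scheme described for this method can be reproduced with a timestamp format string, as in the sketch below. `generation_dir` is a hypothetical helper that only builds the path and creates nothing on disk. It also assumes that the timestamped folder is nested inside dummy_output_dir when output_dir is None, which the description above does not state explicitly.

```python
import os
from datetime import datetime

def generation_dir(output_dir, now):
    """Build the path of a generation_YYYY-MM-DD_hh-mm-ss folder inside output_dir."""
    if output_dir is None:
        # Assumption: the fallback folder plays the role of output_dir.
        output_dir = "dummy_output_dir"
    return os.path.join(output_dir, now.strftime("generation_%Y-%m-%d_%H-%M-%S"))

path = generation_dir("my_outputs", datetime(2021, 5, 3, 14, 7, 9))
```

Timestamped folder names keep successive generation runs from overwriting one another.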