QuickStart

As a quick start, we use the linear model as an example to demonstrate how to run Bayesian MCMC simulations with AlTar. The workflow has four steps:

  1. Prepare a configuration file, e.g., linear.pfg, to specify various parameters and settings;

  2. Prepare input data files required for the model, e.g., for the linear model, the observed data, the data covariance and the Green’s function;

  3. Run a dedicated AlTar application, e.g., linear for the linear model;

  4. Collect and analyze the simulation results.

The linear model example demonstrated here comes with the AlTar source package, under the directory models/linear/examples. It is also available as a Jupyter notebook in Tutorials:linear.

Prepare the configuration file

A configuration file is used to pass various settings to an AlTar application. Here is an example for the linear model:

linear.pfg
;
; michael a.g. aïvázis
; orthologue
; (c) 1998-2020 all rights reserved
;

; the application
linear:
    ; the model
    model = linear
    ; the linear model configurations
    model:
        ; the directory for input files
        case = patch-9
        ; the number of parameters
        parameters = 18

        ; the number of observations
        observations = 108
        ; the data observations file
        data = data.txt
        ; the data covariance file
        cd = cd.txt

        ; prior distribution for parameters
        ; prior is used to calculate the prior probability
        ;    and check ranges during the simulation
        prior = gaussian
        ; prior configurations
        prior:
            parameters = {linear.model.parameters}
            center = 0.0
            sigma = 0.5
        ; prep is used to initialize the samples in the beginning of the simulation
        ; it can be different from prior
        prep = uniform
        prep:
            parameters = {linear.model.parameters}
            support = (-0.5, 0.5)

    ; controller/annealer, use the default CATMIP annealer
    controller:
        ; archiver, use the default HDF5 archiver
        archiver:
            output_dir = results ; results output directory
            output_freq = 1 ; output frequency in annealing beta steps

    ; run configuration
    job.tasks = 1 ; number of tasks per host
    job.gpus = 0  ; number of gpus per task
    job.chains = 2**10 ; number of chains per task

; end of file

The .pfg (pyre config) files use a human-readable data-serialization format similar to YAML. The hierarchy is expressed either by whitespace indentation or by full/partial dotted paths, such as job.tasks, and comments start with a semicolon. See Pyre Config Format (.pfg) for more detailed instructions.
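To make the equivalence between indentation and dotted paths concrete, here is a minimal Python sketch that flattens a small subset of the format into full paths. flatten_pfg is a hypothetical helper written for illustration; it is not pyre's parser and only understands "name:" section headers, "key = value" assignments and ";" comments.

```python
def flatten_pfg(text):
    """Map a simplified .pfg fragment to {full.dotted.path: value}.

    Hypothetical helper: handles only 'name:' sections, 'key = value'
    lines and ';' comments; the real pyre parser supports much more.
    """
    settings = {}
    stack = []  # (indentation, section name) for the enclosing sections
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].rstrip()  # drop trailing comments
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip())
        # leave every section that does not enclose this line
        while stack and stack[-1][0] >= indent:
            stack.pop()
        body = line.strip()
        prefix = [name for _, name in stack]
        if "=" in body:
            key, value = (part.strip() for part in body.split("=", 1))
            settings[".".join(prefix + [key])] = value
        elif body.endswith(":"):
            stack.append((indent, body[:-1].strip()))
    return settings
```

For instance, job.tasks = 1 written under linear: yields the key linear.job.tasks, exactly as the single top-level line linear.job.tasks = 1 would.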

The name of the AlTar application (instance), linear, is set as the root. Configurable components of an AlTar application include:

  • model, for model-specific configurations, such as the prior distributions of the model parameters, parameters in the forward model, and the data observations;

  • job, which configures the size of the simulation, and how the job will be deployed, e.g., single or multiple threads, single machine or multi-node cluster, CPU or GPU;

  • controller, for configurations to control the Bayesian MCMC procedure.
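The prior and prep settings in the model block of linear.pfg play different roles: prep only generates the initial samples, while prior enters the probability of every sample during the run. As a rough illustration, here is a NumPy sketch of those two roles using the example's values (18 parameters, 2**10 chains, a Gaussian prior with center 0.0 and sigma 0.5, a uniform prep on (-0.5, 0.5)). The function names are hypothetical and this is not AlTar's own implementation.

```python
import numpy as np

def prep_samples(chains, parameters, support=(-0.5, 0.5), seed=None):
    """Hypothetical 'prep' step: draw initial samples uniformly from support."""
    rng = np.random.default_rng(seed)
    low, high = support
    # one row per chain, one column per model parameter
    return rng.uniform(low, high, size=(chains, parameters))

def gaussian_log_prior(theta, center=0.0, sigma=0.5):
    """Hypothetical 'prior' step: log density of independent N(center, sigma**2),
    summed over the parameters of each sample (last axis)."""
    z = (theta - center) / sigma
    log_norm = -0.5 * np.log(2.0 * np.pi) - np.log(sigma)
    return np.sum(log_norm - 0.5 * z**2, axis=-1)

samples = prep_samples(2**10, 18, seed=0)
log_prior = gaussian_log_prior(samples)  # one value per chain
```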

Note

If a component is not specified in the configuration, its default value/implementation will be used instead.

Model configurations vary with each model's forward problem: model-specific instructions are provided in the respective sections of this Guide. Instructions for the main framework components, such as job and controller, can be found in the AlTar Framework section.

Prepare input files

While simple configurations can be specified in the configuration file, large data sets are passed to the AlTar application through data files. Different models may require different categories of data in different input formats.

For the linear model, the data likelihood is computed as

\[\begin{split}P({\bf d}|{\boldsymbol \theta}) &= \frac {1} {\sqrt{(2\pi)^m \text{det}(C_d)}} \\ &\times \exp\left[- \frac {1}{2} \left({\bf d} - {\bf d}^{pred} \right)^T C_d^{-1} \left({\bf d} - {\bf d}^{pred} \right)\right]\end{split}\]

where \({\boldsymbol \theta}\) is a vector of \(n\) unknown model parameters, \({\bf d}\) is a vector of \(m\) observations, and the covariance \(C_d\) is an \(m \times m\) matrix representing the data uncertainties. The data prediction \({\bf d}^{pred}\) is given by the forward model

\[{\bf d}^{pred} = \mathcal{G} {\boldsymbol \theta},\]

where the Green’s function \(\mathcal{G}\) is an \(m\times n\) matrix, so that \({\bf d}^{pred}\) has the same length \(m\) as \({\bf d}\).
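To make the formula concrete, the following NumPy sketch evaluates the logarithm of the likelihood above for a single parameter vector. log_likelihood is a hypothetical name and this is an illustration under the stated shapes, not AlTar's implementation, which evaluates all samples together. Working with the log-likelihood and a Cholesky factor of \(C_d\) avoids numerical underflow and an explicit matrix inverse.

```python
import numpy as np

def log_likelihood(theta, G, d, Cd):
    """log P(d | theta) for the linear forward model d_pred = G @ theta.

    theta: (n,) parameters, G: (m, n) Green's function,
    d: (m,) observations, Cd: (m, m) data covariance.
    """
    m = d.shape[0]
    residual = d - G @ theta                 # d - d_pred
    L = np.linalg.cholesky(Cd)               # Cd = L @ L.T
    w = np.linalg.solve(L, residual)         # w @ w == residual^T Cd^{-1} residual
    log_det = 2.0 * np.sum(np.log(np.diag(L)))  # log det(Cd)
    return -0.5 * (m * np.log(2.0 * np.pi) + log_det + w @ w)
```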

The computation requires three user inputs, \({\bf d}\), \(C_d\), and \(\mathcal{G}\), provided as the plain text files data.txt, cd.txt and green.txt. The directory containing these files is set by the linear.model.case parameter in the configuration file, while the file names can be set by linear.model.data, linear.model.cd and linear.model.green.
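To try the workflow without real observations, one could generate a synthetic case directory. The sketch below assumes each file is a whitespace-delimited matrix written by numpy.savetxt, with green.txt stored as an \(m \times n\) matrix and data.txt as a single row of \(m\) values. That layout is an assumption, so check the linear model documentation for what AlTar's loader actually expects. write_synthetic_case is a hypothetical helper.

```python
from pathlib import Path
import numpy as np

def write_synthetic_case(case_dir, n=18, m=108, noise=0.01, seed=0):
    """Hypothetical helper: write data.txt, cd.txt and green.txt into case_dir.

    ASSUMED layout (verify against the model docs): green.txt is (m, n),
    data.txt is one row of m values, cd.txt is the (m, m) covariance.
    Returns the parameters used to generate the data.
    """
    rng = np.random.default_rng(seed)
    case = Path(case_dir)
    case.mkdir(parents=True, exist_ok=True)
    G = rng.normal(size=(m, n))                           # Green's function
    theta_true = rng.uniform(-0.5, 0.5, size=n)           # "true" model
    d = G @ theta_true + rng.normal(scale=noise, size=m)  # noisy observations
    Cd = noise**2 * np.eye(m)                             # independent errors
    np.savetxt(case / "green.txt", G)
    np.savetxt(case / "data.txt", d.reshape(1, -1))
    np.savetxt(case / "cd.txt", Cd)
    return theta_true
```

With the defaults, write_synthetic_case("patch-9") (in a scratch copy of the example) would match parameters = 18 and observations = 108 from the example configuration, and the returned theta_true can later be compared with the posterior mean.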

Run an AlTar application

For each model, we have provided a dedicated command for running AlTar simulations (in fact, you can run altar for all models, but it may require some changes to the configuration file). The dedicated command for the linear model is linear, which is a Python script as shown below

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
#
# michael a.g. aïvázis (michael.aivazis@para-sim.com)
#
# (c) 2010-2020 california institute of technology
# (c) 2013-2020 parasim inc
# all rights reserved
#

# get the package
import altar

# make a specialized app that uses this model by default
class Linear(altar.shells.application, family='altar.applications.linear'):
    """
    A specialized AlTar application that exercises the Linear model
    """

    # user configurable state
    model = altar.models.model(default='linear')
    model.doc = "the AlTar model to sample"


# bootstrap
if __name__ == "__main__":
    # build an instance of the default app
    app = Linear(name="linear")
    # invoke the main entry point
    status = app.run()
    # share
    raise SystemExit(status)


# end of file

It defines a Linear application class and provides a main entry point for execution. The linear command can be run like any other shell command, but it must be run from the directory where linear.pfg and the case directory (patch-9) are located:

linear

and the simulation begins.

To use a configuration file other than linear.pfg, e.g., linear2.pfg,

linear --config=linear2.pfg

or to pass or change a parameter from the command line, e.g., to increase the number of Markov chains per task from 2**10 to 2**12,

linear --job.chains=2**12
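The command-line options above can also be scripted from Python, for instance to run the same case with several settings. This is a minimal sketch using only the options shown in this section; linear_command and run_linear are hypothetical helpers, and it assumes the linear command is on the PATH.

```python
import subprocess

def linear_command(config=None, overrides=None):
    """Build the argument list for the `linear` command (hypothetical helper).

    overrides maps dotted property names, e.g. 'job.chains', to values.
    """
    cmd = ["linear"]
    if config is not None:
        cmd.append(f"--config={config}")
    for name, value in (overrides or {}).items():
        cmd.append(f"--{name}={value}")
    return cmd

def run_linear(workdir, config=None, overrides=None):
    """Run `linear` in workdir, where the .pfg file and case directory live."""
    result = subprocess.run(linear_command(config, overrides), cwd=workdir)
    return result.returncode
```

Passing the arguments as a list bypasses the shell, so 2**12 reaches linear unchanged; at an interactive prompt, quoting the argument ('--job.chains=2**12') keeps the shell from treating the asterisks as a filename pattern. Each run writes to the configured output_dir, so a parameter sweep would likely also want to vary controller.archiver.output_dir, assuming it can be overridden the same way as job.chains.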

More run options will be explained in the AlTar Framework section.

Collect and analyze results

AlTar offers several options for outputting the simulation results. The default is an HDF5 archiver, which writes the simulation results from each \(\beta\)-step to HDF5 files in the results directory (set by output_dir in the configuration file). These files, named step_nnn.h5, can be viewed with an HDF5 viewer such as HDFView or HDFCompass.

Each step_nnn.h5 file has the following structure:

+---------- step_nnn.h5 ------
├── Annealer ; annealing data
|   ├── beta ; the beta value
|   └── covariance ; the covariance matrix for Gaussian proposal
├── Bayesian ; the Bayesian probabilities
|   ├── prior ; prior probability for all samples, vector (samples)
|   ├── likelihood ; data likelihood for all samples
|   └── posterior ; posterior probability for all samples
└── ParameterSets ;  samples
    └── theta ; samples of model parameters, 2d array (samples, parameters)
These files can also be loaded in Python, for example with the h5py and numpy packages, for further analysis.

You may draw a histogram of the posterior to check its distribution. Since the log values of the probabilities are used and saved, the distribution will normally show a lognormal form. You may also compute statistics on the samples, for example, their mean and standard deviation. If the posterior is Gaussian, the mean model provides an estimated solution to the linear inverse problem. Some data analysis programs are also included with AlTar.
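The following sketch shows one way to do this in Python. load_step and summarize are hypothetical helpers; the dataset paths follow the step_nnn.h5 layout listed above, and h5py is imported only inside load_step so that the statistics also work on arrays from any other source.

```python
import numpy as np

def load_step(path):
    """Read beta, the samples and the (log) posterior from one step_nnn.h5 file."""
    import h5py  # only needed for reading the archive
    with h5py.File(path, "r") as f:
        return {
            "beta": f["Annealer/beta"][()],
            "theta": f["ParameterSets/theta"][()],        # (samples, parameters)
            "posterior": f["Bayesian/posterior"][()],     # log values, (samples,)
        }

def summarize(theta, log_posterior, bins=30):
    """Per-parameter mean and standard deviation, plus a histogram of log_posterior."""
    mean = theta.mean(axis=0)
    std = theta.std(axis=0, ddof=1)
    counts, edges = np.histogram(log_posterior, bins=bins)
    return mean, std, counts, edges
```

For example, step = load_step("results/step_nnn.h5") with nnn replaced by a step number, followed by summarize(step["theta"], step["posterior"]), gives the mean model and its spread for that step.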

AlTar is a software package developed to perform Bayesian inference on inverse problems with Markov chain Monte Carlo (MCMC) methods. It consists of:

  • a main framework which performs the Bayesian MCMC, and controls the job deployment;

  • a model which performs the forward modeling and feeds the data likelihood results to the Bayesian framework. Model implementations for various inverse problems are included. Users may add new models by

An AlTar application integrates a model with the main framework and serves as the main program to run simulations.