Bayesian Inference for Inverse Problems
Inverse problem
An inverse problem in science is to infer a set of unknown parameters, \({\boldsymbol \theta} = \{ \theta_1, \theta_2, \ldots, \theta_m \}\), from data observations \({\bf d}=\{d_1, d_2, \ldots, d_n\}\). For example, in Seismology, we deduce the earthquake rupture information (parameterized as \({\boldsymbol \theta}\)) from ground motions (\({\bf d}\)) measured by seismometers, GPS, InSAR and other geodetic surveys. In most cases, the forward problem \({\bf d} = G({\boldsymbol \theta})\) is well defined but the inverse \({\boldsymbol \theta} = G^{-1}({\bf d})\) is not. This happens to linear systems \(G({\boldsymbol \theta})= {\bf G} {\boldsymbol \theta}\) when the observation matrix \({\bf G}\) is ill-posed, and nonlinear systems where the inverse is inherently difficult.
Bayesian approach
Bayesian inference offers a statistical solution to inverse problems by modeling unknown parameters \({\boldsymbol \theta}\) as random variables subject to a conditional probability \(P({\boldsymbol \theta}|{\bf d})\). According to Bayes’ theorem, the probability is given by
Since \(P({\bf d})\) is the same for all possible \({\boldsymbol \theta}\) and we treat it as a constant 1. The likelihood \(P({\bf d}|{\boldsymbol \theta})\) can be determined from the forward modeling, for example, assuming a form of Gaussian probability density,
where the covariance matrix \(C_{\chi}\) captures noises/errors in observations \({\bf d}\), as well as uncertainties in the forward function \(G({\boldsymbol \theta})\).
The set of model parameters \({\boldsymbol \theta}\) with the maximum posterior probability (MAP), or the mean or median model, provides an estimate solution to the inverse problem. Compared to other point estimators, the Bayesian approach has several advantages. By fully evaluating the posterior probability densities, the Bayesian approach can also address situations when several sets of \({\boldsymbol \theta}\) have equal or comparable probabilities, or the posterior distribution is not unimodal. In addition, the Bayesian approach accommodates uncertainty quantification, by including various types of uncertainties in the calculation.
The Bayesian approach is extremely intensive computationally, since it requires repeating the forward modeling for all possible \({\boldsymbol \theta}\). With the improved sampling algorithms and computing powers, e.g., GPU, it now becomes feasible for many inverse problems with high dimensional parameter space and/or complex forward models.
CATPMIP algorithm
The posterior distribution \(P({\boldsymbol \theta}|{\bf d})\) can be determined, for example, by evaluating \(P({\boldsymbol \theta}) P({\bf d}|{\boldsymbol \theta})\) for all possible \({\boldsymbol \theta}\). Instead of uniformly sampling over the entire \({\bf \theta}\)-space, we rely on Markov-Chain Monte-Carlo (MCMC) methods, which draw efficiently weighted samples pursuant to a prescribed probability distribution.
The CATMIP (Cascading Adaptive Transitional Metropolis in Parallel) algorithm belongs to a class of MCMC methods which use transitioning: samples are initially drawn from the prior distribution and subsequently annealed to the posterior through a series of transient distributions,
where \(\beta_m\) (in analogy to the inverse of temperature in statistical physics) is gradually increased from \(\beta_0=0\) to \(\beta_M=1\) in \(M\) steps.
The procedure of CATMIP is described as follows:
For the \(m=0\) \(\beta\)-step, with \(\beta_0=0\), generate \(N_s\) random samples from the distribution \(P_0 ({\boldsymbol \theta}|{\bf d}) = P({\boldsymbol \theta})\), as seeds of \(N_s\) parallel chains.
Determine \(\beta_{m+1}\) for the next \(\beta\)-step, e.g., from the statistics of the samples from the \(\beta_m\)-step. CATMIP uses an annealing schedule based on the coefficient of variance (COV) of the importance weights \(w_i = P({\bf d}|{\boldsymbol \theta}_i)^{\beta_{m+1}-\beta_m}\). The targeted COV is typically set as 1, or the effective sample size (ESS) equals to 50%.
Perform a resampling of samples based on their importance weights \(\{w_i\}\): some samples with small weights may be discarded while some samples with large weights are duplicated. The total number of samples is kept the same, as seeds for \(N_s\) chains in the \(\beta_{m+1}\)-step.
With the new \(\beta_{m+1}\), a burn-in MCMC process is preformed with the Metropolis-Hastings algorithm. New samples are proposed from the distribution \(P_{m} ({\boldsymbol \theta}|{\bf d})\), assuming a multivariate Gaussian form, and accepted/rejected subject to the probability density \(P_{m+1} ({\boldsymbol \theta}|{\bf d})\). The acceptance ratio is also recorded, which is used to scale the jump distances for proposals in the next \(\beta\)-step.
Repeat Steps 2-4 until \(\beta_{M}=1\) is reached, or the desired equilibrium distribution \(P ({\boldsymbol \theta}|{\bf d})\) is achieved.
Step 4 presents the majority of the computational load. Since each chain is updated independently, the computation is embarrassingly parallel, which makes CATMIP an ideal algorithm to be implemented on parallel computers, including GPUs.
References
Albert Tarantola, Inverse Problem Theory and Methods for Model Parameter Estimation, SIAM, 2005. ISBN: 978-0-89871-572-9.
Sarah E. Minson, Mark Simons, and James L. Beck, Bayesian inversion for finite fault earthquake source models I — theory and algorithm, Geophysical Journal International, Vol. 194, 1701 (2013).