Current Research – Kelvin Leung

I work on algorithms for Earth remote sensing with collaborators at the NASA Jet Propulsion Laboratory (JPL). Our methods are based on a Bayesian statistical framework and incorporate physics-based models. Several ongoing and past projects are described below.

Earth remote sensing

There is growing interest in analyzing the Earth's surface for climate science applications. Because it is infeasible to take direct measurements of the entire surface over time, remote sensing, in which measurements are taken from an airborne or satellite instrument, has become increasingly popular. In my research, we use remote Visible/Shortwave Infrared (VSWIR) imaging spectroscopy. The quantity of interest is the fraction of light reflected off the Earth's surface at given wavelengths, known as the surface reflectance, which is retrieved from observations of radiance. A particular challenge of this problem is the high dimensionality of both the reflectances and the radiances.

While our remote sensing algorithms can be generalized, we mainly focus on NASA’s Surface Biology and Geology (SBG) mission as the motivating example. The reflectances that we retrieve will be used to analyze the composition and biodiversity of the Earth's surface over time. Downstream applications include estimating surface water content, vegetation health, and mineral content.

The process of estimating the surface reflectance from radiance observations is known as a retrieval. This is framed as a Bayesian inverse problem, where we are given a prior on the reflectance parameters, a likelihood model for the radiance observations, and a physics-based forward function that models the radiative transfer from reflectance to radiance. Currently, the state of the art for VSWIR retrievals is an algorithm known as optimal estimation (OE), which uses a Gaussian characterization of the posterior to provide computationally efficient estimates of the reflectances and their associated uncertainties. It is already used operationally in missions such as the Earth Surface Mineral Dust Source Investigation (EMIT).
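To make the idea concrete, here is a minimal sketch of an optimal-estimation-style retrieval, assuming a Gaussian prior and Gaussian noise. It is an illustration only, not the operational implementation: the function name optimal_estimation, the toy linear forward model A, and all dimensions and noise levels are assumptions made for this example. A Gauss-Newton iteration finds the maximum a posteriori (MAP) estimate, and the posterior covariance comes from linearizing the forward model at that estimate.

```python
import numpy as np

def optimal_estimation(y, forward, jacobian, x_prior, S_prior, S_noise, n_iter=10):
    """Gauss-Newton iteration to the MAP estimate, plus a Gaussian posterior covariance."""
    S_prior_inv = np.linalg.inv(S_prior)
    S_noise_inv = np.linalg.inv(S_noise)
    x = np.array(x_prior, dtype=float)
    for _ in range(n_iter):
        K = jacobian(x)
        # Gauss-Newton Hessian and gradient of the negative log posterior
        hessian = K.T @ S_noise_inv @ K + S_prior_inv
        gradient = -K.T @ S_noise_inv @ (y - forward(x)) + S_prior_inv @ (x - x_prior)
        x = x - np.linalg.solve(hessian, gradient)
    # Gaussian characterization of the posterior around the MAP estimate
    K = jacobian(x)
    S_post = np.linalg.inv(K.T @ S_noise_inv @ K + S_prior_inv)
    return x, S_post

# Toy linear forward model standing in for the radiative transfer code (hypothetical).
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 4))
x_true = rng.standard_normal(4)
S_prior = np.eye(4)
S_noise = 0.01 * np.eye(8)
y = A @ x_true + 0.1 * rng.standard_normal(8)

x_map, S_post = optimal_estimation(
    y, lambda x: A @ x, lambda x: A, np.zeros(4), S_prior, S_noise
)
print("MAP estimate:", x_map)
print("posterior std:", np.sqrt(np.diag(S_post)))
```

For a linear forward model the iteration converges in one step and matches the closed-form Gaussian posterior. A real radiative transfer model is nonlinear, so the covariance is only a local approximation.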

Spatio-temporal retrievals

Most retrievals are performed pixel-by-pixel, and do not take into account potentially valuable correlations from data at other pixels or, when the satellite revisits the same region, data collected at different times. We developed a spatio-temporal retrieval algorithm based on optimal estimation by first constructing a graphical model based on our assumptions and deriving the probabilistic representation of the posterior distribution. Then, we showed how this naturally leads to a Kalman filtering-like algorithm that leverages the correlations in the data in space and time.
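As a rough illustration of what a Kalman filtering-like update looks like, the sketch below runs a standard linear Kalman filter over repeated visits to the same location. It is not our spatio-temporal retrieval algorithm: the identity dynamics, the linear observation operator H, the drift covariance Q, and all numerical values are assumptions chosen for this example. The point is only that the posterior from one visit becomes the prior for the next.

```python
import numpy as np

def kalman_step(m, P, y, H, R, Q):
    """One predict/update cycle, assuming the state is unchanged between visits up to drift Q."""
    # Predict: same mean, uncertainty inflated by the drift covariance Q
    m_pred, P_pred = m, P + Q
    # Update with a new observation y = H x + noise, noise ~ N(0, R)
    S = H @ P_pred @ H.T + R
    K = np.linalg.solve(S, H @ P_pred).T  # Kalman gain P_pred H^T S^{-1}
    m_new = m_pred + K @ (y - H @ m_pred)
    P_new = (np.eye(len(m)) - K @ H) @ P_pred
    return m_new, P_new

# Toy setup (hypothetical values): three state components observed directly.
rng = np.random.default_rng(1)
n = 3
x_true = np.array([0.2, 0.5, 0.3])
H = np.eye(n)
R = 0.05**2 * np.eye(n)
Q = 0.01**2 * np.eye(n)
m, P = np.full(n, 0.4), 0.2**2 * np.eye(n)
for visit in range(4):
    y = H @ x_true + 0.05 * rng.standard_normal(n)
    m, P = kalman_step(m, P, y, H, R, Q)
    print(visit, np.sqrt(np.diag(P)))
```

In this toy the covariance P is diagonal, so only temporal information is shared. Correlations between neighboring pixels would enter through off-diagonal entries of P if the state stacked several pixels.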

We applied our algorithm to a 30-by-40 pixel dataset that was collected by the SHIFT flight campaign over Sedgwick Reserve, California. We showed that the retrieval errors can be significantly reduced when we consider the spatio-temporal correlations.

A poster of this work can be found here.

Retrievals using MCMC

Some of the parameters of our problem were found to exhibit non-Gaussian features, motivating the need to improve upon the Gaussian characterizations provided by optimal estimation to better quantify the uncertainty of the retrievals. Markov chain Monte Carlo (MCMC) is often viewed as the workhorse algorithm for Bayesian inference that, while computationally intensive, has been shown to provide good characterizations of posterior distributions. However, naïvely applying MCMC to our retrieval problem is computationally infeasible due to the high dimensionality of the reflectances.
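For readers unfamiliar with MCMC, the following is a minimal random-walk Metropolis sampler. It is a generic textbook sketch, not the sampler used in our retrievals, and the two-dimensional standard normal target, the step size, and the chain length are assumptions chosen so the example runs quickly. Each step needs one evaluation of the log posterior, which is what makes the method expensive when every evaluation calls a radiative transfer model and the parameter dimension is high.

```python
import numpy as np

def random_walk_metropolis(log_post, x0, n_steps, step_size, rng):
    """Random-walk Metropolis with an isotropic Gaussian proposal."""
    x = np.asarray(x0, dtype=float)
    lp = log_post(x)
    samples = np.empty((n_steps, x.size))
    n_accept = 0
    for i in range(n_steps):
        proposal = x + step_size * rng.standard_normal(x.size)
        lp_prop = log_post(proposal)
        # Accept with probability min(1, posterior ratio)
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = proposal, lp_prop
            n_accept += 1
        samples[i] = x
    return samples, n_accept / n_steps

# Toy target (hypothetical): a 2-D standard normal log density, up to a constant.
def log_post(x):
    return -0.5 * np.sum(x**2)

rng = np.random.default_rng(2)
samples, accept_rate = random_walk_metropolis(log_post, np.zeros(2), 5000, 1.0, rng)
print("acceptance rate:", accept_rate)
print("sample mean:", samples.mean(axis=0))
```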

Figure caption: p-values less than 0.05 suggest non-Gaussianity.

One way of tackling this challenge is to apply dimension reduction, and we investigated the likelihood-informed subspace (LIS). One can view inference as an update from the prior to the posterior, where the update is informed by the data (likelihood). LIS is based on the idea that the likelihood influences this update more in certain directions of the parameter space than in others, and these directions can be found by solving a specific eigenvalue problem. Applying MCMC to the low-dimensional subspace defined by these eigendirections significantly reduces the computational complexity without sacrificing much accuracy. This work led to my S.M. thesis, which can be found here.
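The eigenvalue problem can be sketched for a linear forward model with a Gaussian prior and Gaussian noise. This is one common formulation, written under those assumptions, and not necessarily the exact formulation in the thesis: the function name likelihood_informed_subspace, the Jacobian J, the threshold, and the toy dimensions are all illustrative. The directions are the leading eigenvectors of the Gauss-Newton Hessian of the negative log likelihood, preconditioned by a square root of the prior covariance.

```python
import numpy as np

def likelihood_informed_subspace(J, S_prior, S_noise, threshold=1e-2):
    """Return eigenvalues above the threshold and the matching parameter-space directions."""
    L = np.linalg.cholesky(S_prior)  # S_prior = L L^T
    # Prior-preconditioned Gauss-Newton Hessian of the negative log likelihood
    H = L.T @ J.T @ np.linalg.solve(S_noise, J) @ L
    eigvals, eigvecs = np.linalg.eigh(H)
    order = np.argsort(eigvals)[::-1]  # sort in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > threshold
    # Map the retained eigenvectors back to the original parameter space
    return eigvals[keep], L @ eigvecs[:, keep]

# Toy problem (hypothetical): 50 parameters but only 3 observations,
# so the data can inform at most 3 directions.
rng = np.random.default_rng(3)
J = rng.standard_normal((3, 50))
eigvals, basis = likelihood_informed_subspace(J, np.eye(50), 0.01 * np.eye(3))
print("retained directions:", basis.shape[1])
```

MCMC would then run over the coefficients of the retained directions, with the complementary directions left at their prior distribution.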

Another way is to find and leverage any problem structure that may be present. For our retrieval problem, we identified a conditional linearity structure based on the nature of our specific forward model. Based on this structure, we devised a block Metropolis MCMC algorithm that achieves a similar reduced runtime to the LIS approach, but with improved convergence properties. A preprint of this work can be found here.

Retrievals using transport

We have explored two algorithms for retrievals that are very different: MCMC and optimal estimation. One area I’m actively working on is developing algorithms that provide a better balance between the computational complexity and performance.

Transport-based inference is a relatively new class of methods for Bayesian inference that involves maps between a reference distribution and a target distribution. The idea is that when the target distribution is complex, as is the posterior distribution of reflectances in our case, one can first generate a sample from a simple reference distribution such as a Gaussian, and then apply the trained map to transform it into a sample from the target.
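The simplest instance of this idea is an affine map, sketched below. It is a toy illustration under strong assumptions, not the method under development: the map is fit from the sample mean and covariance of hypothetical target samples, so it can only reproduce Gaussian structure, whereas practical transport maps are nonlinear. The function name fit_linear_map and the toy target are made up for this example.

```python
import numpy as np

def fit_linear_map(target_samples):
    """Fit T(z) = mean + L z so that T pushes N(0, I) toward the target's first two moments."""
    mean = target_samples.mean(axis=0)
    cov = np.cov(target_samples, rowvar=False)
    L = np.linalg.cholesky(cov)  # lower-triangular factor, cov = L L^T
    # z has shape (n_samples, dim); each row z_i is mapped to mean + L z_i
    return lambda z: mean + z @ L.T

# Toy target (hypothetical): a correlated 2-D Gaussian.
rng = np.random.default_rng(4)
target_mean = np.array([1.0, -2.0])
target_cov = np.array([[2.0, 0.8], [0.8, 1.0]])
training = rng.multivariate_normal(target_mean, target_cov, size=5000)

transport = fit_linear_map(training)
reference = rng.standard_normal((5000, 2))  # samples from the reference N(0, I)
pushed = transport(reference)               # approximate samples from the target
print("pushed mean:", pushed.mean(axis=0))
```

Once the map is trained, generating a new sample costs one reference draw and one map evaluation, with no further forward model calls.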