Argonauts:Maximum a posteriori
Sun 26 February 2006
Spatio-temporal coordinates
WMVL, 10.30am to 12.15pm (exactly).
Attendees (in alphabetical order)
- Ramón Casero Cañas.
- Niranjan Joshi.
- Mike Kadour.
- Rohan Loveland.
- Olivier Noterdaeme.
- Ingmar Posner.
Minutes
Mike proposed the topic, and Ingmar brought us the following lecture notes:
A. Zisserman. Lecture Notes of Estimation and Inference. Lectures 3 & 4 "Estimators - ML, LS, MAP". Hilary Term 2006.
Prof. Zisserman is in the lab just above us, and he is Principal Researcher of the Visual Geometry Group.
What did we see? In a nutshell, Bayes' Theorem and the Maximum Likelihood (ML) and Maximum A Posteriori (MAP) estimates.
So suppose that we have two sensors measuring the same quantity (let's call it mu), and that the measurements are collected as z = [z1, z2]^T. We assume a parametric model for the sensors, e.g. we say that each measurement is normally distributed around the true value that we are trying to measure.
p(z1 | mu) = N(mu, sigma^2)
p(z2 | mu) = N(mu, sigma^2)
Now, some of the parameters that define the model are unknown to us. In this example, let's say that it is mu. The likelihood function is then the joint probability density of the measurements conditioned on mu, viewed as a function of mu
L(mu) = p(z1,z2 | mu)
We only consider independent sensors (or independent measurements in general), so that the joint density factorizes
L(mu) = p(z1 | mu) p(z2 | mu)
The question we are trying to answer now is: Given a model p, what is the value of mu that makes it most likely to have observed z1 and z2? The answer is the ML estimate of mu, often represented as \hat{mu}_{ML}.
To solve this problem, you substitute the formula of the normal distribution into L(mu), take logs to go from products to sums, differentiate with respect to mu, set the derivative to zero and solve for \hat{mu}_{ML}, i.e. the value of mu that maximizes L(mu).
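The steps above can be written out as a short derivation. This is only a sketch, and it assumes that the two sensors may have different variances, sigma_1^2 and sigma_2^2 (symbols that are not in the lecture summary above, which uses a single sigma^2); the equal-variance case follows by setting sigma_1 = sigma_2.

```latex
\begin{align*}
L(\mu) &= \frac{1}{2\pi\sigma_1\sigma_2}
          \exp\!\left(-\frac{(z_1-\mu)^2}{2\sigma_1^2}
                      -\frac{(z_2-\mu)^2}{2\sigma_2^2}\right) \\
\log L(\mu) &= -\frac{(z_1-\mu)^2}{2\sigma_1^2}
               -\frac{(z_2-\mu)^2}{2\sigma_2^2} + \text{const} \\
\frac{d}{d\mu}\log L(\mu) &= \frac{z_1-\mu}{\sigma_1^2}
                            +\frac{z_2-\mu}{\sigma_2^2} = 0 \\
\hat{\mu}_{ML} &= \frac{z_1/\sigma_1^2 + z_2/\sigma_2^2}
                       {1/\sigma_1^2 + 1/\sigma_2^2}
\end{align*}
```

The second derivative is -(1/sigma_1^2 + 1/sigma_2^2), which is negative, so the stationary point is indeed a maximum.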
For normal distributions with the same variance, the solution is \hat{mu}_{ML} = (z1 + z2)/2. If the variances are not the same, then z1 and z2 are weighted by the inverse of their variances, i.e. you give more importance to the sensor you trust more. Besides, your estimate has less uncertainty than either sensor on its own (this is called additivity of statistical information).
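To make the weighting concrete, here is a minimal Python sketch of the ML estimate for independent Gaussian sensors. The function name `ml_fuse` and the temperature readings and variances are made up for illustration; they are not from the lecture notes.

```python
def ml_fuse(measurements, variances):
    """ML estimate of mu from independent Gaussian measurements.

    Each measurement is weighted by the inverse of its variance.
    Returns the estimate and the variance of the estimate.
    """
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    mu_hat = sum(w * z for w, z in zip(weights, measurements)) / total
    var_hat = 1.0 / total  # inverse variances (information) add up
    return mu_hat, var_hat


# Hypothetical readings in degrees C, with equal sensor variances:
mu_eq, var_eq = ml_fuse([37.8, 38.2], [0.04, 0.04])

# Same readings, but the first sensor is assumed to be more precise:
mu_uneq, var_uneq = ml_fuse([37.8, 38.2], [0.01, 0.04])
```

With equal variances the estimate is the plain average of the two readings; with unequal variances it is pulled towards the more precise sensor. In both cases the variance of the estimate is smaller than the variance of either sensor.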
This got us excited in all the wrong ways, because we remembered that it's the same principle behind Kalman Filters, and we have seen that in previous Argonaut meetings.
Is it still possible to improve this estimate? Yes, if we put more information into it. Note that we have looked at maximizing
p(z1,z2 | mu)
that is, what is the probability of observing z given that the true value is mu. In the medical context, this could be phrased as "What's the probability of somebody having a temperature of 38 C given that he's got the flu?".
But we can also look at the posterior density
p(mu | z1,z2)
that is "What's the probability of somebody having the flu, if I measure a temperature of 38 C?". This is where Bayes' Theorem comes into play,
p(mu | z) = p(z | mu) p(mu) / p(z)
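This means that we can use prior information on mu (specifically, the prior distribution of mu, p(mu)) to obtain a better estimate of mu. The value of mu that maximizes the posterior p(mu | z) is called the MAP estimate of mu, and is often represented as \hat{mu}_{MAP}. Since p(z) does not depend on mu, it is enough to maximize p(z | mu) p(mu).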
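As a worked example, suppose the prior on mu is itself normal, p(mu) = N(mu_0, sigma_0^2). The Gaussian prior and the symbols mu_0 and sigma_0 are assumptions made here for illustration; the sensors keep the common variance sigma^2 used above.

```latex
\begin{align*}
\log p(\mu \mid z) &= -\frac{(z_1-\mu)^2}{2\sigma^2}
                      -\frac{(z_2-\mu)^2}{2\sigma^2}
                      -\frac{(\mu-\mu_0)^2}{2\sigma_0^2} + \text{const} \\
\frac{d}{d\mu}\log p(\mu \mid z) &= \frac{z_1-\mu}{\sigma^2}
                                   +\frac{z_2-\mu}{\sigma^2}
                                   +\frac{\mu_0-\mu}{\sigma_0^2} = 0 \\
\hat{\mu}_{MAP} &= \frac{(z_1+z_2)/\sigma^2 + \mu_0/\sigma_0^2}
                        {2/\sigma^2 + 1/\sigma_0^2}
\end{align*}
```

As sigma_0 grows (a vaguer prior), the prior term vanishes and the MAP estimate tends to the ML estimate (z1 + z2)/2.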
Interestingly, prior information adds to the solution in the same way as information from sensors does, and it reduces uncertainty on mu in the same way too.
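A small Python sketch illustrates this, under the assumption that the prior is Gaussian (the notes do not fix a form for p(mu)). The function name `map_estimate`, the readings and the prior parameters are hypothetical.

```python
def map_estimate(measurements, variances, prior_mean, prior_variance):
    """MAP estimate of mu for Gaussian sensors and an assumed Gaussian prior.

    The prior enters exactly like one more measurement: its mean is the
    "reading" and the inverse of its variance is the weight.
    """
    values = list(measurements) + [prior_mean]
    weights = [1.0 / v for v in variances] + [1.0 / prior_variance]
    total = sum(weights)
    mu_hat = sum(w * x for w, x in zip(weights, values)) / total
    var_hat = 1.0 / total  # posterior variance
    return mu_hat, var_hat


# Two hypothetical sensor readings, plus a prior centred on 37.0:
mu_map, var_map = map_estimate(
    [37.8, 38.2], [0.04, 0.04], prior_mean=37.0, prior_variance=0.04
)

# Variance of the ML estimate from the two sensors alone, for comparison:
var_ml = 1.0 / (1.0 / 0.04 + 1.0 / 0.04)
```

Here the prior is as precise as each sensor, so the MAP estimate is the average of the two readings and the prior mean, and its variance is smaller than that of the ML estimate.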
Now all that's needed is a good estimate of the prior distribution of mu. As far as we know, two ways of doing this in medical images are to construct an atlas or to use Markov Random Fields. Cool! Another thing that we have seen in the past! Although to be honest, that was 4 months ago and we didn't quite get a good grasp of the subject.
So it was proposed to try and put together Markov Random Fields and the MAP estimate. We did this in our meeting of 3 March 2006, "Markov Random Fields and Bayes".
After so much thinking the crew scattered in Tortuga and went for some iniquitous fun: Booze at noon, women, men, drugs, rock and roll. (Some went to the Nosebag for lunch too).

