Warning: Facebook users, click “View original post” in order to actually see the notation. Also, a pretty technical post.
I need some help. These days I’m writing some code to calculate a multivariate Gaussian probability density function. It’s a generalization of the standard Normal distribution density function where, instead of a single mean $\mu$ and standard deviation $\sigma$, $\mu$ is actually a $k$-dimensional vector and you have a $k \times k$ covariance matrix $\Sigma$. You compute the probability density of observing vector $x$ the following way:

$$p(x) = \frac{1}{(2\pi)^{k/2}\,|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^{T}\,\Sigma^{-1}\,(x-\mu)\right)$$

where $|\Sigma|$ is the determinant of $\Sigma$. What do you do when the determinant is zero? You can’t calculate either the prefactor out in front or the inverse matrix in the exponent. This is not an unusual scenario, I don’t think, because sometimes some of the dimensions of the probability space are 100% correlated with each other – but their mean value is zero (and the standard deviation is 0). In some cases, this will result in a singular $\Sigma$, and just because of a few bad eggs, your formula breaks down.
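To make the breakdown concrete, here is a minimal sketch of the density calculation in plain Python. It is not the implementation discussed in this post; the names `cholesky` and `mvn_logpdf` are made up for illustration, and it works with the log of the density and a Cholesky factorization instead of an explicit determinant and inverse. When the covariance matrix is exactly singular, the factorization hits a zero pivot and the function raises, which is the same failure as dividing by a zero determinant.

```python
import math


def cholesky(sigma):
    """Return lower-triangular L with L * L^T == sigma, or raise ValueError."""
    k = len(sigma)
    L = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sigma[i][j] - sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                if s <= 0.0:
                    # A non-positive pivot means sigma is singular (or not
                    # positive definite): neither |sigma|^(1/2) nor the
                    # inverse of sigma can be used.
                    raise ValueError("covariance matrix is not positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def mvn_logpdf(x, mu, sigma):
    """Log of the multivariate Gaussian density at x (hypothetical helper)."""
    k = len(x)
    L = cholesky(sigma)
    # Solve L z = (x - mu) by forward substitution, so that
    # (x - mu)^T sigma^-1 (x - mu) == sum(z_i^2).
    d = [xi - mi for xi, mi in zip(x, mu)]
    z = []
    for i in range(k):
        z.append((d[i] - sum(L[i][m] * z[m] for m in range(i))) / L[i][i])
    mahalanobis_sq = sum(v * v for v in z)
    # log |sigma| == 2 * sum(log L_ii)
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(k))
    return -0.5 * (k * math.log(2.0 * math.pi) + log_det + mahalanobis_sq)
```

Note that with floating-point input a nearly singular matrix may slip past the pivot check with a tiny positive pivot, so a real implementation would probably want a tolerance there.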
In my implementation thus far, I’ve worked around this by ignoring the 0 rows in $\Sigma$ – but is this a valid thing to do? Are there any better solutions? Please let me know!
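For reference, the workaround can be sketched roughly like this. The function name `drop_zero_variance_dims` and the `tol` parameter are hypothetical, not taken from the actual implementation. One assumption worth stating: a dimension with zero variance is a constant, so dropping it only seems reasonable when the observed value equals the mean in that dimension. The sketch reports that condition separately so the caller can decide what to do when it fails.

```python
def drop_zero_variance_dims(x, mu, sigma, tol=0.0):
    """Remove dimensions whose variance is (at most) tol.

    Returns (consistent, x_r, mu_r, sigma_r). `consistent` is False when x
    differs from mu in one of the removed, zero-variance dimensions.
    """
    k = len(x)
    keep = [i for i in range(k) if sigma[i][i] > tol]
    fixed = [i for i in range(k) if sigma[i][i] <= tol]
    # A zero-variance variable always equals its mean, so any other
    # observed value is impossible under the model.
    consistent = all(abs(x[i] - mu[i]) <= tol for i in fixed)
    x_r = [x[i] for i in keep]
    mu_r = [mu[i] for i in keep]
    # Keep only the rows and columns of the surviving dimensions.
    sigma_r = [[sigma[i][j] for j in keep] for i in keep]
    return consistent, x_r, mu_r, sigma_r
```

The reduced `x_r`, `mu_r` and `sigma_r` can then be fed to the usual density formula. The result is a density over the remaining dimensions only, so values computed with different sets of dropped dimensions are not directly comparable.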
Posted by Richard Seymour on July 20, 2009 at 10:40 am
From the Wikipedia article on Normal distribution density function:
The covariance matrix is allowed to be singular (in which case the corresponding distribution has no density). This case arises frequently in statistics; for example, in the distribution of the vector of residuals in ordinary linear regression problems. Note also that the Xi are in general not independent; they can be seen as the result of applying the matrix A to a collection of independent Gaussian variables Z.
So in other words if Sigma is 0 so is p. Sounds fair given where that exponent would go as sigma approaches zero.
Posted by cipher3d on July 20, 2009 at 3:45 pm
That’s not completely satisfactory. It’s like saying that the probability of hitting 0.5 in a normal distribution of mean 0.5 and variance 0 is zero – one would think that the probability should be 1, but the math is indeterminate.
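To spell out the one-dimensional case (a worked limit, using $\varepsilon$ as an arbitrary small half-width that is not part of the original discussion): for $X \sim \mathcal{N}(\mu, \sigma^2)$ with $\sigma > 0$,

$$p(\mu) = \frac{1}{\sigma\sqrt{2\pi}} \to \infty, \qquad P\left(|X - \mu| \le \varepsilon\right) = \operatorname{erf}\left(\frac{\varepsilon}{\sigma\sqrt{2}}\right) \to 1 \qquad \text{as } \sigma \to 0.$$

So the density at the mean blows up instead of going to zero, while the probability mass within any fixed distance of the mean tends to 1. In the limit all the mass sits on the single point $\mu$, which is why no ordinary density exists there.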
The underlying problem is that when you calculate the covariance $\Sigma_{ij}$ between two variables which happen to match identically, but their mean is 0, $\Sigma_{ij}$ degenerates to 0, not 1.
So I think by ignoring the rows/cols in \Sigma where this happens, you’re ignoring the variables that you have 100% confidence in….
Oh, I forgot to mention that, as an approximation, I made $\Sigma$ a diagonal matrix (that is, I only calculate the covariance between variable $i$ and itself). That probably contributes a lot to $\Sigma$ becoming singular.
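With a diagonal covariance matrix the determinant is just the product of the variances, so the matrix is singular exactly when at least one variance is zero. Here is a minimal sketch of how the diagonal case could be handled, assuming zero-variance dimensions are treated as constants. The function name `diag_gaussian_logpdf` is made up for illustration, and it returns the log of the density to avoid underflow.

```python
import math


def diag_gaussian_logpdf(x, mu, variances):
    """Log-density of a Gaussian with diagonal covariance.

    Dimensions with zero variance are treated as constants: they contribute
    nothing if x matches mu there, and make the result -inf otherwise.
    """
    total = 0.0
    for xi, mi, var in zip(x, mu, variances):
        if var < 0.0:
            raise ValueError("variance must be non-negative")
        if var == 0.0:
            if xi != mi:
                # Observation is impossible under the model.
                return float("-inf")
            continue  # constant dimension, nothing to add
        # Each remaining dimension is an independent 1-D Gaussian.
        total += -0.5 * (math.log(2.0 * math.pi * var) + (xi - mi) ** 2 / var)
    return total
```

For example, `diag_gaussian_logpdf([0.0, 3.0], [0.0, 3.0], [1.0, 0.0])` only evaluates the first dimension, while changing the second observed value to anything other than 3.0 gives `-inf`.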