Sunday, October 17, 2010

Entropy Filter v4

It occurred to me several weeks ago that my ad-hoc entropy filter would better mirror the entropy equations if I took the log of the sum instead of the sum of logs. I made an Intensity/Entropy scattergram using this modification.
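In terms of the v3 formula further down (N = |Z_i - Z_j| + 1 for each ordered sample pair in a window), the change is just where the log gets applied. A rough sketch, with the window gathering left out and with summing over pairs being my own ad-hoc choice:

    import numpy as np

    # n holds N = |Z_i - Z_j| + 1 for every ordered sample pair in a window
    # (see the Entropy Filter v3 post below); how the window is gathered is
    # left out of this sketch.
    def v3_score(n):
        return np.sum(np.log2(n))   # old: sum of the logs

    def v4_score(n):
        return np.log2(np.sum(n))   # new: log of the sum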



The result is very similar to the previous one, but there seems to be an interesting feature in the new plot.

What is that line near the top of the scattergram?

I really need to create a way to map areas of these plots back to pixels.

The Problem with a Fair Coin

It has been a while since I have posted. Last time, I was working on Elements of Information Theory before I had to return it to the library.

I didn't get very far (and it has been nearly a month since I returned it).

My main approach with technical books is to implement solutions to the end-of-chapter problems in software. The first question in this book is as follows:
A fair coin is flipped until the first head occurs. Let X denote the number of flips required. Find the entropy H(X) in bits.
This is a fairly easy problem to solve (especially with the series summation provided with the question). However, the solution has a very high barrier in software.
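For this particular problem the answer works out to 2 bits, and a quick numeric check that just truncates the infinite series confirms it (there is nothing general about this, which is exactly the point):

    import math

    # Fair coin flipped until the first head: P(X = n) = (1/2)**n.
    # H(X) = -sum p_n * log2(p_n) = sum n * (1/2)**n, which converges to 2 bits.
    entropy = 0.0
    for n in range(1, 200):               # 200 terms is far more than enough
        p = 0.5 ** n
        entropy += -p * math.log2(p)      # each term equals n * (1/2)**n
    print(entropy)                        # ~2.0 bits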

To write a general software solution, I would need the following:
1) Representation of sums of infinite series.
2) Ability to convert a random variable description into the infinite series.
3) Ability to manipulate the infinite series until it matches a function with a known closed form solution.

Once I've built all of that machinery, using the known closed-form solution at the end adds very little.

I tend to go down rabbit holes like this, and I didn't make much progress after getting stuck in this tar pit.

The book is back at the library now and I'm going to head back to my vision work.

Sunday, September 12, 2010

Entropy of Joint Distribution

Reading Elements of Information Theory.

With respect to my research, I am only accountable to myself. I tend to flit between topics as the mood or opportunity strikes me. A few weeks ago, the library got a copy of the book linked above. I've been wanting to read it for years, so I checked it out and started in.

Chapter 1 was a high level overview of the book.

Chapter 2 covers entropy, joint entropy, conditional entropy, relative entropy, mutual information, and conditional relative entropy. It includes a bunch of theorems (and chain rules) related to these values. It also includes relationships between these values that can be used to determine properties and bounds of the distribution. There is also a section that talks about the number of samples you need to accurately represent a distribution. This is something that I would have found quite useful years ago (Dieter Fox uses KLD-Sampling to do this sort of thing, but I never quite understood KLD-Sampling).

The chapter focuses on discrete distributions, but then mixes in continuous numerical distributions when it talks about convex and concave functions (if you have a distribution over unordered colors, can it be concave or convex?)

I've read through the entire chapter, but there is a lot to digest. I also do not have practical applications for many of the relationships, which makes it difficult for me to learn the material well.

As always, I'm writing code based on the book. I do not know what I'm going to do for the relationships.

The main structure I am using so far is a discrete joint distribution P(X, Y). This is simply a 2D array of floating point values. I sum the values to use as a normalizing divisor.

P(X,Y) = P(X0,Y0) P(X1,Y0) P(X2,Y0)
         P(X0,Y1) P(X1,Y1) P(X2,Y1)
         P(X0,Y2) P(X1,Y2) P(X2,Y2)
         P(X0,Y3) P(X1,Y3) P(X2,Y3)

The Joint Entropy of P(X,Y) (labeled as H(X,Y)) is the same as the Entropy of P(U) = { P(X0,Y0), P(X1,Y0), P(X2,Y0), P(X0,Y1) ... P(X2,Y3)}. H(X,Y) = H(U).

I think that this implies that any distribution P(U) can be refactored into a joint distribution P(X,Y). Maximizing or minimizing some entropy calculation might reveal a useful hidden variable, either as independent components or as a smaller domain that is highly predictive of the other domain.

Chapter 2 talks a lot about the Conditional Entropy H(Y|X) of P(X,Y). Recall that P(X,Y) = P(Y|X) * P(X). In information space, this becomes H(X,Y) = H(Y|X) + H(X). So H(Y|X) = H(X,Y) - H(X).

The conditional entropy is the information in the joint distribution minus the information in one component of the joint distribution.

In the definition of P(X,Y) above, we can calculate P(X) by summing the columns (and P(Y) by summing the rows.)

P(X,Y) as counts (rows are Y0..Y3, columns are X0..X2), with P(X) and P(Y) as the column and row sums:

          X0   X1   X2 | P(Y)
  Y0       1    2    4 |   7
  Y1       1    2    2 |   5
  Y2       1    1    2 |   4
  Y3       1    1    1 |   3
  ---------------------+-----
  P(X)     4    6    9 | sum = 19

H(Y|X) = H(X,Y) - H(X) = H(X,Y) - (-(4/19)*log2(4/19) - (6/19)*log2(6/19) - (9/19)*log2(9/19))
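To keep myself honest, here is a minimal sketch (Python/NumPy, not my actual code) that runs the whole chain on the counts above: normalize by the sum, compute H(X,Y) from the flattened values, take the column sums for P(X), and subtract to get H(Y|X).

    import numpy as np

    def entropy(p):
        # Shannon entropy in bits; 0 * log(0) is treated as 0.
        p = np.asarray(p, dtype=float).ravel()
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    # Unnormalized counts from the table above: rows are Y0..Y3, columns X0..X2.
    counts = np.array([[1, 2, 4],
                       [1, 2, 2],
                       [1, 1, 2],
                       [1, 1, 1]], dtype=float)
    p_xy = counts / counts.sum()        # normalize by the sum (19)

    h_xy = entropy(p_xy)                # H(X,Y), i.e. the entropy of the flattened P(U)
    h_x = entropy(p_xy.sum(axis=0))     # P(X) = column sums: 4/19, 6/19, 9/19
    h_y_given_x = h_xy - h_x            # H(Y|X) = H(X,Y) - H(X)
    print(h_xy, h_x, h_y_given_x)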

That gets me up to section 2.3. I'll write more as I slowly grind my way through.

Sunday, August 29, 2010

Intensity/Entropy Scatterplot

Reading Chapter 4 of Machine Vision.

Below is a result from my implementation of the scatter plot described in section 4.3.2. My efforts in writing an entropy filter came out of the description of this graph. Davies does not mention an entropy filter explicitly and doesn't describe how one might calculate "intensity variation". I'm not certain that my graph is generating the same thing that he describes.

The X axis is the intensity value and the Y axis is the entropy value. The intensity and entropy are found for each pixel and used to increment a cell in the scatterplot.
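The accumulation itself is simple. A minimal sketch, assuming both per-pixel maps have already been scaled to the 0..255 range (which is how my entropy output ends up) and using 256 bins on each axis:

    import numpy as np

    def intensity_entropy_scattergram(intensity, entropy_map, bins=256):
        # X axis = intensity, Y axis = entropy; each pixel increments one cell.
        scatter = np.zeros((bins, bins), dtype=np.int64)
        pairs = zip(intensity.ravel().astype(int), entropy_map.ravel().astype(int))
        for i, e in pairs:
            scatter[e, i] += 1
        return scatter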


Davies talks about using this to determine thresholding values, but I haven't studied what he wrote very closely yet. I got too excited thinking about how I might implement the entropy filter.

Things I want to do with this:
  • Allow the user to select a point and have the software automatically find a Gaussian that describes the data around that point.
  • Use a SVM to find linear separations of the clusters.
  • Map the groupings defined by the SVM or Gaussian mixtures back to the image with each category getting a unique color. I'm curious if the groupings correspond to image features or distinctive areas.
See previous posts for the original image.

Sharp/Unsharp

This took me about 30 minutes. I use a 5x5 Gaussian filter with a standard deviation of 1.5.

The Gaussian filter removes the high frequency components of the image, leaving a low-pass version. When we subtract that low-pass version from the original image, we remove the low frequency components and keep mostly the high frequencies.
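Roughly, the operation looks like the sketch below (SciPy stands in for my own filter code here, and adding the high-pass residual back to the original with unit weight is an assumption about the exact unsharp-mask form):

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def sharpen(image, sigma=1.5, amount=1.0):
        # Blur with a Gaussian (~5x5 support at sigma 1.5 with truncate=1.5),
        # take the high-pass residual, and add it back to the original.
        img = image.astype(np.float64)
        blurred = gaussian_filter(img, sigma=sigma, truncate=1.5)
        high_pass = img - blurred                 # low frequencies removed
        sharpened = img + amount * high_pass
        return np.clip(sharpened, 0, 255).astype(np.uint8)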

Compare the original with the sharpened image:


Entropy Filter v3

I keep making mistakes.

Let Z be the set of sampled points and Z_i be a specific point in Z.
Let N be the number of values in an interval
N = abs(Z_i - Z_j) + 1
(The one is added to avoid an infinite PDF for identical sample values)
Let p be the probability of a value in an interval.
p = 1/N
The entropy E_ij of an interval is then:

E_ij = - N * p * log2(p)
E_ij = - N * (1/N) * log2(1/N)
E_ij = - log2(1/N)
E_ij = log2(N)
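Putting that into a per-pixel filter looks roughly like this (the 3x3 window and the summing of E_ij over all ordered pairs in the window are my own ad-hoc choices, not anything from the book):

    import numpy as np

    def entropy_filter_v3(image, radius=1):
        # For each pixel, gather a (2*radius+1)^2 window, compute
        # N = |Z_i - Z_j| + 1 for every ordered pair, and sum log2(N).
        img = image.astype(np.float64)
        h, w = img.shape
        out = np.zeros_like(img)
        for y in range(radius, h - radius):
            for x in range(radius, w - radius):
                z = img[y - radius:y + radius + 1, x - radius:x + radius + 1].ravel()
                n = np.abs(z[:, None] - z[None, :]) + 1.0
                out[y, x] = np.sum(np.log2(n))
        return out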

The resulting distribution of values seems to be much more uniform. I can see no difference between the normalization-scaling and unique-value-rank methods I described in my previous post.



Entropy Filter v2

I think I've found an acceptable approximation to an entropy filter. It is a bit fuzzy in its rigor, but it produces results that look good to me.

My goal was that a sample set like (1, 1, 1, 1, 1, 1, 1, 1, 1) would have a value of zero and (0, 32, 64, 96, 128, 160, 192, 224, 256) would have a maximal value.

I treat each interval between each pair of samples as "part" of the distribution. All of the intervals are defined to have the same probability p. Each value in an interval then has a probability p/(interval length). For convenience we let p=1 (scaling the distribution adds a constant value to each entropy calculation, but the relative entropy values should remain unchanged).

The entropy of an interval is then:
E_ij = - Length_ij * Log2(1/Length_ij)
E_ij = Length_ij * Log2(Length_ij)

This is computed for each ordered sample pair (i, j).

This results in values ranging from 0 to around 30k. We can scale these values to [0..1), which can then easily be converted to [0..255]. However, this conversion seems to result in an entropy map that consists largely of only black and white pixels. I'm not happy with the loss of detail when converting to Gray8 format.



I've played with other approaches to normalize the entropy results. The following image shows the results of using the relative position in a sorted list of values.



This shows that small values are heavily represented in the final distribution. Areas that were black in the first picture are now full of static. The bright areas also seem to mix together too much. I don't think this is a very useful metric either.

To discount the large number of small values, I find unique values (when rounded to the nearest integer) in the image. These unique values are placed into a list and the relative position of a value in this unique list is used to scale the image.
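Roughly, the mapping looks like this (a sketch rather than my actual code; the final scale back to the Gray8 range is an assumed detail):

    import numpy as np

    def rank_by_unique_values(entropy_map):
        # Round, collect the sorted unique values, and replace each pixel with
        # its value's relative position in that unique list.
        rounded = np.rint(entropy_map)
        unique_vals = np.unique(rounded)
        ranks = np.searchsorted(unique_vals, rounded)
        scaled = ranks / max(len(unique_vals) - 1, 1)
        return (scaled * 255).astype(np.uint8)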



This looks much like the scaled image, but it is a bit brighter and seems to have more dynamic range. I can see the static in the dark areas, but it doesn't dominate the image the way it does in the second version. I've completely lost the relative entropy values, but since my technique is quite ad-hoc, I have little confidence in the relative values' usefulness anyway.

At this point, I think I am ready to move on to new things.