Sunday, October 17, 2010

Entropy Filter v4

It occurred to me several weeks ago that my ad-hoc entropy filter would better mirror the entropy equations if I took the log of the sum instead of the sum of logs. I made an Intensity/Entropy scattergram using this modification.
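
For reference, here is a minimal sketch of the change (not my actual filter code), assuming the per-pair values N = abs(Z_i - Z_j) + 1 defined in the Entropy Filter v3 post below:

using System;

static class EntropyV4Sketch {
    // Compare the v3 accumulation (sum of logs) with the v4 accumulation
    // (log of the sum) for one pixel's per-pair values N_ij.
    public static void Compare(double[] pairValues) {
        double sumOfLogs = 0.0;
        double sumOfN = 0.0;
        foreach (double n in pairValues) {
            sumOfLogs += Math.Log(n, 2);  // v3: Sum of log2(N_ij)
            sumOfN += n;                  // v4: Sum of N_ij ...
        }
        double entropyV3 = sumOfLogs;
        double entropyV4 = Math.Log(sumOfN, 2);  // ... then a single log
        Console.WriteLine("v3 = {0:F3}, v4 = {1:F3}", entropyV3, entropyV4);
    }
}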



The result is very similar to the previous one, but there seems to be an interesting feature in the new plot.

What is that line near the top of the scattergram?

I really need to create a way to map areas of these plots back to pixels.

The Problem with a Fair Coin

It has been a while since I have posted. Last time, I was working on Elements of Information Theory before I had to return it to the library.

I didn't get very far (and it has been nearly a month since I returned it).

My main approach with technical books is to implement solutions to the end-of-chapter problems in software. The first question in this book is as follows:
A fair coin is flipped until the first head occurs. Let X denote the number of flips required. Find the entropy H(X) in bits.
This is a fairly easy problem to solve (especially with the series summation provided with the question). However, the solution has a very high barrier in software.
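
For reference, the pencil-and-paper solution, using the series identity Sum over n>=1 of n*x^n = x/(1-x)^2:

P(X = n) = (1/2)^n for n = 1, 2, 3, ...
H(X) = - Sum over n=[1..inf] ( (1/2)^n * log2((1/2)^n) )
     = Sum over n=[1..inf] ( n * (1/2)^n )
     = (1/2) / (1 - 1/2)^2
     = 2 bits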

To write a general software solution, I would need the following:
1) Representation of sums of infinite series.
2) Ability to convert a random variable description into the infinite series.
3) Ability to manipulate the infinite series until it matches a function with a known closed form solution.

Once I've built all of that machinery, actually applying the closed form solution adds very little.

I tend to get stuck chasing rabbit holes like this. I didn't make much progress after I got stuck in this tar pit.

The book is back at the library now and I'm going to head back to my vision work.

Sunday, September 12, 2010

Entropy of Joint Distribution

Reading Elements of Information Theory.

With respect to my research, I am only accountable to myself. I tend to flit between topics as the mood or opportunity strikes me. A few weeks ago, the library got a copy of the book linked above. I've been wanting to read it for years, so I checked it out and started reading.

Chapter 1 was a high level overview of the book.

Chapter 2 covers entropy, joint entropy, conditional entropy, relative entropy, mutual information, and conditional relative entropy. It includes a bunch of theorems (and chain rules) related to these values. It also includes relationships between these values that can be used to determine properties and bounds of the distribution. There is also a section that talks about the number of samples you need to accurately represent a distribution. This is something that I would have found quite useful years ago (Dieter Fox uses KLD-Sampling to do this sort of thing, but I never quite understood KLD-Sampling).

The chapter focuses on discrete distributions, but then mixes that with continuous numerical distributions when it talks about convex and concave functions (if you have a distribution of unordered colors, can it be concave or convex?)

I've read through the entire chapter, but there is a lot to digest. I also do not have practical applications for many of the relationships, which makes it difficult for me to learn the material well.

As always, I'm writing code based on the book. I do not know what I'm going to do for the relationships.

The main structure I am using so far is a discrete joint distribution P(X, Y). This is simply a 2D array of floating point values. I sum the values to use as a normalizing divisor.

P(X,Y) = [ P(X0,Y0)  P(X1,Y0)  P(X2,Y0) ]
         [ P(X0,Y1)  P(X1,Y1)  P(X2,Y1) ]
         [ P(X0,Y2)  P(X1,Y2)  P(X2,Y2) ]
         [ P(X0,Y3)  P(X1,Y3)  P(X2,Y3) ]

The Joint Entropy of P(X,Y) (labeled as H(X,Y)) is the same as the Entropy of P(U) = { P(X0,Y0), P(X1,Y0), P(X2,Y0), P(X0,Y1) ... P(X2,Y3)}. H(X,Y) = H(U).

I think that this implies that any distribution P(U) can be refactored into a joint distribution P(X,Y). Maximizing or minimizing some entropy calculation might reveal a useful hidden variable, either as independent components or as a smaller domain that is highly predictive of the other domain.

Chapter 2 talks a lot about the Conditional Entropy H(Y|X) of P(X,Y). Recall that P(X,Y) = P(Y|X) * P(X). In information space, this becomes H(X,Y) = H(Y|X) + H(X), so H(Y|X) = H(X,Y) - H(X).

The conditional entropy is the information in the joint distribution minus the information in one component of the joint distribution.

In the definition of P(X,Y) above, we can calculate P(X) by summing the columns (and P(Y) by summing the rows.)

        P(X,Y)      |  P(Y)
      1    2    4   |   7
      1    2    2   |   5
      1    1    2   |   4
      1    1    1   |   3
P(X): 4    6    9   |  sum = 19

H(Y|X) = H(X,Y) - H(X) = H(X,Y) - (-(4/19)*log2(4/19) - (6/19)*log2(6/19) - (9/19)*log2(9/19))
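
A minimal sketch of these calculations in C#, using the counts from the table above (the Entropy helper here is hypothetical scaffolding, not part of my image library):

using System;

class ConditionalEntropyExample {
    // Entropy in bits of a set of non-negative weights (normalized internally).
    static double Entropy(double[] w) {
        double sum = 0.0;
        foreach (double v in w) sum += v;
        double h = 0.0;
        foreach (double v in w) {
            if (v <= 0) continue;
            double p = v / sum;
            h -= p * Math.Log(p, 2);
        }
        return h;
    }

    static void Main() {
        // Joint counts from the table above (rows are Y, columns are X).
        double[,] pxy = {
            { 1, 2, 4 },
            { 1, 2, 2 },
            { 1, 1, 2 },
            { 1, 1, 1 }
        };

        // Flatten for H(X,Y) = H(U) and sum the columns for the marginal P(X).
        double[] flat = new double[pxy.Length];
        double[] px = new double[pxy.GetLength(1)];
        int k = 0;
        for (int y = 0; y < pxy.GetLength(0); y++) {
            for (int x = 0; x < pxy.GetLength(1); x++) {
                flat[k++] = pxy[y, x];
                px[x] += pxy[y, x];
            }
        }

        double hxy = Entropy(flat);  // H(X,Y)
        double hx = Entropy(px);     // H(X)
        Console.WriteLine("H(X,Y) = {0:F3}, H(X) = {1:F3}, H(Y|X) = {2:F3}",
            hxy, hx, hxy - hx);
    }
}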

That gets me up to section 2.3. I'll write more as I slowly grind my way through.

Sunday, August 29, 2010

Intensity/Entropy Scatterplot

Reading Chapter 4 of Machine Vision.

Below is a result from my implementation of the scatter plot described in section 4.3.2. My efforts in writing an entropy filter came out of the description of this graph. Davies does not mention an entropy filter explicitly and doesn't describe how one might calculate "intensity variation". I'm not certain that my graph is generating the same thing that he describes.

The X axis is the intensity value and the Y axis is the entropy value. The intensity and entropy are found for each pixel and used to increment a cell in the scatterplot.
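
A minimal sketch of the accumulation step, assuming the intensity and entropy values have already been computed per pixel and scaled to [0..255] (the parallel byte arrays are stand-ins for my image classes):

using System;

static class ScatterplotSketch {
    // X axis = intensity, Y axis = entropy; each cell counts matching pixels.
    public static int[,] Build(byte[] intensity, byte[] entropy) {
        int[,] bins = new int[256, 256];
        for (int i = 0; i < intensity.Length; i++) {
            bins[intensity[i], entropy[i]]++;  // increment the scatterplot cell
        }
        return bins;
    }
}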


Davies talks about using this to determine thresholding values, but I haven't studied what he wrote very closely yet. I got too excited thinking about how I might implement the entropy filter.

Things I want to do with this:
  • Allow the user to select a point and have the software automatically find a Gaussian that describes the data around that point.
  • Use an SVM to find linear separations of the clusters.
  • Map the groupings defined by the SVM or Gaussian mixtures back to the image with each category getting a unique color. I'm curious if the groupings correspond to image features or distinctive areas.
See previous posts for the original image.

Sharp/Unsharp

This took me about 30 minutes. I use a 5x5 Gaussian filter with a standard deviation of 1.5.

The Gaussian filter removes the high frequency components of the image. When we subtract this from the original image, we are removing some of the low frequency components.
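
For the record, a minimal sketch of the usual form of this sharpening step (sharpened = original + amount * (original - blurred)); the parallel byte arrays are stand-ins for my image classes and the clamping details may differ from what I actually did:

using System;

static class SharpenSketch {
    public static byte[] Sharpen(byte[] original, byte[] blurred, double amount) {
        byte[] result = new byte[original.Length];
        for (int i = 0; i < original.Length; i++) {
            // original - blurred keeps mostly the high frequencies;
            // adding that difference back emphasizes edges.
            double v = original[i] + amount * (original[i] - blurred[i]);
            result[i] = (byte)Math.Max(0.0, Math.Min(255.0, Math.Round(v)));
        }
        return result;
    }
}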

Compare the original with the sharpened image:


Entropy Filter v3

I keep making mistakes.

Let Z be the set of sampled points and Z_i be a specific point in Z.
Let N be the number of values in an interval
N = abs(Z_i - Z_j) + 1
(The one is added to avoid an infinite PDF for identical sample values)
Let p be the probability of a value in an interval.
p = 1/N
The entropy E_ij of an interval is then:

E = - N * p * log2(p)
E = - N * (1/N) * log2(1/N)
E = - log2(1/N)
E = log2(N)
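
A minimal sketch of the per-pixel calculation, assuming the nine neighborhood samples have already been gathered (counting both (i,j) and (j,i) only scales the result by two):

using System;

static class EntropyFilterV3Sketch {
    // Sum E_ij = log2(N) over the ordered sample pairs of a 3x3 neighborhood.
    public static double Entropy(byte[] z) {
        double e = 0.0;
        for (int i = 0; i < z.Length; i++) {
            for (int j = 0; j < z.Length; j++) {
                if (i == j) continue;
                int n = Math.Abs(z[i] - z[j]) + 1;  // N = abs(Z_i - Z_j) + 1
                e += Math.Log(n, 2);                // E_ij = log2(N)
            }
        }
        return e;  // still needs scaling before conversion to Gray8
    }
}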

The resulting distribution of values seems to be much more uniform. I can see no difference between the normalization scaling and unique value rank methods I described in my previous post.



Entropy Filter v2

I think I've found an acceptable approximation to an entropy filter. It is a bit fuzzy in its rigor, but it produces results that look good to me.

My goal was that a sample set like (1, 1, 1, 1, 1, 1, 1, 1, 1) would have a value of zero and (0, 32, 64, 96, 128, 160, 192, 224, 256) would have a maximal value.

I treat the interval between each pair of samples as "part" of the distribution. All of the intervals are defined to have the same probability p. Each value in an interval then has a probability p/(interval length). For convenience we let p=1 (scaling the distribution adds a constant value to each entropy calculation, but the relative entropy values should remain constant).

The entropy of an interval is then:
E_ij = - Length_ij * Log2(1/Length_ij)
E_ij = Length_ij * Log2(Length_ij)

For each ordered sample pair (i,j).

This results in values ranging from 0 to around 30k. We can scale these values to [0..1), which can then easily be converted to [0..255]. However, this conversion seems to result in an entropy map that consists largely of only black and white pixels. I'm not happy with the loss of detail when converting to Gray8 format.



I've played with other approaches to normalize the entropy results. The following image shows the results of using the relative position in a sorted list of values.



This shows that small values are heavily represented in the final distribution. Areas that were black in the first picture are now full of static. The bright areas also seem to mix together too much. I don't think this is a very useful metric either.

To discount the large number of small values, I find unique values (when rounded to the nearest integer) in the image. These unique values are placed into a list and the relative position of a value in this unique list is used to scale the image.
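
A minimal sketch of this scaling, assuming raw[] holds the per-pixel entropy values before conversion to Gray8:

using System;
using System.Linq;

static class RankScalingSketch {
    public static byte[] Scale(double[] raw) {
        // Unique values after rounding to the nearest integer, sorted ascending.
        var unique = raw.Select(v => (int)Math.Round(v))
                        .Distinct()
                        .OrderBy(v => v)
                        .ToList();
        byte[] result = new byte[raw.Length];
        if (unique.Count < 2) return result;  // a constant image maps to black

        // Each pixel gets the relative position of its rounded value.
        for (int i = 0; i < raw.Length; i++) {
            int rank = unique.BinarySearch((int)Math.Round(raw[i]));
            result[i] = (byte)(255.0 * rank / (unique.Count - 1));
        }
        return result;
    }
}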



This looks much like the scaled image, but it is a bit brighter and seems to have more dynamic range. I can see the static in the dark areas, but it doesn't dominate the image like the second version. I've completely lost the relative entropy values, but since my technique is quite ad-hoc, I have little confidence in the relative values' usefulness anyway.

At this point, I think I am ready to move on to new things.

Friday, August 27, 2010

Entropy Filter v1

I have implemented an entropy filter.

I used the same distribution model that I had used for the mode filter. I do not think this is the appropriate model to use and I may want to revisit my mode filter as well.

I sort the pixel values in the 3x3 neighborhood around a pixel. Each pixel value receives a weight of 1/9. Repeated values get the sum of the repeated observations. The observations divide the domain into a set of spans with the observations at the endpoints of the spans (plus two potential spans at the ends of the domain). I let the probability over a span be equal to the average weight of the endpoints of the span. The endpoints of the domain default to a weight of zero unless there is an observation at those points. The probability density is the probability divided by the width of the span.

The problem with this approach becomes obvious if we have a region that is all 1's on a domain of [0, 255]. The span to the left of the observations has a density of 1/1 while the span to the right of the observation has a density of 1/254.

Calculating the entropy for this, we get
E = Sum over i=[0..255] ( - p_i * log2(p_i))
E = - (0.5 * 1/255) * log2(0.5) - 254 * (0.5/254) * log2(0.5/254)
E = - (1/510) * log2(0.5) - (0.5) * log2(1/508)
E = 0.002 + 4.5
E = 4.5

for a distribution that should probably have zero entropy.

The result of this entropy filter has a lot of noise. The edges stand out beautifully, but they stand out with low entropy instead of high entropy.


I think it is back to the drawing board. I am considering two options:
1) Redefine the spans to be the points closest to a unique observation.
2) Convolve the observation impulses with a Gaussian.

The second approach requires tuning of the blurring Gaussian. I do not have any idea how one might determine an optimal Gaussian width through rigorous means.

Frequency Domain Plans

Reading Machine Vision.


Finished Chapter 3. I skimmed over the sections on shifts introduced by median filters. I don't have any vision applications yet, and until I have an application that requires shift correction I don't think I will implement those corrections.

I also didn't implement the Sharp-Unsharp Masking filter. This is mainly because my test harness only has a source image and a destination image, and Sharp-Unsharp requires three images. However, I may still implement it by creating an internal temporary image that holds the Gaussian blurred image.

I am a bit disappointed that the book did not include an explanation of some of these filters (and especially Sharp-Unsharp) in the frequency domain. I have a fairly good understanding of 1D signals in the frequency domain, but I'm not sure how that extends to 2D (unless you treat the image as a series of 1D signals). I am putting "Frequency Domain Tutorial" on my TODO list for this blog right now. I might as well add Sharp-Unsharp to the list at the same time.

I didn't find the section on color in image filtering very useful either. I would like to see a deeper theoretical discussion of color. I am skeptical that the L2 norm of a color vector (or of the delta between two color vectors) is the right function to use for intensity. I imagine that a power function that takes into account the frequencies of the colors might give a more useful result. We might also want to weight the colors by the typical response sensitivity of the human eye.

I've started on Chapter 4. The scatter plot of gradient vs intensity is very interesting to me. I am planning to write an entropy filter based on this idea. It will treat the pixels in the neighborhood of a target as a distribution in the same way that the mode filter does. I can then calculate the entropy of this distribution and map that to a grayscale value. I expect I will get nice gradients that converge to lines. I can then use hill climbing to find the lines. I might be able to use this in video to hillclimb from a feature location in a previous frame to the new location in the next frame. I'm adding entropy filter to my TODO list too. That is the first thing I'm going to do.

Saturday, August 21, 2010

A Mode filter in C#

Machine Vision does not include code for a Mode filter, so I am publishing mine here. It is written in C#. It leaves out some support infrastructure details, but those should be fairly trivial to adapt to another library.

The basic idea here is to treat the neighborhood of pixels around a point as a probability distribution and select the most likely point. This will be the place with the highest density. With a 3x3 neighborhood, we only have nine points to sample this distribution, but my results turned out surprisingly well.

First, the points are sorted into an array. The array is traversed and the distance between values is examined. The closer two values are, the higher their density. I look for an interval that has a larger interval on each side. The midpoint of that interval is used as the mode.

If more than one interval meets that criterion, I select the interval whose midpoint is closest to the distribution median.

public static byte Mode(ImageGray8 img, int x, int y) {
    Accessor a = new Accessor(img, x, y);
    byte[] p = new byte[] {
        a.P4, a.P3, a.P2,
        a.P5, a.P0, a.P1,
        a.P6, a.P7, a.P8
        };
    Array.Sort(p);
    byte median = p[4];
    int modeIndex = -1;
    int dp = int.MaxValue;
    for (int i = 0; i < p.Length-1; i++) {
        int dp_i = p[i+1] - p[i];
        if (dp_i < dp) {
            // still increasing to the local maximum density
            dp = dp_i;
        } else {
            // just passed a local maximum
            if (modeIndex < 0) {
                modeIndex = i;
            } else {
                // select the previous interval 
                // which is closest to the median
                byte modeCurrent = (byte) 
                    ((p[modeIndex - 1] + p[modeIndex]) / 2);
                byte modeCandidate = (byte) 
                    ((p[i - 1] + p[i]) / 2);
                if (Math.Abs(median - modeCurrent) > 
                    Math.Abs(median - modeCandidate)
                    ) {
                    modeIndex = i;
                }
            }
        }
    }

    byte modeValue;
    if (modeIndex < 0) {
        // density increased with every step
        modeValue = p[8];
    } else if (modeIndex == 0) {
        // density decreased with every step
        modeValue = p[0];
    } else {
        modeValue = (byte) 
            ((p[modeIndex - 1] + p[modeIndex]) / 2);
    }

    return modeValue;
}

How does this filter look? Detail is flattened into smooth shading, while edges are enhanced (with anti-aliased edges turning into jaggies). I think it looks a bit cartoon-like.

The original "Lena" Playboy image commonly used in graphics (click through to see the full size images).


And the image processed multiple times by the mode filter.

Wednesday, August 18, 2010

Gaussian Image Filter

I wrote a very simple program to generate 3x3 and 5x5 Gaussian filters for images. I sample points within each pixel to find an average value.

Let the Gaussian function be N(mean, sigma).

For N(0,1), I got the following matrix:

          1 3 1
N(0,1) = [3 5 3] / 21
          1 3 1
Machine Vision lists the following matrix:

          1 2 1 
N(0,?) = [2 4 2] / 16
          1 2 1
I suppose that is pretty close to a Gaussian, but I wonder how much it affects the frequency response of the filter.

It is pretty obvious that those values were chosen for performance. Dividing by 16 requires only a bit shift.
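
A minimal sketch of this kind of kernel generation, averaging the Gaussian over a grid of sample points inside each pixel cell and then normalizing; the sampling grid and the final rounding to small integers (like the matrices above) are details I'm glossing over:

using System;

static class GaussianKernelSketch {
    public static double[,] Kernel(int size, double sigma, int samples) {
        double[,] k = new double[size, size];
        int half = size / 2;
        double step = 1.0 / samples;
        double total = 0.0;
        for (int ky = 0; ky < size; ky++) {
            for (int kx = 0; kx < size; kx++) {
                double sum = 0.0;
                for (int sy = 0; sy < samples; sy++) {
                    for (int sx = 0; sx < samples; sx++) {
                        // Sample point inside the cell, relative to the kernel center.
                        double x = kx - half - 0.5 + (sx + 0.5) * step;
                        double y = ky - half - 0.5 + (sy + 0.5) * step;
                        sum += Math.Exp(-(x * x + y * y) / (2 * sigma * sigma));
                    }
                }
                k[ky, kx] = sum / (samples * samples);
                total += k[ky, kx];
            }
        }
        for (int ky = 0; ky < size; ky++)
            for (int kx = 0; kx < size; kx++)
                k[ky, kx] /= total;  // weights sum to 1
        return k;
    }
}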

Tuesday, August 17, 2010

Scanning DNNF Paper

I read Decomposable Negation Normal Form on the exercise bike at the gym today. I'm going to have to spend more time on the math to regain the understanding I had of it in 2003. I had forgotten that the "smoothness" property was an additional constraint on DNNF and not part of the core definition.

I don't know when I'll get back to this paper. I don't have a particular use for it right now. I'm much more interested in exploring frequency domain of images.

Starting Chapter 3 of Machine Vision

Starting chapter 3 of Machine Vision, Third Edition: Theory, Algorithms, Practicalities (Signal Processing and its Applications).

Currently implemented image operations: Load, Set, Copy, Brightness, Stretch, Blur, Invert, Threshold, Shrink, Expand, Edge, Salt, Pepper, Noise, Noize.

I implemented Average, but my test program doesn't support enough images to actually test it.

In section 2.6, Davies talks about his time weighted averaging technique:

"Although the learning routine used here has not been found in the literature, it has been used in the author's laboratory, and was probably developed independently elsewhere."

I think this algorithm can be expressed as a set of 1D Kalman filters, each estimating the grayscale value of one pixel. The effect would be the same, although the scaling factors would be different. If the image is captured from a CCD, each observation could be given an appropriate observation error. I'm not sure what the action model used to increase uncertainty over time should be. I suspect it would be a tuned parameter.
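
A minimal sketch of the per-pixel 1D Kalman idea (my speculation, not Davies' algorithm); q is the process noise added each frame and r the observation noise, both of which would need tuning:

using System;

class PixelKalman {
    double estimate;  // current grayscale estimate for this pixel
    double variance;  // uncertainty of the estimate

    public PixelKalman(byte firstObservation, double initialVariance) {
        estimate = firstObservation;
        variance = initialVariance;
    }

    public byte Update(byte observation, double q, double r) {
        variance += q;                            // predict: uncertainty grows each frame
        double gain = variance / (variance + r);  // Kalman gain
        estimate += gain * (observation - estimate);
        variance *= (1.0 - gain);                 // correct: uncertainty shrinks
        return (byte)Math.Round(estimate);
    }
}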

Sunday, August 15, 2010

Generalizing Image Manipulator Commands


At the Crossroads library while The Wife and the Daughter shop. Working until we see Despicable Me.
Changing the command design in my Image Manipulator. Commands will no longer be statically bound to a specific image instance. Instead, a command will apply to the current image and copy the result to a different available image. Commands applied to an incorrect image type are currently ignored. Maybe I'll filter the command list in the future.
...
Crossroads is closing in 15 minutes. Converted Load, Set, Copy, Brightness, Stretch, Invert, Threshold, and Shrink.