The Mandelbrot Set

The Mandelbrot set is the "King" of all fractals. It describes all Julia Sets, has endless variety, and yet has an extremely simple definition. To construct it, one starts with a complex number (a point on the complex plane). Next, that number is squared. Finally, one adds the original number again. These steps are repeated over and over. "Most" points quickly diverge to infinity: 2, 2*2+2 = 6, 6*6+2 = 38, etc. However, some points do not: 0, 0*0 + 0 = 0, 0*0 + 0 = 0. The points that do not diverge are within the set.

The either-or nature of the set isn't that interesting to draw. However, one can liven things up with a little colour. To do this, count how long it takes for the starting point to diverge beyond some boundary. By colouring by this "escape time" we can see extra detail. By choosing different boundaries, one gets different patterns of colours.

As a general rule, people tend to use the smallest circle about the origin that contains the set, which has radius 2. If the magnitude of the iterate, z = x+iy, is greater than 2, i.e. x² + y² > 2², then it can be shown that the sequence must diverge. Thus every iteration we need to do two things. The first is to calculate the new point value z_new = z²+c = x²-y²+2xyi+c, and the second is the above magnitude test.
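The two per-iteration steps can be sketched in a few lines of C. This is a minimal illustration rather than the program's actual inner loop; the function name `escape_time` and the `max_iter` parameter are assumptions made for the example.

```c
/* Escape-time count for the point c = cx + i*cy.
   Starts from z = c and repeats z -> z*z + c.
   Returns the iteration at which |z| first exceeds 2, or max_iter
   if the point has not escaped within the iteration budget. */
int escape_time(double cx, double cy, int max_iter)
{
    double x = cx, y = cy;

    for (int n = 0; n < max_iter; n++) {
        double x2 = x * x, y2 = y * y;

        if (x2 + y2 > 4.0)          /* magnitude test: x^2 + y^2 > 2^2 */
            return n;

        y = 2.0 * x * y + cy;       /* imaginary part of z^2 + c */
        x = x2 - y2 + cx;           /* real part of z^2 + c */
    }
    return max_iter;
}
```

Calling it with c = 2 follows the sequence 2, 6, 38 from the introduction and reports an escape after one step, while c = 0 uses up the whole iteration budget.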

Displaying Images

The first thing we need in order to do this is some way of displaying the results. Fortunately, it doesn't take much code to create a raw X11 window and then use a pixmap to show an image.

Next we need some sort of colour map for the escape-time colouring. To save space, we will just initialize one randomly, with a little bit of an intensity gradient built-in:
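One way such a colour map might be built is sketched below. It is not the listing from the original program: `PALETTE_SIZE`, `init_palette`, the packed 0x00RRGGBB layout and the particular gradient are all assumptions chosen for illustration.

```c
#include <stdlib.h>

#define PALETTE_SIZE 256    /* hypothetical number of colour map entries */

/* Fill `palette` with packed 0x00RRGGBB colours. Every channel is random,
   but capped by a limit that grows with the index, so later entries tend
   to be brighter. That cap is the crude intensity gradient. */
void init_palette(unsigned int palette[PALETTE_SIZE], unsigned int seed)
{
    srand(seed);
    for (int i = 0; i < PALETTE_SIZE; i++) {
        int limit = 64 + (i * 192) / PALETTE_SIZE;      /* 64 .. 255 */
        unsigned int r = (unsigned int)(rand() % limit);
        unsigned int g = (unsigned int)(rand() % limit);
        unsigned int b = (unsigned int)(rand() % limit);

        palette[i] = (r << 16) | (g << 8) | b;
    }
}
```

A pixel would then be coloured with something like `palette[count % PALETTE_SIZE]`, where `count` is its escape time.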

This first version of the program has an extremely simple inner loop and, as a result, is quite slow, taking 24.9 seconds to run. It could be made faster by decreasing the maximum iteration count, degrading the image somewhat, but as we'll see, that isn't the bottleneck. On the other hand, the results appear nicely:

Optimization

The first thing to notice is that we are using double precision. Since the "zoom level" is quite low, this excess precision isn't needed, and we can make do with just single precision floats. Since operations on floats tend to be faster than doubles, we can obtain an immediate speedup:

Unfortunately, things are still too slow, taking 20.6 seconds to complete the drawing of the image.

The next obvious thing to do is to use vectorization via SSE, where we can work on four floats at a time. The resulting speedup should be impressive. One way to do this is to use SSE intrinsics; however, we will go directly to assembly language to try to extract as much speed as possible. First we need an SSE-enabled display function. Fortunately, gcc provides vector intrinsics that allow us to pass the data in SSE form to the inner loop:

The reason we will go directly to asm is that some inner loop functions have already been written. Converting these to the AT&T syntax used by gcc isn't difficult.

These take on average 7.4 seconds for the "softlab" algorithm, 6.7 seconds for the first codeproject algorithm, 7.0 seconds for the second codeproject algorithm, and 6.7 seconds for the "Vodnaya" algorithm on this machine. Note how the "optimal" code depends quite strongly on machine type, but the timings don't vary all that much.

The above code isn't quite the best on this machine. We can speed things up a little more by noticing that the output from the cmpleps instruction is either all-ones or all-zeros. Noting that all-ones is the integer -1, we can use the psubd instruction to update the iteration counters. This saves an instruction on the critical path. A little more rearrangement gives an inner loop that takes 6.2 seconds:
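The mask-subtraction idea can be illustrated without assembly. The sketch below uses gcc's vector extensions instead of hand-written SSE, so it is not the tuned inner loop and the quoted timings do not apply to it; whether the compiler emits the same compare and subtract instructions is not guaranteed. The names `v4sf`, `v4si` and `escape_time4` are assumptions for the example.

```c
typedef float v4sf __attribute__((vector_size(16)));   /* four packed floats */
typedef int   v4si __attribute__((vector_size(16)));   /* four packed ints */

/* Iterate four points at once and return the four escape-time counts. */
v4si escape_time4(v4sf cx, v4sf cy, int max_iter)
{
    const v4sf four = {4.0f, 4.0f, 4.0f, 4.0f};
    v4sf x = cx, y = cy;
    v4si count = {0, 0, 0, 0};

    for (int n = 0; n < max_iter; n++) {
        v4sf x2 = x * x, y2 = y * y;

        /* Each lane is -1 (all ones) while x^2 + y^2 <= 4, otherwise 0. */
        v4si inside = (x2 + y2 <= four);

        if ((inside[0] | inside[1] | inside[2] | inside[3]) == 0)
            break;                  /* every lane has escaped */

        count -= inside;            /* subtracting -1 adds one to live lanes */

        y = (x + x) * y + cy;       /* imaginary part of z^2 + c */
        x = x2 - y2 + cx;           /* real part of z^2 + c */
    }
    return count;
}
```

Lanes that have already escaped keep iterating and may overflow to infinity or NaN, but the comparison is false for those values, so their counters simply stop advancing.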

The above is about as fast as we can go with the algorithm we are using. However, by changing that we can do much better. Note how most of the time spent is in the calculation of points within the Mandelbrot Set itself. Those points need to iterate MAX_ITER times before we are done. Fortunately, in many cases there is a faster way to find out if a point never escapes to infinity.

Periodicity Checking

Many points within the Mandelbrot Set eventually reach periodic orbits, i.e. they converge to a sequence of values that repeats as the square-and-add operation is done. If we can detect this repetition, then we can bail out early, and perhaps avoid quite a bit of work. To do this, we record a value to test against. We then test the next n iterations against that number. If one of them becomes equal to it, then we know we are in a periodic loop. If not, we record a new number and double the number of iterations to test with it. If we ever enter into a loop, this will find it. Some code which implements this in C is:
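A minimal sketch of the scheme just described, in plain double-precision C, might look as follows. The function name `escape_time_periodic` and the initial window of 8 iterations are assumptions for the example, and the timings quoted in this article refer to the author's program, not to this sketch.

```c
/* Escape time with periodicity checking.
   Returns the iteration at which |z| first exceeds 2, or max_iter if the
   point is judged to be inside the set, either because the iteration
   budget ran out or because the orbit returned to a recorded value. */
int escape_time_periodic(double cx, double cy, int max_iter)
{
    double x = cx, y = cy;
    double saved_x = x, saved_y = y;    /* recorded value to test against */
    int window = 8;                     /* hypothetical initial n */
    int left = window;                  /* tests left for this recorded value */

    for (int n = 0; n < max_iter; n++) {
        double x2 = x * x, y2 = y * y;

        if (x2 + y2 > 4.0)
            return n;

        y = 2.0 * x * y + cy;
        x = x2 - y2 + cx;

        if (x == saved_x && y == saved_y)
            return max_iter;            /* orbit repeats: it can never escape */

        if (--left == 0) {              /* no match: record a new value */
            saved_x = x;
            saved_y = y;
            window *= 2;                /* and test it for twice as long */
            left = window;
        }
    }
    return max_iter;
}
```

The test uses exact floating-point equality, which relies on rounding eventually collapsing a convergent orbit onto an exactly repeating cycle. Points whose orbits do not settle this way still cost the full `max_iter` iterations.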

The above code uses double precision (rather than single precision) and doesn't use any SSE tricks or assembly optimization. However, it completes the benchmark in 2.2 seconds! This is quite a bit faster than the "fastest" code. Of course, we can apply the same techniques to try and speed this up. Rewriting the inner loop to use assembly:

Threads

The next thing to try is a little bit of parallelization. This machine is a quad-core, and so far we have only been using a single one of those cores. By using multiple threads we can extract some more speed. Using the pthreads library with the same inner loop:

The above completes in 0.3 seconds, which corresponds to the 4-times speedup expected. It works by using scan-line interleaving. Each thread does roughly the same amount of work this way. Unfortunately, updating the screen becomes a bottleneck. This is due to the X11 calls not being thread-safe. Therefore we need to wrap them with locks to prevent data corruption. This locking slows things down. Fortunately, the result is so fast we don't need to worry too much about line-by-line updates. By only printing the image when done, we save a little more time.
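Scan-line interleaving can be sketched with pthreads as below. This is an illustration under several assumptions: the image size, view window, plain C inner loop and all names (`render`, `render_rows`, `struct job`) are made up for the example, and the sketch only fills a buffer, leaving any X11 drawing until all threads are done.

```c
#include <pthread.h>

#define WIDTH       64      /* hypothetical image size, kept small */
#define HEIGHT      49
#define MAX_ITER    256
#define MAX_THREADS 4

static int image[HEIGHT][WIDTH];

static int escape_time(double cx, double cy)
{
    double x = cx, y = cy;
    for (int n = 0; n < MAX_ITER; n++) {
        double x2 = x * x, y2 = y * y;
        if (x2 + y2 > 4.0)
            return n;
        y = 2.0 * x * y + cy;
        x = x2 - y2 + cx;
    }
    return MAX_ITER;
}

struct job { int first_row; int stride; };

/* Each worker takes rows first_row, first_row + stride, ... so that
   cheap and expensive scan lines are spread evenly over the threads. */
static void *render_rows(void *arg)
{
    const struct job *job = arg;
    for (int row = job->first_row; row < HEIGHT; row += job->stride) {
        double cy = -1.0 + 2.0 * row / (HEIGHT - 1);
        for (int col = 0; col < WIDTH; col++) {
            double cx = -2.0 + 3.0 * col / (WIDTH - 1);
            image[row][col] = escape_time(cx, cy);
        }
    }
    return NULL;
}

/* Render with nthreads workers (1..MAX_THREADS). Returns 0 on success. */
int render(int nthreads)
{
    pthread_t tid[MAX_THREADS];
    struct job jobs[MAX_THREADS];
    int started = 0, status = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;
    for (; started < nthreads; started++) {
        jobs[started].first_row = started;
        jobs[started].stride = nthreads;
        if (pthread_create(&tid[started], NULL, render_rows, &jobs[started]) != 0) {
            status = -1;
            break;
        }
    }
    for (int t = 0; t < started; t++)   /* wait for every thread we started */
        pthread_join(tid[t], NULL);
    return status;
}
```

No locking is needed while computing because every thread writes to its own set of rows. On some systems the program has to be built with the `-pthread` compiler option.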

There is one more trick available. We can save some more time by exploiting some of the mathematics of the Mandelbrot set. A large part of it is described by some simple mathematical functions. By evaluating these, we can work out if points lie within those regions. If so, then we don't need to evaluate the inner loop at all. This could obviously save quite a bit of time.

Wikipedia has a formula for the largest circle in the Set: given z = x+iy, points within it satisfy (x+1)² + y² < 1/16. The Cardioid is a little more complicated. A test for points within it may be cheaply done by first calculating q = (x-1/4)² + y². Then points within satisfy q(q + x - 1/4) < y²/4. Adding these checks to the inner loop function is simple: we just overload the meaning of the periodicity checking mask:
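Written as a scalar C predicate, the two tests look like this. The function name `in_cardioid_or_bulb` is an assumption for the example, and this sketch is a stand-alone check rather than the mask-based version folded into the SSE inner loop.

```c
#include <stdbool.h>

/* True if c = x + i*y lies inside the large circle centred on -1
   or inside the main cardioid, so no iteration is needed. */
bool in_cardioid_or_bulb(double x, double y)
{
    double y2 = y * y;

    /* Circle of radius 1/4 centred on -1: (x+1)^2 + y^2 < 1/16 */
    double xp = x + 1.0;
    if (xp * xp + y2 < 1.0 / 16.0)
        return true;

    /* Cardioid: q = (x-1/4)^2 + y^2, inside if q(q + x - 1/4) < y^2/4 */
    double xm = x - 0.25;
    double q = xm * xm + y2;
    return q * (q + xm) < 0.25 * y2;
}
```

A renderer would call this before the inner loop and assign the maximum iteration count directly whenever it returns true.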

So how much faster is it? Unfortunately, the time taken is still 0.3 seconds. It appears that the program can't get that much faster due to the time taken by X11 to create the window. This extra optimization also seems to be mostly duplicating what the periodicity-checking code does (albeit in a much more efficient manner). However, the result is still much (20+ times) faster than the "optimal" asm code discussed on code project. (Another possibility is to use the reflection symmetry along the real axis to try and gain another factor of two, but again this doesn't help much in reality.)

Onwards

As can be seen, from the naive initial code to the final version there are almost a couple of orders of magnitude in difference in speed. So is the above the best code? Nope. There are still other algorithms that can be used to speed things up even more. However, they all add inaccuracy in some way or another.

One way to speed things up is to not calculate every point. Since the image consists of blocks of colours, if we can detect regions of constant colour we can fill the entire block at once. Of course, it is possible to get this checking wrong, and thus draw something incorrect. Different algorithms have differing sensitivities to this. One simple one, known as "solid guessing", looks at the four corner points of a square. If they are all the same colour, it assumes that the entire square is. If they differ, it subdivides and tries again. Another algorithm traces the boundaries of coloured regions. Since the Mandelbrot Set is simply-connected this works fairly well. However, since the boundary-tracing algorithm only tests a finite number of points, it doesn't prevent fine features, smaller than a pixel in width, from confusing it.
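The "solid guessing" idea can be sketched as a small recursive routine. Everything here is an assumption made for illustration: the 65x65 image, the view window, the 8-pixel starting block (starting from the whole image would wrongly fill everything, since all four image corners escape immediately) and the names `guess`, `pixel` and `render_solid_guess`.

```c
#define SIZE     65     /* hypothetical: 2^k + 1 pixels per side */
#define BLOCK    8      /* hypothetical: largest square we dare to guess */
#define MAX_ITER 256

static int  image[SIZE][SIZE];  /* -1 means "not known yet" */
static long evaluations;        /* pixels that were really iterated */

static int escape_time(double cx, double cy)
{
    double x = cx, y = cy;
    for (int n = 0; n < MAX_ITER; n++) {
        double x2 = x * x, y2 = y * y;
        if (x2 + y2 > 4.0)
            return n;
        y = 2.0 * x * y + cy;
        x = x2 - y2 + cx;
    }
    return MAX_ITER;
}

/* Evaluate a pixel at most once. The view is [-2.5, 1.5] x [-2, 2]. */
static int pixel(int col, int row)
{
    if (image[row][col] < 0) {
        double step = 4.0 / (SIZE - 1);
        image[row][col] = escape_time(-2.5 + step * col, -2.0 + step * row);
        evaluations++;
    }
    return image[row][col];
}

/* Handle the square whose corners are (x0, y0) and (x0+size, y0+size). */
static void guess(int x0, int y0, int size)
{
    int x1 = x0 + size, y1 = y0 + size;
    int a = pixel(x0, y0), b = pixel(x1, y0);
    int c = pixel(x0, y1), d = pixel(x1, y1);

    if (size <= 1)
        return;                     /* the corners are the whole square */

    if (a == b && b == c && c == d) {
        for (int row = y0; row <= y1; row++)
            for (int col = x0; col <= x1; col++)
                if (image[row][col] < 0)
                    image[row][col] = a;    /* guessed, not computed */
        return;
    }

    int half = size / 2;            /* corners differ: subdivide */
    guess(x0, y0, half);
    guess(x0 + half, y0, half);
    guess(x0, y0 + half, half);
    guess(x0 + half, y0 + half, half);
}

void render_solid_guess(void)
{
    evaluations = 0;
    for (int row = 0; row < SIZE; row++)
        for (int col = 0; col < SIZE; col++)
            image[row][col] = -1;

    for (int y = 0; y + BLOCK < SIZE; y += BLOCK)
        for (int x = 0; x + BLOCK < SIZE; x += BLOCK)
            guess(x, y, BLOCK);
}
```

Comparing `evaluations` with `SIZE * SIZE` after a render shows how many points were skipped. Any feature that fits entirely inside a square with matching corners is lost, which is the inaccuracy mentioned above.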

For deeper zooms, one can gain improved speed by noticing that the points on the screen all start out very close to each other in the complex plane. Instead of doing iterations for every point, one can discover how that rectangle (quadrilateral) is moved and distorted by the square-and-add operation. If the size is small enough, then the relative distortion is also small, and can be approximated in a simple functional form. Eventually, the distortion rises to be large enough that the quadrilateral needs to be subdivided. However, in the process we can save a large amount of work.

Even deeper, one needs to worry about arbitrary precision arithmetic. As such, there is no obviously fastest algorithm. The best code needs to seamlessly change from single precision, to double, and then arbitrary precision as required.

Comments

firr said...

Great thing, especially this cycle detection, which I had never heard of. I recently played with the Mandelbrot set and found that a version is possible with only two muls, not three (when you use four cmps instead of one):



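   for(n = 0 ; n <= max_iter; n++)
   {
     reim2 = (re + re) * im;              /* 2*re*im, using the old re */

     re = (re - im) * (re + im) + cRe;    /* re*re - im*im with one multiply */
     im = reim2 + cIm;

     /* bail out once the point leaves the square |re| <= 2, |im| <= 2 */
     if( re > 2.0 || re < -2.0 || im > 2.0 || im < -2.0 ) break;

   }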

Has anyone tried how it works when rewritten to SSE?
(terrible captcha!)
