I've been diving into Java programs written by a gentleman named Jacob Perkins (https://github.com/thedatachef) who has posted the following information on his blog regarding these programs. His Lyapunov program in Java is based on the Sprott model described in his book, so I'm assuming for now that it is entirely appropriate for what we're trying to do.
Also, in talking with Loren O. today, he points out that we don't necessarily need to port Java code into Max in order to allow people to use it. It is possible, and perhaps preferable, to create stand-alone Java code that processes data, returns results, etc., and uses OSC to communicate with Max and other systems.
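The OSC route is easy to prototype before committing to the Java side. Here's a minimal sketch (in Python for brevity; the same byte layout ports directly to Java) that hand-encodes a one-float OSC message and sends it over UDP to a Max `udpreceive` object. The address `/lyapunov` and port 7400 are placeholders I've chosen, not anything from Loren or the blog.

```python
import socket
import struct

def osc_message(address, value):
    """Encode a minimal OSC message carrying a single float32 argument."""
    def pad(raw):
        # OSC strings are null-terminated, then padded to a 4-byte boundary
        return raw + b'\x00' * (4 - len(raw) % 4)
    return pad(address.encode('ascii')) + pad(b',f') + struct.pack('>f', value)

def send_to_max(address, value, host='127.0.0.1', port=7400):
    # On the Max side: [udpreceive 7400] -> [route /lyapunov]
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(osc_message(address, value), (host, port))
    sock.close()

if __name__ == '__main__':
    send_to_max('/lyapunov', 0.419)
```

Since the message is just bytes over UDP, the standalone process can be written in any language without touching Max externals.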
I'm really intrigued by Hadoop and the Pig scripting capabilities. So here I am looking at fairly advanced Java programs, Hadoop, and Pig, and I'm a complete newbie with all of these tools. A couple of hours with an expert would be amazing.
Here's the relevant text from Jacob Perkin's blog (http://thedatachef.blogspot.com/):
Monday, August 12, 2013
Using Hadoop to Explore Chaos
You know what though? Sometimes it's good to explore things that don't have an obvious business use case. Things that are weird. Things that are pretty. Things that are ridiculous. Things like dynamical systems and chaos. And, if you happen to find there are applicable tidbits along the way (*hint, skip to the problem outline section*), great, otherwise just enjoy the diversion.
motivation
In most cases, you'll have a system of ordinary differential equations like this one, the FitzHugh-Nagumo model of a spiking neuron:

dv/dt = v - v^3/3 - w + I_ext
dw/dt = 0.08 * (v + 0.7 - 0.8*w)
In this case v represents the potential difference between the inside of the neuron and the outside (membrane potential), and w corresponds to how the neuron recovers after it fires. There's also an external current I_ext which can model other neurons zapping the one we're looking at, but could just as easily be any other source of current, like a car battery. The numerical constants in the system are experimentally derived from looking at how giant squid axons behave. Basically, these guys in the 60's were zapping giant squid brains for science. Understand a bit more why I think your business use case is boring?
One of the simple ways you can study a dynamical system is to see how it behaves for a wide variety of parameter values. In the FitzHugh-Nagumo case the only real parameter is the external current I_ext. For example, for what values of I_ext does the system behave normally? For what values does it fire like crazy? Can I zap it so much that it stops firing altogether?
In order to do that you'd just decide on some reasonable range of currents, say (0, 1), break that range into some number of points, and simulate the system while changing the value of I_ext each time.
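That sweep is easy to sketch. The snippet below assumes the classic FitzHugh-Nagumo constants (0.08, 0.7, 0.8, which the blog's equation rendering didn't preserve, so treat them as my assumption): integrate with forward Euler for a handful of I_ext values and count spikes, i.e. upward crossings of the membrane potential through a threshold.

```python
def spike_count(i_ext, dt=0.05, steps=20000, threshold=1.0):
    """Euler-integrate FitzHugh-Nagumo and count upward threshold crossings."""
    v, w = -1.0, -0.5                 # start near the resting state
    spikes, prev_v = 0, v
    for _ in range(steps):
        dv = v - v**3 / 3 - w + i_ext
        dw = 0.08 * (v + 0.7 - 0.8 * w)
        v, w = v + dt * dv, w + dt * dw
        if prev_v < threshold <= v:   # v just crossed the threshold upward
            spikes += 1
        prev_v = v
    return spikes

if __name__ == '__main__':
    # sweep the external current over (0, 1) at a few sample points
    for i_ext in [0.0, 0.25, 0.5, 0.75, 1.0]:
        print(f'I_ext={i_ext:.2f}  spikes={spike_count(i_ext)}')
```

At I_ext = 0 the neuron sits at rest; somewhere in the middle of the range it settles into repetitive firing, which is exactly the kind of qualitative change the sweep is meant to expose.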
chaos
problem outline
What does Hadoop have to do with all this anyway? Well, I've got to break the parameter plane (the ab-plane) into a set of coordinates and run one simulation per coordinate. Say I let a range over [a_min, a_max] and b range over [b_min, b_max], and I want to look at N unique a values and M unique b values. That means I have to run N×M individual simulations!
Clearly, the situation gets even worse if I have more parameters (a.k.a. a realistic system). However, since each simulation is independent of all the other simulations, I can benefit dramatically from simple parallelization. And that, my friends, is what Hadoop does best. It makes parallelization trivially simple. It handles all those nasty details (which distract from the actual problem at hand) like what machine gets what tasks, what to do about failed tasks, reporting, logging, and the whole bit.
So here's the rough idea:
- Use Hadoop to split the n-dimensional (2D for this trivial example) space into several tiles that will be processed in parallel
- Each split of the space is just a set of parameter values. Use these parameter values to run a simulation.
- Calculate the lyapunov exponent resulting from each.
- Slice the results, visualize, and analyze further (perhaps at higher resolution on a smaller region of parameter space), to understand under what conditions the system is chaotic. In the simple Henon map case I'll make a 2D image to look at.
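The splitting step in the plan above is easy to mimic locally. This sketch (plain Python standing in for the Hadoop machinery, with the 800×800 granularity used later in the post) expands the ab-plane into coordinate tiles and deals them round-robin into a fixed number of splits, the way one input split can carry several tiles:

```python
def expand_space(a_min, a_max, n, b_min, b_max, m):
    """Expand the ab-plane into n*m (a, b) coordinate tiles."""
    da = (a_max - a_min) / (n - 1)
    db = (b_max - b_min) / (m - 1)
    return [(a_min + i * da, b_min + j * db)
            for i in range(n) for j in range(m)]

def make_splits(tiles, k):
    """Deal tiles round-robin into k roughly even input splits."""
    return [tiles[i::k] for i in range(k)]

if __name__ == '__main__':
    tiles = expand_space(0.6, 1.6, 800, -1.0, 1.0, 800)
    splits = make_splits(tiles, 64)
    print(len(tiles), 'tiles in', len(splits), 'splits')
```

Each split is then just a bag of (a, b) pairs, and each pair is one independent simulation, which is what makes the whole thing embarrassingly parallel.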
implementation
- Pig will load a spatial specification file that defines the extent of the space to explore and with what granularity to explore it.
- A custom Pig LoadFunc will use the specification to create individual input splits for each tile of the space to explore. For less parallelism than one input split per tile it's possible to specify the number of total splits. In this case the tiles will be split mostly evenly among the input splits.
- The LoadFunc overrides Hadoop classes. Specifically: InputFormat (which does the work of expanding the space), InputSplit (which represents the set of one or more spatial tiles), and RecordReader (for deserializing the splits into useful tiles).
- A custom EvalFunc will take the tuple representing a tile from the LoadFunc and use its values as parameters in simulating the system and computing the lyapunov exponent. The lyapunov exponent is the result.
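The EvalFunc's core computation can be sketched independently of Pig. Below is a small Python stand-in (a generic tangent-vector method, not the sounder implementation itself): iterate the Hénon map, push a unit tangent vector through the map's Jacobian at each step, and average the log of the vector's growth.

```python
import math

def henon_lyapunov(a, b, n=20000, burn=200):
    """Estimate the largest lyapunov exponent of the Henon map at (a, b)."""
    x, y = 0.1, 0.1
    for _ in range(burn):                 # let the orbit settle onto the attractor
        x, y = 1 - a * x * x + y, b * x
    dx, dy = 1.0, 0.0                     # unit tangent vector
    total = 0.0
    for _ in range(n):
        # Jacobian of the map at (x, y) is [[-2*a*x, 1], [b, 0]]
        dx, dy = -2 * a * x * dx + dy, b * dx
        norm = math.hypot(dx, dy)
        total += math.log(norm)
        dx, dy = dx / norm, dy / norm     # renormalize to avoid overflow
        x, y = 1 - a * x * x + y, b * x
        if abs(x) > 1e6:                  # orbit escaped; exponent is undefined
            return float('nan')
    return total / n

if __name__ == '__main__':
    print(henon_lyapunov(1.4, 0.3))       # the classic chaotic parameters
```

A positive result means nearby orbits separate exponentially (chaos); a negative one means they converge (periodic or fixed-point behavior).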
define LyapunovForHenon sounder.pig.chaos.LyapunovForHenon();

points = load 'data/space_spec' using sounder.pig.points.RectangularSpaceLoader();

exponents = foreach points generate
    $0 as a,
    $1 as b,
    LyapunovForHenon($0, $1);

store exponents into 'data/result';
You can take a look at the detailed implementations of each component on github. See: LyapunovForHenon, RectangularSpaceLoader
running
$: cat data/space_spec
0.6,1.6,800
-1.0,1.0,800

Each line specifies one dimension of the space as min,max,count: a runs from 0.6 to 1.6 over 800 points, and b from -1.0 to 1.0 over 800 points.
Remember the system? It's the Hénon map:

x[n+1] = 1 - a*x[n]^2 + y[n]
y[n+1] = b*x[n]
Now, these bounds aren't arbitrary. The Henon attractor that most are familiar with (if you're familiar with chaos and strange attractors in the least bit) occurs when a=1.4 and b=0.3. I want to ensure I'm at least going over that case.
result
$: cat data/result/part-m* | head
0.6    -1.0                   9.132244649409043E-5
0.6    -0.9974968710888611    -0.0012539625419929572
0.6    -0.9949937421777222    -0.0025074937591903013
0.6    -0.9924906132665833    -0.0037665150764570965
0.6    -0.9899874843554444    -0.005032402237514987
0.6    -0.9874843554443055    -0.006299127065420516
0.6    -0.9849812265331666    -0.007566751054452304
0.6    -0.9824780976220276    -0.008838119048229768
0.6    -0.9799749687108887    -0.010113503950504331
0.6    -0.9774718397997498    -0.011392710785045064
$: cat data/result/part-m* > data/henon-lyapunov-ab-plane.tsv
To visualize, I used a simple python script to turn that tsv into an image of the ab-plane.
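The blog's script itself wasn't preserved in this copy, so here's a minimal stand-in, assuming the tsv layout shown above (a, b, exponent per line) and using matplotlib for the rendering:

```python
import csv

def tsv_to_grid(rows):
    """Pivot (a, b, exponent) triples into sorted axes plus a 2D grid."""
    a_vals = sorted({a for a, _, _ in rows})
    b_vals = sorted({b for _, b, _ in rows})
    a_idx = {a: i for i, a in enumerate(a_vals)}
    b_idx = {b: j for j, b in enumerate(b_vals)}
    grid = [[float('nan')] * len(b_vals) for _ in a_vals]
    for a, b, exponent in rows:
        grid[a_idx[a]][b_idx[b]] = exponent
    return a_vals, b_vals, grid

if __name__ == '__main__':
    import matplotlib.pyplot as plt
    with open('data/henon-lyapunov-ab-plane.tsv') as f:
        rows = [(float(a), float(b), float(e))
                for a, b, e in csv.reader(f, delimiter='\t')]
    a_vals, b_vals, grid = tsv_to_grid(rows)
    plt.imshow(grid, origin='lower', aspect='auto',
               extent=[min(b_vals), max(b_vals), min(a_vals), max(a_vals)])
    plt.xlabel('b')
    plt.ylabel('a')
    plt.colorbar(label='lyapunov exponent')
    plt.savefig('henon-lyapunov-ab-plane.png')
```

Points whose orbits diverged (nan exponents) simply stay unfilled in the grid, which is what produces the flat regions in the image described below.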
The big swaths of flat white are regions where the system becomes unbounded. It's interesting that the bottom right portion has some structure to it that's possibly fractal. The top right portion, between b=0.0 and b=0.5 and from a=1.0 to a=1.6, is really the only region in this image that's chaotic (where the exponent is greater than zero). There's a lot more structure here to look at but I'll leave that to you. As a followup it'd be cool to zoom in on the bottom right corner and run this again.