To understand what the open source framework Hadoop does, it is important (or at least mildly entertaining) to discuss here the history and operation of the Holophonor.
In the cartoon series, Futurama, the typically hapless protagonist Fry tries to impress his female colleague Leila by playing a Holophonor, a musical instrument that does not project sound, but rather projects a holographic image of a better version of the musician playing an enhanced version of the music the live musician intended to play. Accomplished Holophonists can not only holographically conjure themselves playing a Holophonor, they can create whole symphonies of synthetic virtuosos playing many different Holophonors in many different keys, chords and harmonies.
See, wasn’t that mildly entertaining? Now back to Hadoop.
Hadoop is a project of the Apache Software Foundation, the software engineering collective that first spawned the Apache HTTP web server, that powers the overwhelming majority of web servers in the world today.
Hadoop uses the horizontal scalability intrinsic to such things as the non-relational databases we’ve been discussing, adds in the ability to use the most modern distributed file systems and then completes the trifecta with a distributed application framework that’s especially good for running data-intensive applications.
It’s that “distributed application framework” that requires the Holophonor metaphor imagery to more easily understand. Imagine a single Holophonrist (dude holding a flute-looking thing).
Now he starts to play. Suddenly another version of himself appears, and then another. Each new virtual musician plays a different part of the song. Now substitute a computer application for the musician. Let’s say this application does something complex, like image processing. Now let’s say you want to make this application available to everyone on the web.
So not only must your app calculate pixel by pixel hue and tint changes, it must also deal with user management, user history, social sharing communication API, multi-platform file formatting, etc.
What a framework like Hadoop allows you to do is to take each set of these tasks (processing, formatting, user stuff) and break them off to be computed by different clusters of whatever computing power is to hand. So if your application is residing in a multi-server environment with multi-core processors and gads of system memory, Hadoop would allow you to re-assemble the computing power available on each machine into discrete units of computing power that can span all the servers so that each task of your image-futzing app can take what it needed in the most efficient manner possible.
In other words, each conjured Holophonist (what Hadoop calls a TaskTracker ‘node’) plays a different piece of music (application task) and is kept in time by one piece of the virtualized application (called the JobTracker).
Combining the power of Hadoop with a distributed file system and a NoSQL data environment that can exist within the same perpetually scaling server environment is what has enabled the spread of applications, or apps, to conquer both the traditional desktop web but more importantly the mobile desktop. It is only the through the power of Hadoop and similar application distribution frameworks that App Stores full of productivity-boosting (or productivity-wasting) software could exist.
Ironically as of this writing, one of the most popular entertainment apps is called Virtual Piano. It is only a small matter of time before some app programmer conjures a way to let people network their virtual musical instruments together and make the silly Holophonor metaphor come to life.
If you enjoyed this article, you should check out my post on The NoSQL Trend!