One Function To Rule Them All

posted on May 28th, 2016

Machine learning is perhaps one of the most exciting ideas of our time. It shapes our lives (e.g., Google search) and our dreams (e.g., The Matrix). In an effort to make sense of this field, new descriptive words have entered everyday vernacular: words like deep, neural and emergent. Unfortunately, these words often sound more like they belong in ancient books of magic and mysticism than in our modern age. Fortunately, it doesn’t have to be this way. In what follows we’ll start with a simple mathematical model, one that captures much of machine learning in its net, and translate it piece by piece into everyday language without relying on mysticism.

The simple model we will start from is Y = f(X) + e. This model is written in a language known as algebra, but there is no magic here. Math is a language and, like any other language, it can be translated for people who aren’t fluent in it.

To begin the translation we’ll start with X and Y. The variable X has a long history in mathematical pursuits and can have many different interpretations depending on the context. Luckily, all these interpretations have similar connotations that can often be summarized as either “inputs” or “ingredients”. Similarly, Y also has a long history with many flavors but can perhaps be best captured here as “results”. It is important to note here that X and Y are plural. Plurality is usually indicated in mathematics by capital letters (so X is plural while x is singular).

Plugging the translations for X and Y back into the original model gives: results = f(inputs) + e. Even without translating “f” or “e”, a useful intuition can already be drawn. No matter the method, all learning algorithms can be boiled down to taking a set of inputs and somehow turning them into results.

Next, we’ll translate “f” and “e”. These can roughly be translated to “action” and “chance”, respectively. Cleaning up the parentheses a little in the original model gives results = action + inputs + chance. This is still a little muddy, but before cleaning it up a little more, I want to point out something really important here. Chance is an integral component of all learning algorithms. No machine learning algorithm will ever be all-knowing. There will always be a chance that it is wrong.
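
For readers who think better in code, here is a minimal sketch of results = f(inputs) + e in Python. Everything specific in it is my own assumption for illustration: the action simply doubles each input, and chance is a small random wobble. Neither choice comes from any particular learning algorithm.

```python
import random

def action(inputs):
    # Hypothetical "f": double every input. Any rule could stand in here.
    return [2.0 * x for x in inputs]

def chance(count, rng):
    # Hypothetical "e": a small random wobble, one value per result.
    return [rng.gauss(0.0, 0.1) for _ in range(count)]

def results(inputs, rng):
    # results = f(inputs) + e, applied to each input in turn.
    acted = action(inputs)
    noise = chance(len(inputs), rng)
    return [a + n for a, n in zip(acted, noise)]

rng = random.Random(0)  # fixed seed so the run is repeatable
inputs = [1.0, 2.0, 3.0]
observed = results(inputs, rng)
print(observed)
```

Each printed result lands near double its input but never exactly on it. That small gap is the chance term at work.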

Now that we’ve roughly translated our original equation, let’s explore it a little more. The thing that I find most interesting about this equation is that it is really quite banal. For example, let’s replace our generic words with a specific example: (Drop + Ball + Chance) = (Ball Falls or Doesn’t Fall). And that’s it. According to our “Machine Learning” equation, if I drop (action) a ball (input), it will either fall or not fall (result). We can probably say that 99.999% of the time the ball will fall, but once in a blue moon chance might have it that a bird grabs our ball and flies off with it.

The second thing I find interesting about this equation is that we can add specificity as an afterthought using another equation. For example, our ball drop example isn’t very specific. Let’s say we want to be more specific about what Drop means. We could then define it using another equation: (Hand Releases + Object In Hand + Chance) = Drop. Now we know that Drop means releasing something from my hand rather than dropping it by some other means.
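
The two ball equations can be stacked in code as well. The sketch below is only an illustration: the 1% chance that my hand fails to let go is a number I made up, while the bird's 0.001% comes from the 99.999% guess in the ball example.

```python
import random

def drop(object_in_hand, rng):
    # (Hand Releases + Object In Hand + Chance) = Drop
    # Assumed for illustration: the hand fails to let go 1% of the time.
    released = rng.random() >= 0.01
    return object_in_hand if released else None

def ball_result(dropped, rng):
    # (Drop + Ball + Chance) = (Ball Falls or Doesn't Fall)
    if dropped is None:
        return "doesn't fall"
    # The bird shows up 0.001% of the time, so the ball falls 99.999%.
    bird_grabs_ball = rng.random() < 0.00001
    return "doesn't fall" if bird_grabs_ball else "falls"

rng = random.Random(42)  # fixed seed so the run is repeatable
outcomes = [ball_result(drop("ball", rng), rng) for _ in range(10_000)]
fall_rate = outcomes.count("falls") / len(outcomes)
print(f"The ball fell in {fall_rate:.2%} of the tries")
```

Notice that ball_result never needs to know how the drop happened. The extra detail lives entirely inside drop, which is what adding specificity with another equation amounts to.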

And that’s it. We’ve just explained how a lot of machine learning algorithms work with only the ideas of results, action, inputs and chance. No need for “neural nets” or “deep learning” or “oracles”. Still, those words have some meaning. In the case of machine learning they refer to the different kinds of “actions”. So the equation for a Neural Net would look like this: Results = Neural-Net(Inputs) + Chance. The equation for a Deep Learning algorithm, meanwhile, would be Results = Deep-Learning(Inputs) + Chance.
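
To make the idea of swapping actions concrete, here is a rough sketch in which the equation stays fixed and only the action changes. Both actions are toy stand-ins I invented: a straight-line rule and a one-unit "network" with hand-picked weights. Neither is trained, and neither is meant to represent a real neural net or deep learning library.

```python
import math
import random

def linear_action(x):
    # Stand-in action: a straight line with made-up slope and offset.
    return 3.0 * x + 1.0

def tiny_net_action(x):
    # Stand-in action: one hidden tanh unit with hand-picked weights.
    hidden = math.tanh(0.5 * x)
    return 2.0 * hidden

def model(action, inputs, rng, noise=0.1):
    # Results = action(Inputs) + Chance, whatever the action happens to be.
    return [action(x) + rng.gauss(0.0, noise) for x in inputs]

rng = random.Random(1)  # fixed seed so the run is repeatable
inputs = [0.0, 1.0, 2.0]
linear_results = model(linear_action, inputs, rng)
net_results = model(tiny_net_action, inputs, rng)
print(linear_results)
print(net_results)
```

The model function is identical in both calls. Only the action passed into it differs, which is all the fancy names are really telling us.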

(For even more information than is contained here, I recommend chapter 2 of The Elements of Statistical Learning.)