
Does Deep Learning Work In Theory?


Behind every successful scientific implementation, there is a theory that supports the results or allows one to anticipate the consequences. In the case of machine learning, however, the situation is a bit counterintuitive. Though the number of ML implementations is spiking every day, one still cannot pinpoint why a particular model makes the predictions it does.

Machine learning models are called black-box models for a reason! 

Why does a certain model work? Is it the number of layers? Is it the depth? Is it the width? There are plenty of unanswered questions such as these, and trying to answer them leads to information theory and a whole set of complexity measures.

The Complexity Of Defining A Theory

via paper by Hoang and Guerraoui

The Kolmogorov complexity of a function is the length of the shortest possible program that produces the same outputs as the function for all given inputs.

The human brain has around 10^15 synapses, of which roughly 10^9 are likely critical to the kind of natural language processing needed to pass the Turing test. The Kolmogorov complexity of passing the Turing test is therefore expected to be of the order of 10^9 bits. In other words, no substantially shorter program can solve this problem.

To have a better understanding of this complexity, let’s take a simple example.

If you have to display a repeating sequence of letters, a one-line program suffices:

>>> print("ab" * 6)
abababababab

But if a random string like 'ahfu354ht4bjjk5' has to be printed, the whole string needs to be stored literally, assigned to a variable and then printed:

>>X = ‘ahfu354ht4bjjk5’

>>print X

In the first case, the repetition operator '*' does the job with a very short program; in the second, the full string has to be memorised, so the shortest description is roughly as long as the string itself. Well-defined rules cut down the cost of description. However, how many rules can we realistically write?
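A crude, hands-on proxy for this idea (an illustration of my own, not from the paper): true Kolmogorov complexity is uncomputable, but an off-the-shelf compressor such as zlib shows the same contrast between a patterned string and a patternless one.

import os
import zlib

structured = b"ab" * 50_000        # 100 kB of a trivially describable pattern
random_like = os.urandom(100_000)  # 100 kB with no exploitable structure

for label, data in [("structured", structured), ("random-like", random_like)]:
    compressed = zlib.compress(data, 9)
    print(label, len(data), "bytes ->", len(compressed), "bytes compressed")

The structured string compresses down to a few hundred bytes, while the random one barely shrinks at all.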

Let’s look at this animation below:

via theorangeduck

Designing this simulated dancing figure involves many rules, from the folds of the dress to the shadow in the background to the movement of the limbs. Writing a program that generates such a simulation while satisfying the laws of physics quickly becomes very complex. So, increasing complexity and parallelisation of computational resources don't always go hand in hand.

However, by applying principal component analysis (PCA) to the data, one can estimate how compressible the data is – and how to decompress it.

Along with insights about the "complexity" of the simulation, PCA has a useful property – it also extracts information about the movements within the simulation, as shown in this informative post.
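As a minimal sketch of that idea (a toy synthetic setup assumed here, not the animation data from the post), PCA can compress data that secretly lives on a low-dimensional subspace and then reconstruct it:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 5))   # the data really has only 5 degrees of freedom
mixing = rng.normal(size=(5, 100))
X = latent @ mixing + 0.01 * rng.normal(size=(1000, 100))  # observed in 100 dimensions

pca = PCA(n_components=5).fit(X)
print("variance captured by 5 components:", pca.explained_variance_ratio_.sum())
compressed = pca.transform(X)                   # 5 numbers per sample instead of 100
reconstructed = pca.inverse_transform(compressed)
print("reconstruction MSE:", np.mean((X - reconstructed) ** 2))

If a handful of components capture nearly all the variance, the data is highly compressible, and the inverse transform is the "decompression" step.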

Deep learning models can generalise well in practice despite their large capacity, numerical instability, sharp minima and non-robustness, which is a contradiction, an apparent paradox.

What Makes Deep Learning Successful

Prof. Kolmogorov with his students

We are still groping along the walls in the dark: we are moving, but still blind. To shed some light on the principles underlying deep learning's practical successes, researchers from EPFL, Switzerland, published a paper stating three conjectures that give a direction to ML theory.

Conjecture 1

Most of the data from the current state of our universe and most of the problems we aim to solve with these data, as well as any good approximations of these data and problems, have a Kolmogorov complexity larger than 10^9 bits. 

Conjecture 2

Most of the data from the current state of our universe and most of the problems we aim to solve with these data, as well as any good approximations of these data and problems, have a large non-parallelisable logical depth. 

Conjecture 3

At equivalent Kolmogorov complexity, deeper neural networks compute functions with larger non-parallelisable logical depth.
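To make "non-parallelisable logical depth" concrete, here is a small illustration of my own (not taken from the paper): the program below is only a few lines long, so its output has small Kolmogorov complexity, yet computing that output requires a long chain of steps in which each one depends on the previous result.

# Small description, long sequential computation.
x = 0.5
for _ in range(1_000_000):
    x = 3.9 * x * (1.0 - x)  # logistic map: each iteration needs the previous value
print(x)

There is no obvious way to shorten this chain with parallel hardware; the depth of the computation, not the length of its description, is what dominates.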

Though the reason behind the superior performance of larger models remains unclear, researchers have gravitated towards them because of their "easy" optimisation with methods like stochastic gradient descent (SGD). These methods help the models converge to global minima in over-parameterised regimes.
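A toy sketch of that claim (my own minimal example with made-up data, not the researchers' experiments): plain SGD on an over-parameterised linear model, with far more parameters than data points, still drives the training loss essentially to zero.

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 100))  # 20 samples, 100 parameters: over-parameterised
y = rng.normal(size=20)         # arbitrary targets
w = np.zeros(100)
lr = 0.005

for step in range(5000):
    i = rng.integers(0, 20)              # one random sample per step: "stochastic"
    grad = (X[i] @ w - y[i]) * X[i]      # gradient of 0.5 * (x.w - y)^2
    w -= lr * grad

print("final training MSE:", np.mean((X @ w - y) ** 2))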

The researchers at EPFL posit that the success of deep learning's practical applications is connected to a key feature of the data collected from our surrounding universe to feed machine learning algorithms: large non-parallelisable logical depth.

The authors argue that the formidable logical depth of mathematics has been the key to understanding physical phenomena of large logical depth (and small Kolmogorov complexity), in a manner that human brains cannot match.

Drawing a parallel between the success of deep learning and the effectiveness of mathematics, the authors observe that the prevalence of depth is the common denominator.

The authors believed that determining the non-parallelisable logical depth of real data, as well as of specific functions related to this data, would be a significant step towards a theoretical understanding of deep learning. 


Ram Sagar

I have a master's degree in Robotics and I write about machine learning advancements.