Residual Networks and Information Theory
Earlier this year, I had to take a research skills course, and I used the opportunity to focus on two things in deep learning that really stood out to me. First, skip connections fundamentally change how models operate: they make very deep networks trainable by recasting each layer's task into something tractable, namely learning a residual correction to its input rather than an entirely new transformation.
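To make that concrete, here is a minimal sketch of a residual block, using PyTorch purely for illustration (the ResidualBlock class and its layer sizes are my own toy example, not taken from the attached proposal). The block outputs x + F(x), so its weights only need to learn the residual F rather than the whole mapping, and the identity function is trivially available.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """A minimal residual block: output = x + F(x).

    The learned layers only model the residual F(x), i.e. the deviation
    from the identity map, rather than the full input-to-output mapping.
    """
    def __init__(self, dim: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(dim, dim),
            nn.ReLU(),
            nn.Linear(dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Skip connection: add the input back onto the block's output.
        return x + self.body(x)

x = torch.randn(8, 64)
y = ResidualBlock(64)(x)  # same shape as x; with zero weights the block is the identity
```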
However, our understanding of why they work is weak: the original work by He et al. offered only broad, largely untestable reasoning, and the explicit analytical models that do exist tend to have limited scope (for example, Saxe et al. (2014)), failing to extend to the residual networks that form the basis of models used in practice.
Second, we do not have a particularly good understanding of the mathematical foundations of deep learning. Information theory, however, seemed to offer a way to at least quantify our intuitions about information and related concepts. This is exemplified by Saxe et al. (2019), who used quantitative methods to challenge a prominent hypothesis about how deep networks learn, the information bottleneck.
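As a toy illustration of what "quantifying information" can mean here (my own example, not code from the attached review): the quantity these analyses typically track is the mutual information between two variables, for instance a layer's activations and the network's input or labels. For discrete variables it can be computed directly from the joint distribution:

```python
import numpy as np

def mutual_information(joint: np.ndarray) -> float:
    """Mutual information I(X; Y) in bits from a joint probability table.

    joint[i, j] = P(X = i, Y = j). This is the kind of quantity the
    information-theoretic analyses measure between a representation
    and the network's input or target.
    """
    px = joint.sum(axis=1, keepdims=True)   # marginal P(X)
    py = joint.sum(axis=0, keepdims=True)   # marginal P(Y)
    nz = joint > 0                          # skip zero-probability terms
    return float((joint[nz] * np.log2(joint[nz] / (px @ py)[nz])).sum())

# A perfectly correlated pair carries 1 bit; an independent pair carries 0.
print(mutual_information(np.array([[0.5, 0.0], [0.0, 0.5]])))      # -> 1.0
print(mutual_information(np.array([[0.25, 0.25], [0.25, 0.25]])))  # -> 0.0
```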
These two topics occupied me for the duration of the course. They helped me build an awareness of the literature and its open problems, and I'm excited to tackle them in the future, as there are several ways in which residual networks are far more mathematically tractable than traditional networks.
Anyway, I have attached the (mock) research proposal on skip connections and a literature review on residual connections and information theory. Together they highlight why I think studying architecture design and its mathematical foundations is so critical for understanding deep learning systems at a fundamental level.