
Honors Thesis
A functor string diagram shows all the details of the underlying objects and the axes over which they are lifted. When all the underlying objects are the same, we get neural circuit diagrams. Functor string diagrams have extensive mathematical formalism; neural circuit diagrams therefore inherit this mathematical expressiveness.

I have recently finished my honors thesis. An honors thesis is something of a mix between a senior thesis and a master's thesis: some departments at my university grade master's and honors theses in the same pool, while other departments treat them differently.

Anyway, for me it involved writing a 90-page report on research I had done over the past year. I focused mine on neural circuit diagrams and used the extra space to develop their underlying mathematics. Because of my prior research experience, I was in the fortunate position of being able to write about genuinely novel contributions throughout my work.

I aimed to present neural circuit diagrams in an accessible way and to formalize the mathematics behind them. To do so, I had to design a graphical language that could represent compositional structure in an intuitive manner. In other words, I had to develop a graphically intuitive approach to category theory, the mathematical study of composition.
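To make "the mathematical study of composition" concrete, here is the standard definition the diagrams are built on (textbook category theory, not anything specific to the thesis): a category consists of objects and morphisms that compose associatively with identities, and a functor is a map between categories that preserves this composition.

```latex
% Morphisms compose tip-to-tail, associatively and with identities:
\[
  f \colon A \to B, \qquad g \colon B \to C, \qquad g \circ f \colon A \to C
\]
% A functor F preserves this structure, which is what allows a string
% diagram to be "lifted" along F:
\[
  F(g \circ f) = F(g) \circ F(f), \qquad F(\mathrm{id}_A) = \mathrm{id}_{F(A)}
\]
```

In string-diagram notation, equations like associativity are absorbed into the geometry of the picture itself, which is what makes the graphical approach intuitive.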

This resulted in functor string diagrams, which borrow heavily from the work of Marsden and Nakahira. However, I added my own twist that I think makes these diagrams more robust and widely applicable. Functor string diagrams and their rich mathematical vocabulary allow neural circuit diagrams to be formally extended with additional modes of analysis, a subject I'm excited to explore further.

Above, I've attached my thesis. I've included the abstract below. If you're interested in systematic frameworks to understand, communicate, and analyze deep learning architectures, give it a skim. If anything catches your interest, do get in touch.

Abstract

Deep learning models are at the forefront of human technology. However, we lack a systematic framework for understanding these systems as mathematically explicit composed structures. This hinders our ability to communicate, implement, and analyze models. This work resolves this shortfall. In this thesis, I present neural circuit diagrams, a comprehensive graphical language for deep learning architectures with a robust mathematical basis in category theory. This work is split into three chapters that identify gaps in the research and contribute solutions.

Chapter 1: The Problem and the Solution

I assess the current state of deep learning research. I identify the importance of architectural innovations to the success of contemporary models. I contribute a critique that I believe is vital to address: we lack a robust graphical language for understanding architectures. I provide a case study of "Attention is All You Need," showing how its presentation is unclear. More generally, we lack a compositional mathematical framework that encompasses contemporary models. Then, I argue why category theory is a promising approach to resolve these issues.

Chapter 2: Applications of Neural Circuit Diagrams

I introduce neural circuit diagrams with a focus on practical applications. This chapter avoids category theory, showing that the diagrams are accessible enough for general adoption. I provide comprehensive diagrams for a host of architectures – from transformers to computer vision – contributing explanations for systems that are otherwise difficult to communicate. Finally, I demonstrate the analytical utility of neural circuit diagrams by using them to analyze linear rearrangements and computational complexities of algorithms.

Chapter 3: Theory of Functor String Diagrams

I focus on the robust theory underlying neural circuit diagrams. Using category theory, I build on the nascent field of functor string diagrams, contributing first principles and novel tools such as family expressions. I reconcile neural circuit diagrams with functor string diagrams, showing how neural circuit diagrams have a robust mathematical basis. By providing an explicit model of deep learning architectures, this chapter contributes the foundation for future work that systematically analyzes the mathematical properties of deep learning architectures.
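To give a flavour of the complexity analysis mentioned in Chapter 2, here is a minimal sketch using NumPy and my own variable names rather than the thesis's notation: every contracted axis in a diagram contributes a factor to the operation count, which the code below tallies for the attention-score computation. The thesis carries this bookkeeping out diagrammatically; the snippet only mirrors the arithmetic.

```python
# A rough sketch (my own naming, not the thesis's notation) of the kind of
# complexity bookkeeping neural circuit diagrams make explicit: every
# contracted axis contributes a factor to the multiply-accumulate count.
import numpy as np

def attention_scores_macs(n, d_model, d_key):
    """Multiply-accumulate count for Q = X W_Q, K = X W_K, S = Q K^T."""
    proj = 2 * n * d_model * d_key   # two projections, each n * d_model * d_key MACs
    scores = n * n * d_key           # contracting the key axis of Q against K
    return proj + scores

n, d_model, d_key = 128, 512, 64
X = np.random.randn(n, d_model)
W_Q = np.random.randn(d_model, d_key)
W_K = np.random.randn(d_model, d_key)

# The einsum strings mirror how a diagram wires axes together.
Q = np.einsum("nd,dk->nk", X, W_Q)
K = np.einsum("nd,dk->nk", X, W_K)
S = np.einsum("nk,mk->nm", Q, K)

print(S.shape, attention_scores_macs(n, d_model, d_key))
```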