Neural Network Architecture Design: Towards Low-Complexity and Scalable Solutions
Abstract: Over the past few years, deep neural networks have been at the center of attention in the machine learning literature, thanks to advances in the computational capabilities of modern graphics processing units (GPUs). This progress has made it possible to train large-scale neural networks on thousands, and even millions, of training samples and to achieve outstanding estimation accuracy in various applications where this was simply not possible before. Moreover, the lack of a coherent theoretical understanding of neural networks has shifted the focus of current machine learning research from theoretical analysis to experimental studies run on GPU clusters. As a result, the current deep learning literature is still immature when it comes to real-world scenarios where the number of training samples is small or computational resources are limited. In this thesis, we focus on developing new neural network architectures that take such practical constraints into account. First, we propose a layer-wise training approach for multilayer neural networks that guarantees a reduction of the training loss as the network gets deeper. Besides being computationally efficient, this approach provides an estimate of the appropriate size of the network, i.e., the number of neurons and layers. The proposed approach also comes with a scalable training algorithm, which makes it attractive for distributed learning over a network of agents. Second, we focus on designing a deep neural network architecture for small-data learning regimes, where the number of training samples is limited. To this end, we combine kernel methods with densely connected networks and demonstrate the classification capabilities of the resulting architecture in few-shot learning scenarios.
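The abstract does not state how the layer-wise loss guarantee is obtained. The following is a minimal sketch of one construction that yields such a guarantee, not necessarily the method used in the thesis. It assumes a ReLU network whose output matrix is refit by least squares after each new layer, and it relies on the identity ReLU(z) - ReLU(-z) = z: the first 2Q rows of each new weight matrix are [I; -I] times the previous output matrix, so the new layer can reproduce the previous prediction exactly, and the refit training loss can only stay the same or drop. The function names, the number of layers, and the number of random neurons per layer are hypothetical choices.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def least_squares(H, T):
    """Return O minimizing ||T - O @ H||_F (H: features x N, T: Q x N)."""
    # lstsq solves H.T @ O.T ~= T.T, one output column at a time
    O_T, *_ = np.linalg.lstsq(H.T, T.T, rcond=None)
    return O_T.T


def mse(T, Y):
    return float(np.mean((T - Y) ** 2))


def layerwise_train(X, T, n_layers=5, n_random=20, seed=0):
    """Grow a ReLU network one layer at a time; return the training loss per stage."""
    rng = np.random.default_rng(seed)
    Q = T.shape[0]
    V = np.vstack([np.eye(Q), -np.eye(Q)])  # [I; -I], shape (2Q, Q)
    H = X
    O = least_squares(H, T)                 # stage 0: linear least-squares fit
    losses = [mse(T, O @ H)]
    for _ in range(n_layers):
        # Hypothetical random part of the new layer (scaled Gaussian weights)
        R = rng.standard_normal((n_random, H.shape[0])) / np.sqrt(H.shape[0])
        # First 2Q rows carry +/- the previous prediction O @ H through the ReLU
        W = np.vstack([V @ O, R])
        H = relu(W @ H)
        # [I, -I, 0] @ H recovers the previous prediction, so this refit
        # cannot have a larger residual than the previous stage
        O = least_squares(H, T)
        losses.append(mse(T, O @ H))
    return losses


# Usage on synthetic data: 10 input features, 3 nonlinear targets, 200 samples
rng = np.random.default_rng(1)
X = rng.standard_normal((10, 200))
T = np.sin(X[:3])
losses = layerwise_train(X, T)
print([round(v, 4) for v in losses])
```

In exact arithmetic the printed losses form a non-increasing sequence; the random rows are what give each new layer the chance to lower the loss further.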
Because it relies on a kernel representation, the kernel-based architecture can handle high-dimensional samples and feature vectors: the complexity of its training algorithm is determined mainly by the number of samples rather than by their dimension. Third, we focus on designing a deep neural network architecture with very low computational requirements, making it suitable for power-limited applications such as learning on edge devices. In particular, we use a combination of random weights and ReLU activation functions to achieve accurate estimation as the network gets deeper. In the next part of the thesis, we present some applications of the proposed architectures and show how they can contribute to the current machine learning literature. First, we show how an incremental learning setup can be incorporated into an adaptive-size multilayer neural network built with our proposed approach. Then, we bring new insight, from an information-theoretic point of view, into the signal flow of a multilayer neural network. We also give examples of how our techniques can be used to improve the performance of state-of-the-art deep networks. Finally, we briefly show the favorable characteristics of our training algorithms that make them suitable for a variety of distributed learning scenarios over a network.
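To illustrate why a kernel representation makes the training cost depend on the number of samples N rather than on their dimension d, here is a minimal sketch using kernel ridge regression with a Gaussian (RBF) kernel. This is a standard kernel method swapped in for illustration, not the thesis's combination of kernels and densely connected networks; the kernel choice, `gamma`, the regularization `lam`, and the synthetic few-shot data are all assumptions. Building the N x N Gram matrix takes O(N^2 d) operations, but after that, fitting only requires solving an N x N linear system whose size does not depend on d.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Gaussian kernel matrix between rows of A (n_a x d) and B (n_b x d)."""
    sq = (np.sum(A ** 2, axis=1)[:, None]
          + np.sum(B ** 2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    # Clip tiny negative values caused by floating-point cancellation
    return np.exp(-gamma * np.maximum(sq, 0.0))


def fit_kernel_ridge(X, Y, gamma, lam):
    """Solve (K + lam * I) alpha = Y; the system is N x N whatever d is."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), Y)


def predict(X_train, alpha, X_new, gamma):
    return rbf_kernel(X_new, X_train, gamma) @ alpha


# Hypothetical few-shot setting: 20 samples, each with 10,000 features
rng = np.random.default_rng(0)
N, d = 20, 10_000
labels = np.repeat([0, 1], N // 2)
X = rng.standard_normal((N, d))
X[labels == 1] += 0.3                  # shift class 1 so the classes differ
Y = np.eye(2)[labels]                  # one-hot targets, shape (N, 2)
gamma = 1.0 / d                        # assumed bandwidth heuristic
alpha = fit_kernel_ridge(X, Y, gamma=gamma, lam=1e-3)
train_acc = float(np.mean(predict(X, alpha, X, gamma).argmax(axis=1) == labels))
print(alpha.shape, train_acc)
```

Although each sample has 10,000 features, the learned coefficients `alpha` have only N = 20 rows, one per training sample.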