On Some Fundamental Aspects of Learning in Artificial Neural Networks and Universal Channel Codes
This thesis examines fundamental aspects of learning in two prevalent application scenarios: (1) training a neural network for image classification, and (2) using a channel code for universal communication at capacity. For the classification problem, we aim to better understand what representations each layer of the network learns. In particular, we compare the higher-layer activations of two neural networks with identical architectures but different initializations via an adaptive $k$NN graph approximation of the underlying manifold, and we show that the underlying manifolds of the two networks are largely similar, with discrepancies in potentially highly curved regions. We also investigate the locality of the receptive field in convolutional neural networks by using semi-localized filters with random neuron connections, and we find that the receptive field useful for feature extraction might extend beyond the local one hard-coded in traditional designs. For the communication problem, we study universal channel coding in a high-dimensional statistical setting beyond Shannon's classical framework, and we prove a series of theorems that may, surprisingly, indicate a need to learn the entire channel in order to achieve its capacity.