Matrix Factorization and Deep Learning in Scientific Domains: Understanding When and Why They Work
Propelled by large datasets and parallel compute accelerators, deep neural networks have recently demonstrated human-like performance in many domains previously beyond the reach of machines. Computers can now recognize objects in images, transcribe speech, and exhibit reading comprehension at the level of an average human. However, many domains require expert knowledge beyond that of the average person, e.g. medical diagnosis and scientific analysis. This raises the question -- beyond matching average humans, can modern statistical models perform as well as domain experts? Toward answering this question, this thesis applies modern machine learning to several expert domains. In particular, we consider problems related to sustainability, placing our work in the domain of computational sustainability -- a nascent field at the intersection of computer science and sustainability. First, we use neural networks to identify invasive-species habitats from remote sensing images, showing that unsupervised learning can combine sparse expert labels with cheap satellite imagery. Second, we consider passive acoustic monitoring of endangered animals and introduce a novel data-driven compression scheme for this setting. Third, we apply non-negative matrix factorization (NMF) to spectroscopic datasets in materials science, and show how combining discrete and continuous optimization can yield solutions that accelerate scientific discovery.

In expert domains such as these, empirical performance alone is not enough: one needs to know when and why machine learning methods work before relying on their predictions. Paradoxically, most machine learning models are NP-hard to optimize, yet work remarkably well in practice. Motivated by our practical problems, we make empirical and theoretical contributions toward a principled understanding of when and why machine learning works.
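To make the NMF setting concrete, the sketch below factors a non-negative matrix V into non-negative factors W and H using the classical Lee-Seung multiplicative updates. This is a minimal illustration of the general technique only, not the algorithm developed in the thesis (which additionally combines discrete and continuous optimization); all names and parameters here are illustrative.

```python
import numpy as np

def nmf(V, rank, n_iter=500, eps=1e-9, seed=0):
    """Approximate V (non-negative) as W @ H via multiplicative updates.

    Each update multiplies by a ratio of non-negative quantities, so
    W and H stay non-negative throughout -- the defining NMF constraint.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank))
    H = rng.random((rank, n))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Synthetic data with exact non-negative rank-4 structure.
rng = np.random.default_rng(1)
V = rng.random((20, 4)) @ rng.random((4, 30))
W, H = nmf(V, rank=4)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

In spectroscopy, the rows of H play the role of constituent spectra and W their per-sample weights, which is why the non-negativity constraint is physically meaningful there.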
On the theoretical side, we introduce a randomized average-case model for NMF and prove that certain convexity properties arise naturally in this model. On the empirical side, we consider a popular method in deep learning -- batch normalization -- which cannot improve model expressivity yet improves performance in practice. We demonstrate that the improved conditioning this normalization confers enables larger learning rates, which in turn have a regularizing effect.
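For reference, batch normalization itself is a simple per-feature transform: each feature is standardized over the mini-batch, then rescaled by learnable parameters. A minimal NumPy sketch (training-mode forward pass only; the names `gamma` and `beta` follow the usual convention and are assumptions of this illustration, not notation from the thesis):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Standardize each feature over the batch, then apply an affine rescale.

    x has shape (batch, features); gamma and beta have shape (features,).
    """
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)  # zero mean, unit variance per feature
    return gamma * x_hat + beta

# A badly scaled batch (mean 5, std 3) is mapped to mean 0, std 1 per feature.
rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(128, 10))
y = batch_norm(x, gamma=np.ones(10), beta=np.zeros(10))
```

Because the transform is invertible for gamma > 0, it adds no expressive power; its benefit must come from the optimization dynamics, which is the conditioning effect studied here.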