The first day of the workshop features tutorial presentations from a subset of the organizers. These tutorials present an up-to-date account of the intersection between low-dimensional modeling and deep learning in an accessible format.
The first session will introduce fundamental properties and theoretical results for sensing, processing, analyzing, and learning low-dimensional structures from high-dimensional data. We will first discuss classical low-dimensional models, such as sparse recovery and low-rank matrix sensing, and motivate these models by applications in medical imaging, collaborative filtering, face recognition, and beyond. Based on convex relaxation, we will characterize the conditions, in terms of sample/data complexity, under which the inverse problems of recovering such low-dimensional structures become tractable and can be solved efficiently, with guaranteed correctness or accuracy.
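As a rough illustration of the convex-relaxation idea described above, the sketch below recovers a sparse vector from a small number of random linear measurements by minimizing an l1-regularized least-squares objective with iterative soft-thresholding (ISTA). The problem sizes, the regularization weight `lam`, and all function names are assumptions chosen for illustration; they are not taken from the tutorial materials.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (entrywise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def lasso_objective(A, y, x, lam):
    """0.5 * ||Ax - y||^2 + lam * ||x||_1."""
    return 0.5 * np.sum((A @ x - y) ** 2) + lam * np.sum(np.abs(x))

def ista(A, y, lam, n_iter=2000):
    """Minimize the lasso objective by proximal gradient descent."""
    # Step size 1/L, where L = ||A||_2^2 is the gradient's Lipschitz constant.
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)
        x = soft_threshold(x - step * grad, step * lam)
    return x

# Hypothetical problem: m = 60 measurements of a 5-sparse vector in R^200.
rng = np.random.default_rng(0)
m, n, k = 60, 200, 5
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
x_true[support] = rng.choice([-1.0, 1.0], size=k) * (1.0 + rng.random(k))
y = A @ x_true

lam = 0.01
x_hat = ista(A, y, lam)
rel_err = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
print("relative recovery error:", rel_err)
```

The number of measurements here is far smaller than the ambient dimension; how many measurements suffice for recovery is the kind of sample-complexity question the session addresses.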
We will transition from sensing to learning low-dimensional structures, covering problems such as dictionary learning, sparse blind deconvolution, and dual principal component analysis. Problems associated with learning low-dimensional models from sample data are often nonconvex: either they do not have tractable convex relaxations or the nonconvex formulation is preferred due to physical or computational constraints (such as limited memory). To deal with these challenges, we will introduce a systematic approach to analyzing the corresponding nonconvex landscapes from a geometry and symmetry perspective. The resulting approach leads to provably globally convergent nonconvex optimization methods.
We will discuss the contemporary topic of using deep models for computing with nonlinear data, introducing strong conceptual connections between low-dimensional structures in data and deep models. We will then consider a mathematical model problem that attempts to capture these aspects of practice, and show how low-dimensional structure in data and tasks influences the resources (statistical, architectural) required to achieve a given performance level. Our discussion will revolve around basic tradeoffs between these resources and theoretical guarantees of performance.
Ohio State University
Continuing our exploration of deep models for nonlinear data, we will begin to delve into learned representations, network architectures, regularizations, and beyond. We will see how the tools for nonconvexity developed previously shed light on the learned representations produced by deep networks, through connections to matrix factorization. We will observe how algorithms that interact with data will expose additional connections to low-dimensional models, through implicit regularization of the network parameters.
Building on the previous discussion of the connection between low-dimensional structures and deep models, in this session we will discuss principles for designing deep networks through the lens of learning good low-dimensional representations for (potentially nonlinear) low-dimensional structures. We will see how unrolling iterative optimization algorithms for low-dimensional problems (such as sparsifying algorithms) naturally leads to deep neural networks. We will then show how modern deep layered architectures, including their linear (convolution) operators, nonlinear activations, and even all of their parameters, can be derived from the principle of learning a compact linear discriminative representation for nonlinear low-dimensional structures within the data. We will show how representations learned in this way can bring tremendous benefits in tasks such as learning generative models, noise stability, and incremental learning.
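To make the unrolling idea concrete, the following sketch writes a fixed number of iterative soft-thresholding (ISTA) steps as the layers of a feed-forward network: each layer applies a linear map followed by a shrinkage nonlinearity. The class name `UnrolledISTA`, the choice of ISTA as the sparsifying algorithm, and the problem sizes are assumptions made for illustration. The per-layer parameters are simply initialized to the ISTA values here; learned variants would train them from data.

```python
import numpy as np

def soft_threshold(v, tau):
    """Entrywise shrinkage; plays the role of the layer's activation."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

class UnrolledISTA:
    """Feed-forward network whose layers mirror ISTA iterations."""

    def __init__(self, A, lam, n_layers):
        step = 1.0 / np.linalg.norm(A, 2) ** 2
        n = A.shape[1]
        # Each layer owns its parameters so they could be trained separately.
        self.W = [np.eye(n) - step * (A.T @ A) for _ in range(n_layers)]
        self.B = [step * A.T for _ in range(n_layers)]
        self.tau = [step * lam for _ in range(n_layers)]

    def forward(self, y):
        x = np.zeros(self.W[0].shape[0])
        for W, B, tau in zip(self.W, self.B, self.tau):
            x = soft_threshold(W @ x + B @ y, tau)  # linear map + nonlinearity
        return x

# Hypothetical data: a sparse code observed through a random matrix.
rng = np.random.default_rng(1)
A = rng.standard_normal((30, 80)) / np.sqrt(30)
x_true = np.zeros(80)
x_true[[3, 17, 42]] = [1.5, -2.0, 1.0]
y = A @ x_true

lam, n_layers = 0.05, 20
net = UnrolledISTA(A, lam, n_layers)
x_net = net.forward(y)

# Reference: the same number of plain ISTA iterations.
step = 1.0 / np.linalg.norm(A, 2) ** 2
x_ista = np.zeros(80)
for _ in range(n_layers):
    x_ista = soft_threshold(x_ista - step * (A.T @ (A @ x_ista - y)), step * lam)
```

With untrained parameters, the network output coincides with the plain ISTA iterate up to floating-point error, which is the sense in which the iterative algorithm and the layered architecture are the same computation.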
We discuss the role of sparsity in general neural network architectures, and shed light on how sparsity interacts with deep learning in the overparameterized regime, for both practitioners and theorists. A sparse neural network (NN) has most of its parameters set to zero and is traditionally considered the product of NN compression (i.e., pruning). Yet recently, sparsity has emerged as an important bridge for modeling the underlying low dimensionality of NNs and for understanding their generalization, optimization dynamics, implicit regularization, expressivity, and robustness. Deep NNs learned with sparsity-aware priors have also demonstrated significantly improved performance through a full stack of applied work on algorithms, systems, and hardware. In this talk, I plan to cover recent progress on the practical, theoretical, and scientific aspects of sparse NNs. I will try to scratch the surface of three aspects: (1) practically, why one should love a sparse NN, beyond just a post-training NN compression tool; (2) theoretically, what guarantees one can expect from sparse NNs; and (3) what the future prospects are for exploiting sparsity in NNs.
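For readers unfamiliar with pruning, the sketch below shows one common and simple way to obtain a sparse weight matrix: magnitude pruning, which zeroes the smallest-magnitude entries. It is a generic illustration, not a method attributed to the talk; the function name `magnitude_prune`, the layer shape, and the 90% sparsity level are assumptions.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero the smallest-magnitude entries of `weights`.

    `sparsity` is the target fraction of zeros, in [0, 1).
    Returns the pruned weights and the boolean keep-mask.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    magnitudes = np.abs(weights).ravel()
    n_prune = int(np.floor(sparsity * magnitudes.size))
    if n_prune == 0:
        return weights.copy(), np.ones(weights.shape, dtype=bool)
    # Magnitude of the n_prune-th smallest entry; entries at or below it are dropped.
    threshold = np.partition(magnitudes, n_prune - 1)[n_prune - 1]
    mask = np.abs(weights) > threshold
    return weights * mask, mask

# Hypothetical dense layer with 64 x 32 weights, pruned to 90% sparsity.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))
W_sparse, mask = magnitude_prune(W, sparsity=0.9)
print("fraction of zeros:", 1.0 - mask.mean())
```

In practice the mask is usually kept fixed (or updated on a schedule) while the surviving weights continue to be trained.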
Michigan State University
In this talk, we present our work on improving machine learning for image reconstruction on three fronts – i) learning regularizers, ii) learning with no training data, and iii) ensuring robustness to perturbations in learning-based schemes. First, we present an approach for supervised learning of sparsity-promoting regularizers, where the parameters of the regularizer are learned to minimize reconstruction error on a paired training set. Training involves a challenging bilevel optimization problem with a nonsmooth lower-level objective. We derive an expression for the gradient of the training loss using the implicit closed-form solution of the lower-level variational problem, and provide an accompanying exact gradient descent algorithm (dubbed BLORC). Our experiments show that the gradient computation is efficient and BLORC learns meaningful operators for effective denoising. Second, we investigate the deep image prior (DIP) scheme that recovers an image by fitting an overparameterized neural network directly to the image’s corrupted measurements. To address DIP’s overfitting and performance issues, recent work proposed using a reference image as the network input. However, obtaining the reference often requires supervision. Hence, we propose a self-guided scheme that uses only undersampled measurements to estimate both the network weights and input image. We exploit regularization requiring the network to be a powerful denoiser. Our self-guided method gives significantly improved reconstructions for MRI with limited measurements compared to recent schemes, while using no training data. Finally, recent studies have shown that trained deep reconstruction models could be over-sensitive to tiny input perturbations, which cause unstable, low-quality reconstructed images. 
To address this issue, we propose Smoothed Unrolling (SMUG), which incorporates a randomized smoothing-based robust learning operation into a deep unrolling architecture and improves the robustness of MRI reconstruction with respect to diverse perturbations.
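The randomized-smoothing ingredient mentioned above can be illustrated in isolation. The sketch below is not SMUG itself; it only shows the generic operation of averaging a function's outputs over Gaussian perturbations of its input, applied to a toy hard-thresholding "denoiser" that is deliberately sensitive near its threshold. The function names, noise level `sigma`, and sample count are assumptions.

```python
import numpy as np

def hard_threshold_denoiser(z, t=0.5):
    """Toy denoiser: keeps entries with magnitude above t, zeroes the rest."""
    return np.where(np.abs(z) > t, z, 0.0)

def smoothed_apply(f, x, sigma, n_samples, rng):
    """Monte Carlo estimate of E[f(x + sigma * noise)] with Gaussian noise."""
    outputs = [f(x + sigma * rng.standard_normal(x.shape)) for _ in range(n_samples)]
    return np.mean(outputs, axis=0)

# Two inputs that differ by a tiny perturbation across the threshold.
x_a = np.array([0.49])
x_b = np.array([0.51])

# The raw denoiser jumps from 0 to about 0.5 between the two inputs.
raw_gap = np.abs(hard_threshold_denoiser(x_b) - hard_threshold_denoiser(x_a)).max()

# The smoothed version varies gradually (same noise draws for both inputs).
sigma, n_samples = 0.2, 2000
s_a = smoothed_apply(hard_threshold_denoiser, x_a, sigma, n_samples, np.random.default_rng(0))
s_b = smoothed_apply(hard_threshold_denoiser, x_b, sigma, n_samples, np.random.default_rng(0))
smooth_gap = np.abs(s_b - s_a).max()
print("raw gap:", raw_gap, "smoothed gap:", smooth_gap)
```

The smoothed map trades some fidelity for stability: small input perturbations can no longer flip the output abruptly, which is the intuition behind using smoothing inside a reconstruction network.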