Mathematical Institute, University of Oxford, Trinity Term 2018:
Randomized algorithms for matrix computations and data analysis

Instructor:
Gunnar Martinsson. Room: S2.32. Email: martinsson@maths.ox.ac.uk

Meeting times:
Week 2: Tue 9:00-10:00 in L6. Wed 9:00-10:00 in C2.
Week 3: Wed 9:00-10:00 in C2.
Week 4: Tue 9:00-11:00 in L6 (note double lecture!). Wed 9:00-10:00 in C2.
Week 5: Tue 9:00-10:00 in L5. Wed 9:00-10:00 in C2.
Week 6: Tue 9:00-10:00 in L6. Wed 9:00-10:00 in C2.
Week 7: Tue 9:00-10:00 in L6. Wed 9:00-10:00 in C2.

Website:
http://people.maths.ox.ac.uk/martinsson/Teaching/2018_random_matrix/

Description:
In this course, we will describe and analyze algorithms designed to handle large, high-dimensional datasets and to take full advantage of modern computing hardware. The common theme in the methods to be discussed is a reliance on randomized projections and randomized sampling to reduce the effective dimensionality of problems without undue loss of accuracy. The course will discuss the mathematical theory underlying these methods, which involves techniques from random matrix theory, numerical linear algebra, probability, and functional analysis. It will also discuss practical issues that are important for attaining high computational speed, such as blocking of algorithms, how to minimize communication, and how to handle situations where data is stored in slow memory (such as hard drives) or on a distributed-memory system.

For each algorithm, we will discuss some relevant application areas and describe the mathematical modeling involved. Examples include statistical data analysis (PCA in particular), linear regression problems, clustering, spectral methods for analyzing graphs, and many more.

Examination:
By written report, following standard guidelines.

Topics:

  1. Introduction: Overview and motivation of the course topics. Brief discussion of topics that you have very likely seen in prior course work, but which are worth reviewing to refresh key points and to introduce the notation used in the course. (SVD, Eckart-Young theorem, column pivoted QR factorization, fundamental subspaces associated with a matrix, basics of efficient computing such as blocking of algorithms and minimizing communication.)
  2. Randomized low-rank approximation: We describe a basic randomized algorithm for computing a low-rank approximation to a given matrix that we call the randomized singular value decomposition (RSVD). We will discuss its computational cost in different environments and describe pros and cons vis-à-vis traditional deterministic methods. (A minimal code sketch of the RSVD appears after this list.)
  3. Single pass algorithms: Some data sets are so large that they cannot be stored in their entirety. We will demonstrate that it is still possible to compute a sketch of the data in an environment where each piece of data can be viewed only once, and not stored. (A streaming sketch appears after this list.)
  4. Fast Johnson-Lindenstrauss transforms: Classical algorithms for computing a rank-k approximation to a matrix of size m x n require at least O(mnk) flops. Using randomized algorithms, the asymptotic flop count can be reduced to O(mn log k). The idea is to use specialized random projections with enough internal structure that they can be applied rapidly. (A sketch of one such structured projection appears after this list.)
  5. Approximation errors and randomized power iteration: We will look at some theorems that describe the behavior of randomized algorithms for low-rank approximation, and then look at the actual errors resulting when the algorithms are applied to representative problems. We will see that when the data is "noisy," the standard RSVD does not perform well, but that the accuracy can be greatly improved by incorporating ideas from subspace iteration. (The RSVD sketch after this list includes an optional power-iteration step.)
  6. Theoretical analysis of the RSVD algorithm: We will describe the proofs of key theorems, and introduce some important and useful concepts from random matrix theory.
  7. Principal Component Analysis (PCA) and other applications of linear approximation: The idea of low rank approximation of a matrix arises in many different areas of data analysis and has been described under different names. The best known is probably Principal Component Analysis, which is a method for fitting observed data to a model in which it is drawn from a multivariate normal distribution driven by a small number of "hidden" parameters. We will also discuss Latent Semantic Indexing, Eigenfaces, and some other applications of similar ideas.
  8. Clustering and non-linear problems: Many real-world problems involve data that cannot be well represented by a linear model. We will describe some of the challenges, and some techniques that are useful for getting a handle on more general point clouds in high-dimensional space.
  9. Diffusion geometry and random walks on data sets: One technique for parameterizing a non-linear manifold represented by a set of point observations is to conduct a random walk on the set of points, where the likelihood of jumping from one point to the next depends strongly on their distance. A metric can then be introduced that represents the likelihood of moving from one point to another in a given time. We will describe the mathematics of this problem, and look at some numerical experiments. (A small diffusion-map sketch appears after this list.)
  10. Johnson-Lindenstrauss theory and the Bourgain theorem: By this point in the course, we will have exploited several times the fact that randomized projections provide an excellent computational tool for reducing the effective dimension of a problem in a high-dimensional space. We will now review some of the fundamental mathematical results underlying this observation, and then discuss applications to problems such as approximate nearest neighbor search. (A tiny random-projection sketch appears after this list.)
  11. Data interpretation via the Interpolative and CUR decompositions: If time permits, we will describe some matrix factorizations that are similar to traditional ones such as the SVD or QR factorization in that they enable compact storage of a matrix and highly efficient algorithms for solving various linear algebraic tasks. However, these alternative factorizations use subsets of the original columns and rows of a matrix to form bases for its column and row spaces. This enables data interpretation, and also leads to factorizations that preserve important properties of a matrix such as sparsity or non-negativity. (A sketch of a column interpolative decomposition appears after this list.)
  12. Summary and review.
Note: There is a rough correspondence between the topics listed and the 12 lectures in the course. However, some topics will require less than a full lecture and some will require more, so the list is not intended as a precise schedule. It is quite likely that we will, on average, require more than one lecture per topic, in which case we will skip the last one or two items on the list.
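
To make a few of the topics above more concrete, some minimal code sketches are collected below. They are written in Python/NumPy purely for illustration (the course does not prescribe a language), and all function and parameter names are illustrative rather than course notation. For topics 2 and 5, the sketch below implements the basic RSVD with Gaussian oversampling (parameter p) and an optional power-iteration step (parameter q):

    import numpy as np

    def rsvd(A, k, p=10, q=0, rng=None):
        # Randomized SVD sketch: rank-k approximation of A with oversampling p
        # and q steps of power iteration (q > 0 helps when singular values decay slowly).
        rng = np.random.default_rng(rng)
        m, n = A.shape
        Omega = rng.standard_normal((n, k + p))   # Gaussian test matrix
        Y = A @ Omega                             # sample the range of A
        Q, _ = np.linalg.qr(Y)                    # orthonormal basis for the sample
        for _ in range(q):                        # optional power iteration: basis for (A A^T)^q A Omega
            Q, _ = np.linalg.qr(A.T @ Q)
            Q, _ = np.linalg.qr(A @ Q)
        B = Q.T @ A                               # small (k+p) x n matrix
        Uhat, s, Vt = np.linalg.svd(B, full_matrices=False)
        return Q @ Uhat[:, :k], s[:k], Vt[:k, :]  # A ~= U diag(s) Vt

With q = 0 this is the plain RSVD; setting q = 1 or 2 typically improves the accuracy markedly when the data is noisy or the singular values decay slowly.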
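
For topic 3, the following sketch follows one common single-pass strategy: the matrix is consumed as a stream of row blocks, two small random sketches are accumulated, and a low-rank factorization A ~= Q X is recovered afterwards without revisiting the data. The streaming interface and variable names are illustrative assumptions.

    import numpy as np

    def single_pass_lowrank(row_blocks, n, ell, rng=None):
        # Single-pass low-rank approximation: each block of rows of A is seen
        # exactly once; only the small sketches Y, W (and the test matrices) are kept.
        rng = np.random.default_rng(rng)
        Omega = rng.standard_normal((n, ell))     # right test matrix
        Y_parts, Psi_parts = [], []
        W = np.zeros((ell, n))
        for Ai in row_blocks:                     # one pass over the data stream
            Psi_i = rng.standard_normal((ell, Ai.shape[0]))
            Y_parts.append(Ai @ Omega)            # rows of the range sketch Y = A Omega
            W += Psi_i @ Ai                       # co-range sketch W = Psi A
            Psi_parts.append(Psi_i)
        Y = np.vstack(Y_parts)                    # m x ell
        Psi = np.hstack(Psi_parts)                # ell x m
        Q, _ = np.linalg.qr(Y)                    # basis for the (approximate) column space
        # Recover A ~= Q X from the identity Psi A ~= (Psi Q) X.
        X, *_ = np.linalg.lstsq(Psi @ Q, W, rcond=None)
        return Q, X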
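
For topic 4, a subsampled randomized Fourier transform is one example of a structured random projection that can be applied rapidly. The sketch below flips random column signs, applies a unitary FFT to each row, and subsamples columns. Note that using the full FFT costs O(mn log n) rather than O(mn log k), and that the output is complex; in practice a real transform (e.g. a DCT or Hadamard transform) is often preferred.

    import numpy as np

    def srft_sketch(A, ell, rng=None):
        # Subsampled randomized Fourier transform: Y = A D F S, where D flips signs,
        # F is the unitary DFT, and S samples ell columns at random.
        rng = np.random.default_rng(rng)
        m, n = A.shape
        d = rng.choice([-1.0, 1.0], size=n)           # random column signs
        idx = rng.choice(n, size=ell, replace=False)  # sampled columns
        AF = np.fft.fft(A * d, axis=1) / np.sqrt(n)   # unitary DFT applied to each row
        return np.sqrt(n / ell) * AF[:, idx]          # subsample and rescale

The result can stand in for Y = A @ Omega in the RSVD sketch above (the basis Q will then be complex unless a real transform is used).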
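
For topic 9, the sketch below builds the random-walk transition matrix from a Gaussian affinity kernel on a point cloud and uses its leading non-trivial eigenvectors, weighted by powers of the eigenvalues, as diffusion coordinates. Normalization conventions vary in the literature; this uses the simplest row-stochastic normalization, and the parameters eps (kernel width) and t (diffusion time) are illustrative.

    import numpy as np

    def diffusion_map(X, eps, t=1, n_comp=2):
        # Diffusion-map coordinates for a point cloud X (one point per row).
        D2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # pairwise squared distances
        K = np.exp(-D2 / eps)                      # Gaussian affinity kernel
        P = K / K.sum(axis=1, keepdims=True)       # row-stochastic random-walk transition matrix
        evals, evecs = np.linalg.eig(P)            # spectrum is real up to round-off
        order = np.argsort(-evals.real)
        evals, evecs = evals.real[order], evecs.real[:, order]
        # Skip the trivial constant eigenvector; weight coordinates by eigenvalue^t.
        return evecs[:, 1:n_comp + 1] * evals[1:n_comp + 1] ** t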
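
For topic 10, the Johnson-Lindenstrauss phenomenon can be illustrated with nothing more than a scaled Gaussian matrix: projecting n points from a high-dimensional space down to roughly log(n)/eps^2 dimensions preserves all pairwise distances to within a factor 1 +/- eps with high probability.

    import numpy as np

    def jl_embed(X, d, rng=None):
        # Random projection of the rows of X (points in R^D) down to R^d.
        # Pairwise distances are approximately preserved when d ~ log(n) / eps^2.
        rng = np.random.default_rng(rng)
        G = rng.standard_normal((X.shape[1], d)) / np.sqrt(d)
        return X @ G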
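
For topic 11, the sketch below computes a one-sided interpolative decomposition A ~= A[:, J] @ T using column-pivoted QR, so that the approximation is expressed in terms of k actual columns of A. Randomized variants accelerate this step by first sketching the matrix; the deterministic version shown here is only meant to illustrate the structure of the factorization.

    import numpy as np
    from scipy.linalg import qr

    def column_id(A, k):
        # One-sided interpolative decomposition: A ~= A[:, J] @ T, where J indexes
        # k columns of A chosen by column-pivoted QR and T is the interpolation matrix.
        Q, R, piv = qr(A, mode='economic', pivoting=True)
        J = piv[:k]                                   # indices of the selected columns
        T_piv = np.linalg.solve(R[:k, :k], R[:k, :])  # interpolation coefficients, pivoted order
        T = np.empty_like(T_piv)
        T[:, piv] = T_piv                             # undo the column permutation
        return J, T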

Class notes (when applicable):

Resources: