Mathematical Institute, University of Oxford, Trinity Term 2018:
Randomized algorithms for matrix computations and data analysis
Instructor:
Gunnar Martinsson. Room: S2.32. Email: martinsson@maths.ox.ac.uk
Meeting times:
Week 2: Tue 9:00-10:00 in L6. Wed 9:00-10:00, C2.
Week 3: Wed 9:00-10:00, C2.
Week 4: Tue 9:00-11:00 in L6 (note double lecture!). Wed 9:00-10:00, C2.
Week 5: Tue 9:00-10:00 in L5. Wed 9:00-10:00, C2.
Week 6: Tue 9:00-10:00 in L6. Wed 9:00-10:00, C2.
Week 7: Tue 9:00-10:00 in L6. Wed 9:00-10:00, C2.
Website:
http://people.maths.ox.ac.uk/martinsson/Teaching/2018_random_matrix/
Description:
In this course, we will describe and analyze algorithms designed to handle large,
high-dimensional datasets and to take full advantage of modern computing hardware. The
common theme in the methods to be discussed is a reliance on
randomized projections and randomized sampling to reduce the
effective dimensionality of problems, without undue loss of
accuracy. The course will discuss the mathematical theory
underlying these methods, which involves techniques from random
matrix theory, numerical linear algebra, probability, and
functional analysis. The course will also discuss practical issues that are
important for attaining high computational speed, such as blocking of algorithms,
minimizing communication, and handling situations where data is stored on slow
memory (hard drives) or on a distributed-memory system.
For each algorithm, we will discuss some relevant
application areas and describe the mathematical modeling involved.
Examples include statistical data
analysis (PCA in particular), linear regression problems,
clustering, spectral methods for analyzing graphs, and many more.
Examination:
By written report, following standard guidelines.
Topics:
- Introduction:
Overview and motivation of the course topics. Brief review of material that you have very likely seen
in prior course work, in order to refresh key points and to introduce the notation used in the course.
(SVD, Eckart-Young theorem, column pivoted QR factorization, fundamental subspaces associated
with a matrix, basics of efficient computing such as blocking of algorithms and minimizing communication.)
- Randomized low-rank approximation:
We describe a basic randomized algorithm for computing a low-rank approximation to a given matrix
that we call the randomized singular value decomposition (RSVD). We will discuss its computational
cost in different environments and describe pros and cons vis-à-vis traditional deterministic methods.
(A short code sketch of the basic scheme appears after the topic list.)
- Single pass algorithms:
Some data sets are so large that they cannot be stored at all. We will demonstrate that it is still possible to
compute a sketch of the data in an environment where each piece of data can be viewed only once and is
never stored. (A sketch of such a single-pass scheme appears after the topic list.)
- Fast Johnson-Lindenstrauss transforms:
Classical algorithms for computing a rank-k approximation to a matrix of size m x n require at least O(mnk) flops.
Using randomized algorithms, the asymptotic flop count can be reduced to O(mn log k).
The idea is to use specialized random projections with enough internal structure that they can be applied rapidly.
(A sketch of one such structured projection is included in the code sketches below.)
- Approximation errors and randomized power iteration:
We will state some theorems that describe the behavior of randomized algorithms for low-rank approximation,
and then examine the actual errors that result when the algorithms are applied to representative problems. We will
see that when the data is "noisy," the standard RSVD does not perform well, but that the accuracy can be greatly
improved by incorporating ideas from subspace iteration. (A sketch of this power-iteration variant is given below.)
- Theoretical analysis of the RSVD algorithm:
We will describe the proofs of key theorems, and introduce some important and useful concepts from random matrix theory.
- Principal Component Analysis (PCA) and other applications of linear approximation:
The idea of low-rank approximation of a matrix arises in many different areas of data analysis and has been
described under different names. The best known is probably Principal Component Analysis, a method
for fitting observed data to the assumption that it is drawn from a multivariate normal distribution
driven by a small number of "hidden" parameters. We will also discuss Latent Semantic Indexing, Eigenfaces,
and some other applications of similar ideas.
- Clustering, and non-linear problems:
Many real-world problems involve data that cannot be well represented by a linear model. We will
describe some of the challenges, and some techniques that are useful for getting a handle on more
general point clouds in high-dimensional space.
- Diffusion geometry and random walks on data sets:
One technique for parameterizing a non-linear manifold represented by a set of point observations is
to conduct a random walk on the set of points where the likelihood of jumping from one point to the
next depends strongly on their distance. A metric can then be introduced that represents the likelihood
of moving from one point to another in a given time. We will describe the mathematics of this construction
and look at some numerical experiments. (A small code sketch of the construction appears after the topic list.)
- Johnson-Lindenstrauss theory and the Bourgain theorem:
By this point in the course, we will have exploited several times the fact that randomized
projections provide an excellent computational tool for reducing the effective dimension of a problem
posed in a high-dimensional space. We will now review some of the fundamental mathematical results underlying
this observation, and then discuss applications to problems such as approximate nearest neighbor search.
(A small numerical illustration is included in the code sketches below.)
- Data interpretation via the Interpolative and CUR decompositions:
If time permits, we will describe some matrix factorizations that are similar to traditional ones such
as the SVD or QR factorization in that they enable compact storage of a matrix and highly efficient
algorithms for various linear algebraic tasks. However, these alternative factorizations use
subsets of the original columns and rows of a matrix to form bases for its column and row spaces.
This aids data interpretation, and also leads to factorizations that preserve important properties
of the matrix such as sparsity or non-negativity. (A sketch of a column interpolative decomposition is given below.)
- Summary and review.
Note: There is a rough correspondence between the topics listed above and the 12 lectures in the course.
However, some topics will require less than a full lecture and some will require more, so the list is not
intended as a precise schedule. It is quite likely that, on average, we will need more than one lecture
per topic, in which case we will skip the last one or two items on the list.
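Code sketches:
The topic list above refers to a few short code sketches, which are collected here. They are minimal
illustrations written in Python with NumPy/SciPy; the function names, parameter defaults, and other
conventions are choices made for these examples rather than prescriptions of how the algorithms will
be presented in the lectures.

A minimal sketch of the basic randomized low-rank approximation scheme (the RSVD), assuming a Gaussian
test matrix and a small amount of oversampling (the default p=10 is an assumption):

    import numpy as np

    def rsvd(A, k, p=10, seed=None):
        # Basic randomized SVD: returns an approximate rank-k factorization
        # A ~ U @ np.diag(s) @ Vt, where p is a small oversampling parameter.
        rng = np.random.default_rng(seed)
        n = A.shape[1]
        Omega = rng.standard_normal((n, k + p))   # Gaussian test matrix
        Y = A @ Omega                             # sample the range of A
        Q, _ = np.linalg.qr(Y)                    # orthonormal basis for the sample
        B = Q.T @ A                               # project A onto the basis
        Uhat, s, Vt = np.linalg.svd(B, full_matrices=False)
        U = Q @ Uhat                              # map the left factors back to the original space
        return U[:, :k], s[:k], Vt[:k, :]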
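A sketch of the single-pass idea, assuming the matrix arrives as a stream of disjoint row blocks that
together cover all rows. The two-sided sketch and the least-squares recovery step shown here are one
standard way to organize such a method, not the only one.

    import numpy as np

    def single_pass_sketch(row_blocks, m, n, ell, seed=None):
        # Builds the sketches Y = A @ Omega and W = Psi @ A in a single pass over A.
        # row_blocks yields pairs (rows, A_block) with A_block = A[rows, :]; each
        # entry of A is seen exactly once and is never stored.
        rng = np.random.default_rng(seed)
        Omega = rng.standard_normal((n, ell))     # right test matrix
        Psi = rng.standard_normal((ell, m))       # left test matrix
        Y = np.zeros((m, ell))
        W = np.zeros((ell, n))
        for rows, A_block in row_blocks:
            Y[rows, :] = A_block @ Omega          # sketch of the column space
            W += Psi[:, rows] @ A_block           # sketch of the row space
        return Y, W, Psi

    def single_pass_approx(Y, W, Psi):
        # Recovers a low-rank approximation A ~ Q @ X from the sketches alone.
        Q, _ = np.linalg.qr(Y)                    # basis for the sampled column space
        X = np.linalg.lstsq(Psi @ Q, W, rcond=None)[0]
        return Q, X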
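A sketch of one structured random projection of the kind used in fast Johnson-Lindenstrauss transforms,
namely a subsampled randomized discrete Fourier transform. The scaling convention and the use of a
complex transform are choices made for the illustration; applying the map to an m x n matrix costs
O(mn log n) operations via the FFT, versus O(mnl) for a dense Gaussian sketch with l columns.

    import numpy as np

    def srft_sketch(A, ell, seed=None):
        # Returns Y = A @ Omega, where Omega = sqrt(n/ell) * D * F * S: D flips the
        # sign of each column at random, F is the unitary DFT, and S keeps ell
        # coordinates chosen at random.
        rng = np.random.default_rng(seed)
        n = A.shape[1]
        signs = rng.choice([-1.0, 1.0], size=n)            # diagonal of D
        ADF = np.fft.fft(A * signs, axis=1) / np.sqrt(n)   # rows of A*D times the unitary DFT
        cols = rng.choice(n, size=ell, replace=False)      # coordinates kept by S
        return np.sqrt(n / ell) * ADF[:, cols]             # complex m x ell sketch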
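A sketch of the power (subspace) iteration variant: sampling (A A*)^q A rather than A damps the tail
of the spectrum, which is what improves the accuracy on noisy data. The re-orthonormalization between
applications and the defaults q=2, p=10 are choices made for the illustration.

    import numpy as np

    def rsvd_power(A, k, q=2, p=10, seed=None):
        # Randomized SVD with q steps of power iteration applied to (A @ A.T).
        rng = np.random.default_rng(seed)
        Y = A @ rng.standard_normal((A.shape[1], k + p))
        Q, _ = np.linalg.qr(Y)                    # orthonormalize to keep the iteration stable
        for _ in range(q):
            Q, _ = np.linalg.qr(A.T @ Q)
            Q, _ = np.linalg.qr(A @ Q)
        B = Q.T @ A
        Uhat, s, Vt = np.linalg.svd(B, full_matrices=False)
        return (Q @ Uhat)[:, :k], s[:k], Vt[:k, :]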
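A sketch of the random-walk construction from the diffusion geometry topic: affinities decay with
distance, the rows are normalized into a Markov transition matrix, and the leading non-trivial
eigenvectors, scaled by powers of the eigenvalues, give coordinates in which Euclidean distance
approximates the diffusion distance after t steps. The Gaussian kernel and the width eps are
assumptions and would need to be tuned for real data.

    import numpy as np

    def diffusion_map(X, n_coords=2, t=1, eps=1.0):
        # Diffusion-map coordinates for a point cloud X (one point per row).
        D2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # squared pairwise distances
        K = np.exp(-D2 / eps)                                       # Gaussian affinities
        P = K / K.sum(axis=1, keepdims=True)                        # row-stochastic random walk
        evals, evecs = np.linalg.eig(P)
        order = np.argsort(-np.abs(evals))                          # leading eigenvalue is 1 (trivial)
        evals = np.real(evals[order])
        evecs = np.real(evecs[:, order])
        keep = slice(1, 1 + n_coords)                               # skip the constant eigenvector
        return evecs[:, keep] * (evals[keep] ** t)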
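A small numerical illustration of the Johnson-Lindenstrauss phenomenon: a scaled Gaussian projection
to a much lower dimension changes pairwise distances only slightly. The point count and the dimensions
in the example are arbitrary choices.

    import numpy as np

    def jl_distortion(X, k, seed=None):
        # Projects the rows of X from dimension d down to dimension k and returns
        # the worst relative change over all pairwise distances.
        rng = np.random.default_rng(seed)
        n, d = X.shape
        G = rng.standard_normal((d, k)) / np.sqrt(k)   # scaled Gaussian projection
        Y = X @ G
        i, j = np.triu_indices(n, k=1)
        orig = np.linalg.norm(X[i] - X[j], axis=1)
        proj = np.linalg.norm(Y[i] - Y[j], axis=1)
        return np.max(np.abs(proj - orig) / orig)

    # Example: 100 points in dimension 10000 projected down to dimension 400; the
    # maximal relative distortion reported is typically on the order of 0.1.
    X = np.random.default_rng(0).standard_normal((100, 10000))
    print(jl_distortion(X, k=400, seed=1))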
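A sketch of a one-sided interpolative decomposition of the kind described in the interpolative/CUR
topic, with the column subset chosen by column-pivoted QR (one standard way to choose it, though not
the only one). A CUR factorization can be obtained by applying the same kind of selection to the rows
as well.

    import numpy as np
    from scipy.linalg import qr, solve_triangular

    def column_id(A, k):
        # One-sided interpolative decomposition A ~ A[:, J] @ Z with |J| = k, so the
        # approximation is expressed in terms of k actual columns of A.
        Q, R, piv = qr(A, mode='economic', pivoting=True)    # column-pivoted QR
        T = solve_triangular(R[:k, :k], R[:k, k:])           # interpolation coefficients
        Z = np.zeros((k, A.shape[1]))
        Z[:, piv[:k]] = np.eye(k)                            # selected columns reproduce themselves
        Z[:, piv[k:]] = T                                    # remaining columns are interpolated
        return piv[:k], Z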
Class notes (when applicable):
Resources: