Open Topics in Applied Mathematics: Fast Algorithms for Big Data, Spring 2016 (APPM 4720/5720)

Meeting times: MWF 10.00 - 10.50, ECCR 150.

Instructor: Gunnar Martinsson. ECOT 233.

Office Hours: Monday 4:00pm - 4:50pm and Wednesday 3:00pm - 4:50pm (ECOT 233).

Homeworks: Please submit a hard copy in class on the due date. Group homeworks are permitted and encouraged. The maximum group size is three for 4720, and two for 5720.

Reference homeworks: Each student (not each group) should pick one problem (not "sets") during the course of the semester and submit a solution that is clean enough to be posted on this web-page. Please attach matlab-code when appropriate. To avoid that one problem is picked by more than one student, there is a sign-up sheet that can be viewed here. Each student who is registered for the course will be emailed a link that allows editing of the sign-up sheet.

Projects: An important part of the course is the final project, accounting for 40% of your final grade. Several suggested projects will be posted on this webpage. If you find some part of the class particularly relevant to your academic interests, you have the option to propose an individual or a group project on that subject. If you want to create your own project, please consult with the instructor no later than March 15.

Various documents: Syllabus.

Notes:
Week:
Homework:
Material covered:
1 (Jan 11)
Low-rank approximation -- the problem formulation and a brief survey of applications. The Singular Value Decomposition (SVD) and the Eckart-Young theorem on optimality.
Slides for the first lecture.
Notes for the first week.
Latex template for course notes.
Summary of the "top 10 algorithms" article.
2 (Jan 18)
No class Monday (MLK day).
Power iterations, and Krylov methods. Gram-Schmidt and the QR factorization. How to cheaply get an approximate SVD from a QR factorization.
Notes for the second week.
3 (Jan 25)
Randomized methods for computing low-rank approximations - the basic idea.
Notes: Monday. Wednesday. Friday.
4 (Feb 1) Homework 1 due on Monday.
The files hw01p2.m and hw01p3.m.
Solutions: Problem 1, Problem 2 with code, Problem 3.
Randomized methods for computing low-rank approximations - adaptive rank determination.
Notes: Wednesday. Friday. Notes from Dr. Voronin.
5 (Feb 8)

Randomized methods for computing low-rank approximations - single pass algorithms, error analysis, and the SRFT.
Notes: Monday. Wednesday and Friday.
6 (Feb 15) Homework 2 due on Friday.
The files hw02p1.m and hw02p2.m.
Solutions: Problem 1, Problems 2 and 3.
Connection between randomized methods and Krylov techniques.
Notes: Monday. Wednesday. Friday.
The latest version of the course notes.
7 (Feb 22)
Interpretation of data - the interpolative decomposition (ID) and the CUR decomposition.
Notes: Monday. Wednesday. Friday.
The latest version of the course notes.
8 (Feb 29) Applications of low-rank approximation: Principal Component Analysis (PCA), Latent Semantic Indexing (LSI), eigenfaces, pagerank, potential evaluation.
Notes: Monday, Wednesday.
The latest version of the course notes.
9 (Mar 7) Homework 3 due on Monday.
The file hw03p02.m.
Applications of low-rank approximation: Principal Component Analysis (PCA), Latent Semantic Indexing (LSI), eigenfaces, pagerank, potential evaluation.
Notes: Monday, Friday.
10 (Mar 14)
Techniques for non-linear problems; diffusion geometry.
Notify instructor via email of your choice of project by March 19! Notes: Monday. Wednesday.
Paper on diffusion geometry (local copy).
Spring Break

11 (Mar 28)
Diffusion geometry, continued. Johnson-Lindenstrauss methods; random projections as a tool for dimensionality reduction. Nearest neighbor search.
Slides on k-d trees (local copy).
Tutorial lecture by Peter Indyk.
Johnson-Lindenstrauss notes (local copy).
Notes: Monday and Wednesday, Friday..
12 (Apr 4) Homework 4 due on Friday.
The file testmatrices.mat.
As a solution key, here is the file used to generate the test matrices. Reference solutions: Problem 1, Problem 2, Problem 3.
Johnson-Lindenstrauss theory. Random projections. Randomized nearest neighbor search. Comparison of slowly converging randomized methods (Monte Carlo, JL projections) and accurate randomized methods (RSVD, QuickSort).
Papers by Jones, Osipov, Rokhlin: 1 and 2.
Notes: Monday. Wedesday.
13 (Apr 11)
The Fast Multipole Method.
Notes: Wedesday.
Slides for the Wednesday lecture.
Slides for the Friday lecture.
14 (Apr 18)
The potential evaluation map and kernel independent FMMs.
Clustering and k-mean. (Code for the Matlab demo here.)
Rank-structured matrices and fast matrix inversion.
Slides: Monday, Friday.
15 (Apr 25) Homework 5 due on Friday.
The tutorial FMM.
The file HW05.m.
Reference solutions: Problem 1, Problem 3.
Project presentations.
Slides for the Monday lecture.
Finals week
No final exam, but your written final project report is due on Tuesday May 3, by 4pm.