How to Write Fast Numerical Code - Spring 2019
Basic Information
- Course description, goals, integrity, knowledge base
- Course number: 263-2300, 6 credits
- Spring 2016, lectures: M 10:15-12:00, HG D3.2; Th 9:15-10:00 CAB G51; occasional substitute lectures: W 13:15-15:00 HG D3.2
- Instructor: Markus Püschel (CAB H69.3, pueschel at inf, 2-7303)
- TAs and their office hours:
- Tyler Smith
- Alen Stojanov
- Gagandeep Singh
- Mailing lists:
- For technical questions: fastcode@lists.inf.ethz.ch (emails to this address go to the lecturer and all TAs)
- Forum to find project partner: fastcode-forum@lists.inf.ethz.ch (emails go to all students who have no partner yet and to Alen & Tyler)
Grading
- 40% research project
- Topic: Very fast, ideally adaptive implementation of a numerical problem
- Team up in groups of four
- March 4: find team, find a problem (tip: look at the prior courses linked above for examples)
- Complete “milestones” during semester and enter them into the online check list
- Write 6 page standard conference paper (template will be provided)
- Give short presentation end of semester
- 25% midterm
- 35% homework
- Exercises on algorithm or code analysis
- Implementation exercises
- study the effect of program optimizations, compilers, special instructions, etc.
- write and submit C code & create runtime/performance plots
- Some templates will be provided
- All homeworks are single-student homeworks
- There is no final Exam
Research Project
- All projects have to be registered in our project system. This site is also used later for updates.
- How it works:
- Weeks without homeworks should be used to work on the project
- You select a numerical problem and create a correct (verified) implementation in C
- You determine the arithmetic cost, measure the runtime and performance
- You profile the implementation to find the parts in which most the runtime spent
- Focusing on these you apply various optimization techniques from this class
- You repeat the previous steps to create various versions with (hopefully) continuously better runtime
- You write a paper about your work and give a presentation
- Paper:
- Maximal 6 pages (hard limit), conference style, template and instructions below
- Everybody reads this: report.pdf
- Latex source: report.zip (start with reading the README file)
- Due date: Friday, June 14 (as final-report.pdf in your svn)
- Presentation
- Last week of classes
- Template (the use is totally optional) and some guidelines (ppt is 2007 and later): presentation-template.pptx, presentation-template.pdf
- The order will be determined randomly right before class
- Who talks will be determined randomly right before class
# | Title | Supervisor/s |
---|---|---|
1 | Hierarchical Density-Based Spatial Clustering | AS |
2 | Belief Propagation for Early Vision | AS |
3 | Metaheuristic Algorithms for Optimization Problems | AS |
4 | SVD and Symmetric EVD Using Jacobi Rotations | TS |
5 | Baum-Welch Algorithm for Hidden Markov Models | MP |
6 | Stereo Matching with PatchMatch | GS |
7 | Forward and Backward Reaching Inverse Kinematics | GS |
8 | Gradient Boosting Decision Tree for Learning to Rank | TS |
9 | Hamilton Segmenter for R peak detection in ECG signals | MP |
10 | Bone Anisotropy Mapping | TS |
11 | Baum-Welch Algorithm | MP |
12 | Density Estimation with Distribution Element Trees | MP |
13 | Grid-based Fluid Simulation | GS |
Midterm
April 15th, room HG E7 (solution, without solution).
Homework
Late policy: No deadline extensions, but you have 3 late days. You can use at most 2 on one homework. For example, submitting 7 hours late costs one late day.
We will be using Moodle for the homeworks.
Homework | Deadline | Solution |
---|---|---|
Homework 0 | as soon as possible | |
Homework 1 | Th March 7th, 5pm | Homework 1 |
Homework 2 | Th March 14th, 5pm | Homework 2 |
Homework 3 | Th March 28th, 5pm | Homework 3 |
Homework 4 | Th April 11th, 5pm | Homework 4 |
Lectures (including pdfs)
Date | Content | Other Material |
---|---|---|
18.02 | Course motivation, overview, organization | |
21.02 | Cost analysis and performance | |
25.02. | Intel Haswell architecture/microarchitecture, operational intensity | Intel Haswell, Intel Optimization Manual, Agner Fog’s instruction tables, see also the recent uops |
28.02. | Instruction level parallelism | |
04.03. | Compiler limitation, Benchmarking | |
07.03. | SIMD vector instructions | Intel Intrinsics Guide |
11.03. | SIMD vector instructions | |
14.03. | Locality, caches | |
18.03. | Caches, analysis blocked MMM | |
21.03. | Roofline model, roofline plot | |
25.03. | Dense linear algebra libraries, model-based ATLAS | |
28.03. | Model-based ATLAS | |
01.04. | Virtual memory system | |
04.04. | Memory-bound computation, sparse linear algebra | |
08.04. | Sparse MVM, linear transforms | |
11.04. | Discrete/fast Fourier transform: overview | |
15.04. | Midterm | |
29.04. | FFTW, notes | FFTW website |
06.05. | Computer generation of fast code (Spiral) | Spiral website |
27.05. | Project presentations | |
29.05. | Project presentations |