# How to Write Fast Numerical Code - Spring 2019

# Basic Information

- Course description, goals, integrity, knowledge base
- Course number: 263-2300, 6 credits
- Spring 2016, lectures: M 10:15-12:00, HG D3.2; Th 9:15-10:00 CAB G51; occasional substitute lectures: W 13:15-15:00 HG D3.2
**Instructor**: Markus Püschel (CAB H69.3, pueschel at inf, 2-7303)**TAs and their office hours**:- Tyler Smith
- Alen Stojanov
- Gagandeep Singh

**Mailing lists**:- For technical questions: fastcode@lists.inf.ethz.ch (emails to this address go to the lecturer and all TAs)
- Forum to find project partner: fastcode-forum@lists.inf.ethz.ch (emails go to all students who have no partner yet and to Alen & Tyler)

# Grading

**40% research project**- Topic: Very fast, ideally adaptive implementation of a numerical problem
- Team up in groups of four
- March 4: find team, find a problem (tip: look at the prior courses linked above for examples)
- Complete “milestones” during semester and enter them into the online check list
- Write 6 page standard conference paper (template will be provided)
- Give short presentation end of semester

**25% midterm****35% homework**- Exercises on algorithm or code analysis
- Implementation exercises
- study the effect of program optimizations, compilers, special instructions, etc.
- write and submit C code & create runtime/performance plots

- Some templates will be provided
- All homeworks are single-student homeworks

**There is no final Exam**

# Research Project

- All projects have to be registered in our project system. This site is also used later for updates.
- How it works:
*Weeks without homeworks should be used to work on the project*- You select a numerical problem and create a correct (verified) implementation in C
- You determine the arithmetic cost, measure the runtime and performance
- You profile the implementation to find the parts in which most the runtime spent
- Focusing on these you apply various optimization techniques from this class
- You repeat the previous steps to create various versions with (hopefully) continuously better runtime
- You write a paper about your work and give a presentation

- Paper:
**Maximal 6 pages (hard limit)**, conference style, template and instructions below- Everybody reads this: report.pdf
- Latex source: report.zip (start with reading the README file)
- Due date: Friday, June 14 (as final-report.pdf in your svn)

- Presentation
- Last week of classes
- Template (the use is totally optional) and some guidelines (ppt is 2007 and later): presentation-template.pptx, presentation-template.pdf
- The order will be determined randomly right before class
- Who talks will be determined randomly right before class

# | Title | Supervisor/s |
---|---|---|

1 | Hierarchical Density-Based Spatial Clustering | AS |

2 | Belief Propagation for Early Vision | AS |

3 | Metaheuristic Algorithms for Optimization Problems | AS |

4 | SVD and Symmetric EVD Using Jacobi Rotations | TS |

5 | Baum-Welch Algorithm for Hidden Markov Models | MP |

6 | Stereo Matching with PatchMatch | GS |

7 | Forward and Backward Reaching Inverse Kinematics | GS |

8 | Gradient Boosting Decision Tree for Learning to Rank | TS |

9 | Hamilton Segmenter for R peak detection in ECG signals | MP |

10 | Bone Anisotropy Mapping | TS |

11 | Baum-Welch Algorithm | MP |

12 | Density Estimation with Distribution Element Trees | MP |

13 | Grid-based Fluid Simulation | GS |

# Midterm

April 15th, room HG E7 (solution, without solution).

# Homework

**Late policy**: No deadline extensions, but you have 3 late days. You can use at most 2 on one homework. For example, submitting 7 hours late costs one late day.

We will be using Moodle for the homeworks.

Homework | Deadline | Solution |
---|---|---|

Homework 0 | as soon as possible | |

Homework 1 | Th March 7th, 5pm | Homework 1 |

Homework 2 | Th March 14th, 5pm | Homework 2 |

Homework 3 | Th March 28th, 5pm | Homework 3 |

Homework 4 | Th April 11th, 5pm | Homework 4 |

# Lectures (including pdfs)

Date | Content | Other Material |
---|---|---|

18.02 | Course motivation, overview, organization | |

21.02 | Cost analysis and performance | |

25.02. | Intel Haswell architecture/microarchitecture, operational intensity | Intel Haswell, Intel Optimization Manual, Agner Fog’s instruction tables, see also the recent uops |

28.02. | Instruction level parallelism | |

04.03. | Compiler limitation, Benchmarking | |

07.03. | SIMD vector instructions | Intel Intrinsics Guide |

11.03. | SIMD vector instructions | |

14.03. | Locality, caches | |

18.03. | Caches, analysis blocked MMM | |

21.03. | Roofline model, roofline plot | |

25.03. | Dense linear algebra libraries, model-based ATLAS | |

28.03. | Model-based ATLAS | |

01.04. | Virtual memory system | |

04.04. | Memory-bound computation, sparse linear algebra | |

08.04. | Sparse MVM, linear transforms | |

11.04. | Discrete/fast Fourier transform: overview | |

15.04. | Midterm | |

29.04. | FFTW, notes | FFTW website |

06.05. | Computer generation of fast code (Spiral) | Spiral website |

27.05. | Project presentations | |

29.05. | Project presentations |