Basic Information

  • Course description, goals, integrity, knowledge base
  • Course number: 263-2300, 6 credits
  • Spring 2016, lectures: M 10:15-12:00, HG D3.2; Th 9:15-10:00 CAB G51; occasional substitute lectures: W 13:15-15:00 HG D3.2
  • Instructor: Markus Püschel (CAB H69.3, pueschel at inf, 2-7303)
  • TAs and their office hours:
    • Tyler Smith
    • Alen Stojanov
    • Gagandeep Singh
  • Mailing lists:
    • For technical questions: (emails to this address go to the lecturer and all TAs)
    • Forum to find project partner: (emails go to all students who have no partner yet and to Alen & Tyler)


  • 40% research project
    • Topic: Very fast, ideally adaptive implementation of a numerical problem
    • Team up in groups of four
    • March 4: find team, find a problem (tip: look at the prior courses linked above for examples)
    • Complete “milestones” during semester and enter them into the online check list
    • Write 6 page standard conference paper (template will be provided)
    • Give short presentation end of semester
  • 25% midterm
  • 35% homework
    • Exercises on algorithm or code analysis
    • Implementation exercises
      • study the effect of program optimizations, compilers, special instructions, etc.
      • write and submit C code & create runtime/performance plots
    • Some templates will be provided
    • All homeworks are single-student homeworks
  • There is no final Exam

Research Project

  • All projects have to be registered in our project system. This site is also used later for updates.
  • How it works:
    • Weeks without homeworks should be used to work on the project
    • You select a numerical problem and create a correct (verified) implementation in C
    • You determine the arithmetic cost, measure the runtime and performance
    • You profile the implementation to find the parts in which most the runtime spent
    • Focusing on these you apply various optimization techniques from this class
    • You repeat the previous steps to create various versions with (hopefully) continuously better runtime
    • You write a paper about your work and give a presentation
  • Paper:
    • Maximal 6 pages (hard limit), conference style, template and instructions below
    • Everybody reads this: report.pdf
    • Latex source: (start with reading the README file)
    • Due date: Friday, June 14 (as final-report.pdf in your svn)
  • Presentation
    • Last week of classes
    • Template (the use is totally optional) and some guidelines (ppt is 2007 and later): presentation-template.pptx, presentation-template.pdf
    • The order will be determined randomly right before class
    • Who talks will be determined randomly right before class
# Title Supervisor/s
1 Hierarchical Density-Based Spatial Clustering AS
2 Belief Propagation for Early Vision AS
3 Metaheuristic Algorithms for Optimization Problems AS
4 SVD and Symmetric EVD Using Jacobi Rotations TS
5 Baum-Welch Algorithm for Hidden Markov Models MP
6 Stereo Matching with PatchMatch GS
7 Forward and Backward Reaching Inverse Kinematics GS
8 Gradient Boosting Decision Tree for Learning to Rank TS
9 Hamilton Segmenter for R peak detection in ECG signals MP
10 Bone Anisotropy Mapping TS
11 Baum-Welch Algorithm MP
12 Density Estimation with Distribution Element Trees MP
13 Grid-based Fluid Simulation GS


April 15th, room HG E7 (solution, without solution).


Late policy: No deadline extensions, but you have 3 late days. You can use at most 2 on one homework. For example, submitting 7 hours late costs one late day.

We will be using Moodle for the homeworks.

Homework Deadline Solution
Homework 0 as soon as possible  
Homework 1 Th March 7th, 5pm Homework 1
Homework 2 Th March 14th, 5pm Homework 2
Homework 3 Th March 28th, 5pm Homework 3
Homework 4 Th April 11th, 5pm Homework 4

Lectures (including pdfs)

Date Content Other Material
18.02 Course motivation, overview, organization  
21.02 Cost analysis and performance  
25.02. Intel Haswell architecture/microarchitecture, operational intensity Intel Haswell, Intel Optimization Manual, Agner Fog’s instruction tables, see also the recent uops
28.02. Instruction level parallelism  
04.03. Compiler limitation, Benchmarking  
07.03. SIMD vector instructions Intel Intrinsics Guide
11.03. SIMD vector instructions  
14.03. Locality, caches  
18.03. Caches, analysis blocked MMM  
21.03. Roofline model, roofline plot  
25.03. Dense linear algebra libraries, model-based ATLAS  
28.03. Model-based ATLAS  
01.04. Virtual memory system  
04.04. Memory-bound computation, sparse linear algebra  
08.04. Sparse MVM, linear transforms  
11.04. Discrete/fast Fourier transform: overview  
15.04. Midterm  
29.04. FFTW, notes FFTW website
06.05. Computer generation of fast code (Spiral) Spiral website
27.05. Project presentations  
29.05. Project presentations