Basic Information

  • READ: Course description, goals, integrity, knowledge base
  • Course number: 263-0007, 8 credits
  • Spring 2020, lectures: M 10:15-12:00, HG F3; Th 9:15-10:00 HG F3; occasional substitute lectures: W 13:15-15:00 HG D3.2
  • Instructor: Markus Püschel (CAB H69.3, pueschel at inf)
  • Head TAs:
    • Joao Rivera (JR)
    • Bojan Karlas (BK)
  • TAs:
    • Gagandeep Singh (GS)
    • Konstantin Taranov (KT)
    • Theodoros Theodoridis (TT)
    • Eliza Wszola (EW)
    • Eric Stavarache (ES)
    • Konstantinos Triantafyllou (KTr)
    • Jakob Beckmann (JB)
    • Hleb Makarchuk (HM)
  • Mailing lists:
    • For technical questions: (emails to this address go to the lecturer and all TAs)
    • Forum to find project partner: (emails go to all students who have no partner yet and to Head TAs)
  • Office Hours: (No office hours after April 9).
    • Mon 12:30-14:00: Eliza
    • Tue 13:30-15:00: Theodoros
    • Wed 14:00-15:30: Konstantin
    • Fri 12:30-14:00: Joao


  • 40% research project
    • Topic: Very fast, ideally adaptive implementation and performance analysis of a numerical problem
    • Team up in groups of four: register in the project system
    • March 6: find team, find a problem (tip: look at the prior courses linked above for examples)
    • Finding a problem: either pick from the below list (6 teams max per topic) or suggest a project to MP for approval (email MP with paper containing algorithm)
    • Once project is fixed: add it in the project system
    • Complete “milestones” during the semester and enter them in the project system
    • Later in semester: One or two 1 hour one-on-one meetings with a project supervisor
    • Write 8 page standard conference paper (template is provided below)
    • Give short presentation end of semester
  • 30% midterm
  • 30% homework
    • Exercises on code analysis
    • Implementation exercises
      • study the effect of program optimizations, compilers, special instructions, etc.
      • write and submit C code & create runtime/performance plots
    • Some templates will be provided
    • All homeworks are single-student homeworks (read integrity rules)
  • There is no final exam

Research Project

  • All projects have to be registered in our project system. This site contains a rough structure for your project and is also used later for updates.
  • How it works:
    • Weeks without homeworks should be used to work on the project
    • You select a numerical problem and create a correct (tested) implementation in C
    • You determine the arithmetic cost, measure the runtime and performance
    • You profile the implementation to find the parts in which most the runtime spent
    • Focusing on these you apply various optimization techniques from this class
    • You repeat the previous steps to create various versions with (hopefully) continuously better runtime
    • You use (exclusively) a repository that we provide to you
    • You analyze and try to reason about the performance behavior
    • You write a paper about your work and give a presentation
  • Paper:
    • Maximal 8 pages (hard limit) without references, conference style, template and instructions below
    • Everybody reads this: report.pdf
    • Latex source:
    • Due date: Friday, June 12 (in your git repository)
    • Name: (Team ID) + _report.pdf, e.g. 07_report.pdf
  • Presentation
  • Rough timeline
    • Start project work: any time, the earlier the better
    • Assignment project advisor: around mid April
    • One-on-one project meetings: End of April and early May
# Predefined Topics   Supervisor/s
1 Baum-Welch algorithm   ES, HM
2 Fractal image compression   EW
3 Social force model (by Helbing and Molnar)   KTr, JB
4 Approximate Cholesky for Symmetric Diagonally Dominant Matrices   TT
5 Number-theoretic transform   JR
6 Elliptic curve cryptography   KT
ID Proposed Topics Supervisor/s
5 HDR image construction from multi-exposed stereo LDR images GS
6 Algorithms for Arbitrary Precision JR
7 A SIFT Descriptor for Feature Matching KT
14 Volume Estimation of High-Dimensional Convex Bodies GS
18 Spectral Clustering GS
25 local outlier factor TT
28 Arbitrary precision floating point JR
30 Improved SPH method TT
34 Efros and Freeman Image Quilting Algorithm for Texture Synthesis GS
39 XGBoost TT
40 FastSLAM GS
42 SURF: Speeded Up Robust Features KT
43 tSNE vs accelerated tSNE with dual trees TT
45 ADTBoost TT
46 Seam Carving for Content-Aware Image Resizing GS
49 Ray Marching TT
51 Solving Linear Programs in the Current Matrix Multiplication Time GS
52 NNDescent GS
53 Robust Denoising using Feature and Color Information KT
55 Subgraph Isomorphism GS
56 DBScan and Dboost GS
59 A Fast Implementation of Ed25519 ECDSA Scheme KT


New date: July 8. Time: 10-12 (full 2 hours). Arrive: 9:45. Place: several rooms because of distancing that will be communicated in time. The exam will cover the entire lecture material except the part on transforms (which started April 6).

solution, without solution


Late policy: No deadline extensions, but you have 3 late days. You can use at most 2 on one homework. For example, submitting 7 hours late costs one late day.

We will be using Moodle for the homeworks.

Homework Deadline Solution
Homework 0 as soon as possible  
Homework 1 Th March 5th, 5pm Homework 1
Homework 2 Th March 12th, 5pm Homework 2
Homework 3 Th March 26th, 5pm programming exercise auto-assessed
Homework 4 Th April 9th, 5pm Homework 4

Lectures Plan (subject to change)

Starting 9.3., lectures were recorded.

Date Content Other Material
17.02 Course motivation, overview, organization  
20.02 Cost analysis and performance  
24.02 Intel Haswell architecture/microarchitecture, operational intensity Intel Haswell, Intel Optimization Manual, Agner Fog’s instruction tables, see also the recent uops
27.02. Instruction level parallelism  
02.03. Compiler limitation, Benchmarking  
05.03. SIMD vector instructions Intel Intrinsics Guide
09.03. SIMD vector instructions  
12.03. Compiler vectorization  
16.03. Locality, caches  
19.03. Caches, blocking MMM  
23.03. Roofline model, Linear algebra libraries, BLAS, ATLAS  
26.03. Model-based ATLAS  
30.03. Virtual memory optimizations for MMM  
02.04. Sparse linear algebra, sparse MVM  
06.04. Discrete/fast Fourier transform  
20.04. Fast FFT, FFTW library  
27.04. Spiral: DSL-based program generation for performance This was the last lecture. All time for the project now!
25.05. Project presentations (information)  
27.05. Project presentations