Basic Information

  • READ: Course description, prerequisites, goals, integrity
  • Read the slides of the first lecture
  • FAQs
  • Course number: 263-0007, 8 credits
  • Lectures: M 10:15-12:00, HG F3; Th 9:15-10:00, HG G3; occasional substitute lectures: W 14:15-16:00 HG E5
  • The lectures are not streamed but recorded. Login info is different from nethz and will be sent by email.
  • Instructor: Markus Püschel (CAB H69.3, pueschel at inf)
  • Head TA:
    • Joao Rivera (JR)
  • TAs:
    • Tommaso Pegolotti (TP)
    • Theodoros Theodoridis (TT)
    • Mikhail Khalilov (MK)
    • Yann Girsberger (YG)
  • Mailing lists:
    • For technical questions: (emails to this address go to the lecturer and all TAs)
    • Forum to find project partner: (emails go to all students who have no partner yet and to Head TA)
  • Office Hours: (No office hours after April 21)
    • Mon 12:30-14:00: Theodoros (zoom)
    • Tue 13:30-15:00: Tommaso (CAB J71.6)
    • Wed 13:00-14:30: Yann (zoom)
    • Fri 14:00-15:30: Joao (CAB J71.6)

Time Line

This list can be subject to minor changes, which would be announced in a timely manner.

Fr 10.03. Project team and project registered in the project system; start project anytime now
Th 09.03. HW1 due
Th 16.03. HW2 due
Th 30.03 HW3 due
Th 20.04. HW4 due
Wed 26.04. Midterm
week of 01.05 1st one-on-one project meeting (minimal milestone: base implementation, tested, performance plot, initial optimization plan)
week of 22.05. 2nd one-on-one project meeting
week of 05.06. Project presentations
Fr 23.06. Project report due


  • 40% research project
  • Topic: Very fast, ideally adaptive implementation and associated performance analysis for a numerical problem
  • Team up in groups of four: register in the project system
  • March 10th: find team, find a problem (tip: look at prior courses)
  • Finding a problem: either pick from the below list (max nine teams per topic) or suggest a project to MP for approval (email MP with paper containing algorithm). If you pick from this list, the decision is final and cannot be changed.
    1. Volume estimation (9)
    2. Image quilting (9)
    3. Floating-point expansions (9)
    4. Model of social force (9)
    5. Matrix exponential (9)
    6. Worst-case optimal joins (9)
  • Once project is fixed: add it in the project system to your team
  • Complete “milestones” during the semester and enter them in the project system
  • Later in semester: One or two 1 hour one-on-one meetings with a project supervisor
  • Give short presentation end of semester
  • Write 8 page standard conference paper (template is provided below)
  • 30% midterm
  • 30% homework
    • Exercises on code analysis
    • Implementation exercises
      • study the effect of program optimizations, compilers, special instructions, etc.
      • write and submit C code & create runtime/performance plots
    • All homeworks are single-student homeworks (read integrity rules)
  • There is no final exam

Research Project

  • All projects have to be registered in our project system. This site contains a rough structure for your project and is also used later for updates.
  • How it works:
    • Weeks without homeworks should be used to work on the project
    • You select a numerical algorithm and create a correct (tested) implementation in C
    • You determine the arithmetic cost, measure the runtime and performance
    • You profile the implementation to find the parts in which most the runtime spent
    • Focusing on these you apply various optimization techniques from this class
    • You repeat the previous steps to create various versions with (hopefully) continuously better runtime
    • You use (exclusively) a repository that we provide to you
    • You analyze and reason about the performance behavior
    • You give a presentation and write a short paper about your work
  • Paper:
    • Maximal 8 pages (hard limit) without references, conference style, template and instructions below
    • Everybody reads this: report.pdf
    • Latex source:
    • Due date: 23.06 (in your git repository)
    • Name: (Team ID) + _report.pdf, e.g. 07_report.pdf
  • Presentation
  • Some tips on profiling tools
  • Rough timeline
    • Start project work: any time, the earlier the better
    • Assignment project advisor: around mid April
    • One-on-one project meetings: two in May, see above
    # Predefined Topics Team IDs Supervisor
    1 Volume estimation 15, 16, 18, 21, 23, 25, 29, 37, 39 YG
    2 Image quilting 8, 12, 14, 17, 19, 22, 24, 31, 36 TT
    3 Floating-point expansions 4, 6, 7, 10, 13, 20, 27, 35, 41 JR
    4 Model of social force 3, 9, 26, 30, 38, 40, 43, 45, 47 MK
    5 Matrix exponential 1, 5, 11, 28, 51 TP
    6 Worst-case optimal joins 2, 33, 34, 42, 48, 49 CZ
    ID Proposed Topics Supervisor
    32 Position Based Dynamics TP
    44 Gaussian Belief Propagation. YG
    46 Fast Polynomial-Evaluation MAC TP
    50 The Viterbi Algorithm. TP
    52 Bounding Volume Hierarchy for Ray Tracing. TT
    53 ECDSA on NIST P-256. TP


Wed, 26.04., 14:15-16:00. Rooms: HG E3, HG E5, HG E7

  • all the material up to then is fair game but the overwhelming part will be what was covered in the homeworks
  • you can study previous exams below
  • no books, notes, calculators, laptops, cell phones, or other electronic devices are allowed
  • assignment of the rooms is based on the first letters of your last name as registered in the system. For example, Peter Gutmann goes to room HG E5.
    • Ab-Go: HG E3
    • Gu-Mu: HG E5
    • Na-Zu: HG E7

Previous exams:


Late policy: No deadline extensions, but you have 3 late days. You can use at most 2 on one homework. For example, submitting 20 minutes or 7 hours late costs one late day.

We will be using Moodle for the homeworks.

It may help to look at the homeworks of previous iterations of this course.

Homework Deadline Solution
Homework 0 as soon as possible  
Homework 1 Th March 9th, 5pm Homework 1
Homework 2 Th March 16th, 5pm Homework 2
Homework 3 Th March 30th, 5pm programming exercise auto-assessed
Homework 4 Th April 20th, 5pm Homework 4

Lectures Plan

Date Content Other Material
20.02 Course motivation, overview, organization  
23.02 Cost analysis and performance  
27.02 Intel Skylake architecture/microarchitecture, operational intensity Intel optimization manual, Section 2.2, Agner Fog’s instruction tables, see also the recent uops
02.03 Instruction level parallelism  
06.03 Compiler limitations, benchmarking  
09.03 SIMD vector instructions, AVX Intel intrinsics guide
13.03 SIMD vector instructions, AVX  
16.03 Compiler vectorization  
20.03 Locality, caches, blocking MMM  
23.03 Roofline model  
27.03 Linear algebra libraries, BLAS, ATLAS, Fast MMM  
30.03 cancelled  
03.04 Fast MMM continued, register renaming, virtual memory Comments on working set for TLB
06.04 Sparse linear algebra, sparse MVM  
17.04 Discrete/fast Fourier transform  
24.04 Fast FFT, FFTW  
08.05 Spiral: DSL-based program generation for performance