Basic Information

  • COVID-19 info:
    • We follow the general ETH regulations, which may be updated occasionally
    • Lectures are done physically, streamed live (click through Zentrum-HG-F3), and recorded
  • READ: Course description, prerequisites, goals, integrity
  • Read the slides of the first lecture
  • FAQs
  • Course number: 263-0007, 8 credits
  • Spring 2022, lectures: M 10:15-12:00, HG F3; Th 9:15-10:00 HG F3; occasional substitute lectures: W 14:15-16:00 ETF C1
  • Instructor: Markus Püschel (CAB H69.3, pueschel at inf), Ce Zhang (ce.zhang at inf)
  • Head TA:
    • Joao Rivera (JR)
  • TAs:
    • Tommaso Pegolotti (TP)
    • Konstantin Taranov (KT)
    • Theodoros Theodoridis (TT)
  • Mailing lists:
    • For technical questions: (emails to this address go to the lecturer and all TAs)
    • Forum to find project partner: (emails go to all students who have no partner yet and to Head TA)
  • Office Hours (zoom link was communicated by email):
    • Mon 12:30-14:00: Theodoros
    • Tue 13:30-15:00: Konstantin
    • Wed 14:00-15:30: Tommaso
    • Fri 12:30-14:00: Joao

Time Line

This list can be subject to minor changes, which would be announced in a timely manner.

Fr 11.03. Project team and project registered in the project system; start project anytime now
Th 10.03. HW1 due
Th 17.03. HW2 due
Th 31.03 HW3 due
Th 14.04. HW4 due
Wed 27.04. Midterm
week of 02.05 1st one-on-one project meeting (minimal milestone: base implementation, tested, performance plot, initial optimization plan)
week of 23.05. 2nd one-on-one project meeting
week of 06.06. Project presentations
Fr 24.06. Project report due


Research Project

  • All projects have to be registered in our project system. This site contains a rough structure for your project and is also used later for updates.
  • How it works:
    • Weeks without homeworks should be used to work on the project
    • You select a numerical algorithm and create a correct (tested) implementation in C
    • You determine the arithmetic cost, measure the runtime and performance
    • You profile the implementation to find the parts in which most the runtime spent
    • Focusing on these you apply various optimization techniques from this class
    • You repeat the previous steps to create various versions with (hopefully) continuously better runtime
    • You use (exclusively) a repository that we provide to you
    • You analyze and reason about the performance behavior
    • You give a presentation and write a short paper about your work
  • Paper:
    • Maximal 8 pages (hard limit) without references, conference style, template and instructions below
    • Everybody reads this: report.pdf
    • Latex source:
    • Due date: 24.06 (in your git repository)
    • Name: (Team ID) + _report.pdf, e.g. 07_report.pdf
  • Presentation
  • Some tips on profiling tools
  • Rough timeline
    • Start project work: any time, the earlier the better
    • Assignment project advisor: around mid April
    • One-on-one project meetings: two in May, see above
    # Predefined Topics Team IDs Supervisor/s
    1 Belief propagation for recommender systems 41, 43, 44 CZ
    2 Corner detection 9, 21, 29, 37, 39, 40 KT
    3 Data valuation 3, 12, 16, 18, 24, 28, 36 CZ
    4 Generalized Floyd-Warshall 7, 15, 17, 20, 23, 26 TP
    5 Non-negative matrix factorization 1, 4, 6, 25, 30 TT
    6 Particle swarm with radial basis functions 5, 13, 19, 22, 31, 34 TP
    7 Triangle listing 2, 8, 10, 11, 14 JR
    ID Proposed Topics Supervisor/s
    27 Elliptic Curve Diffie Hellman on Curve25519 KT
    32 Global Illumination using Photon Maps TT
    33 Comparison of Edmonds-Karp and Push-Relabel Maxflow KT
    35 Stochastic L-BFGS TT
    38 K-D trees for motion planning CZ
    42 High-performance Implementation of Marching Cubes Algorithm TT


Wed, 27.04., 14:15-16:00

  • all the material up to then is fair game but the overwhelming part will be what was covered in the homeworks
  • you can study previous exams below
  • no books, notes, calculators, laptops, cell phones, or other electronic devices are allowed
  • assignment of the rooms is based on the first letters of your last name as registered in the system. For example, Thomas Haas goes to room HG E7.
    • Ab-Gy: HG E5
    • Ha-Pu: HG E7
    • Ra-Zu: ETF C1

Previous exams:


Late policy: No deadline extensions, but you have 3 late days. You can use at most 2 on one homework. For example, submitting 20 minutes or 7 hours late costs one late day.

We will be using Moodle for the homeworks.

It may help to look at the homeworks of previous iterations of this course.

Homework Deadline Solution
Homework 0 as soon as possible  
Homework 1 Th March 10th, 5pm Homework 1
Homework 2 Th March 17th, 5pm Homework 2
Homework 3 Th March 31th, 5pm programming exercise auto-assessed
Homework 4 Th April 14th, 5pm Homework 4

Lectures Plan

Date Content Other Material
21.02 Course motivation, overview, organization  
24.02 Cost analysis and performance  
28.02 Intel Skylake architecture/microarchitecture, operational intensity Intel optimization manual, Section 2.2
03.03 Instruction level parallelism  
07.03 Compiler limitations, benchmarking  
10.03 SIMD vector instructions, AVX Intel intrinsics guide, Agner Fog’s instruction tables, see also the recent uops
14.03 SIMD vector instructions, AVX  
17.03 Compiler vectorization  
21.03 Locality, caches  
24.03 Caches, blocking MMM  
28.03 Roofline model, Linear algebra libraries, BLAS, ATLAS  
31.03 Fast MMM  
04.04 Fast MMM continued, register renaming, virtual memory  
07.04 Sparse linear algebra, sparse MVM  
11.04 Discrete/fast Fourier transform  
25.04 Fast FFT, FFTW  
01.05 Spiral: DSL-based program generation for performance