Basic Information

  • READ: Course description, prerequisites, goals, integrity
  • READ: FAQs
  • Read the slides of the first lecture
  • Course number: 263-0007, 8 credits
  • Lectures: M 10:15-12:00, HG F3; Th 9:15-10:00, HG G3; occasional substitute lectures: W 14:15-16:00 HG E5
  • The lectures are not streamed but are recorded; the recordings are available here (same evening or next morning)
  • Instructor: Markus Püschel (CAB H69.3, pueschel at inf)
  • Head TA:
    • Tommaso Pegolotti (TP)
  • TAs:
    • Hicham Leghettas (HL)
    • Emil Schätzle (ES)
    • Lorenzo Paleari (LP)
    • Jonáš Fiala (JF)
    • Hang Hu (HH)
    • Moritz Lumme (ML)
  • Mailing lists:
    • For technical questions: fastcode@lists.inf.ethz.ch (emails to this address go to the lecturer and all TAs)
    • Forum to find a project partner: fastcode-forum@lists.inf.ethz.ch (emails go to all students who have no partner yet and to the Head TA)
  • Office Hours (during HW period):
    • Mon 14:00-15:30: Lorenzo (Zoom)
    • Tue 12:30-14:00: Emil (CAB J71.6)
    • Wed 14:00-15:30: Hicham (CAB J71.6)
    • Fri 10:00-11:30: Tommaso (Zoom)

Time Line

This list is subject to minor changes, which will be announced in a timely manner.

Th 05.03. HW1 due
Fr 06.03. Project team and project registered in the project system; start project anytime now
Th 12.03. HW2 due
Th 26.03. HW3 due
Th 16.04. HW4 due
22.04. Midterm
week of 27.04. 1st one-on-one project meeting (minimal milestone: base implementation done, tested, performance plot, initial optimization plan, explain how you plan to divide the optimization work)
week of 18.05. 2nd one-on-one project meeting
week of 01.06. Project presentations
Fr 20.06. Project report due

Grading

Research Project

  • Rough structure for the project; more details in the milestones in the project system.
  • How it works:
    • Weeks without homeworks should be used to work on the project
    • You create a correct (tested) implementation in C/C++
    • You determine the arithmetic cost, measure the runtime and performance
    • You profile the implementation to find the parts in which most of the runtime is spent
    • Focusing on these, you apply various optimization techniques from the course to create successively faster versions of the code
    • You use (exclusively) the git repository that we provide to you
    • You analyze and reason about the performance behavior
    • You give a presentation and write a short paper about your work
  • Paper:
    • Maximum 7 pages (hard limit), not counting references; conference style; template and instructions below
    • Everybody reads this: report.pdf
    • Latex source: report.zip
    • Due date: 20.06 (in your git repository)
    • Name: (Team ID) + _report.pdf, e.g. 07_report.pdf
  • Presentation
  • Some tips on profiling tools
  • Rough timeline
    • Start project work: any time, the earlier the better
    • One-on-one project meetings: end of April and May, see above
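
As a rough illustration of the "determine the arithmetic cost, measure the runtime and performance" step in the list above, here is a minimal sketch in C. The example kernel (a matrix-vector multiplication), all function names, and the use of the standard clock() timer are assumptions made for illustration only; the course may prescribe a different timing method (for example a cycle counter), and real measurements would additionally need warm-up runs and enough repetitions to give stable numbers.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical baseline kernel: y = y + A*x for a row-major n x n matrix A. */
void mvm(size_t n, const double *A, const double *x, double *y) {
    for (size_t i = 0; i < n; i++) {
        double acc = y[i];
        for (size_t j = 0; j < n; j++) {
            acc += A[i * n + j] * x[j];
        }
        y[i] = acc;
    }
}

/* Arithmetic cost of mvm: n*n multiplications plus n*n additions. */
double mvm_flops(size_t n) {
    return 2.0 * (double)n * (double)n;
}

/* Average processor time of one mvm call over `reps` repetitions, in seconds.
   ASSUMPTION: clock() is used only as a portable placeholder timer. */
double mvm_seconds(size_t n, const double *A, const double *x, double *y, int reps) {
    assert(reps > 0);
    clock_t start = clock();
    for (int r = 0; r < reps; r++) {
        mvm(n, A, x, y);
    }
    clock_t end = clock();
    return ((double)(end - start) / CLOCKS_PER_SEC) / reps;
}

/* Performance in flops per second: arithmetic cost divided by runtime. */
double performance(double flops, double seconds) {
    return flops / seconds;
}
```

Calling `performance(mvm_flops(n), mvm_seconds(n, A, x, y, reps))` for several sizes `n` would give the data points of a performance plot. For very small `n` the measured time can round to zero with this timer, so the repetition count has to be chosen large enough.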

Midterm

22.04

  • All the material up to then is fair game, but the overwhelming part will be what was covered in the homeworks
  • You can study previous exams below
  • No books, notes, laptops, cell phones, or other electronic devices are allowed.
  • A dedicated calculator is allowed (i.e., not the one in your cell phone).
  • Room assignment is based on your legi number as registered in the system; the information will be posted here before the exam

Previous exams:

Homework

Late policy: No deadline extensions, but you have 3 late days. You can use at most 2 on one homework. For example, submitting 20 minutes or 7 hours late costs one late day.
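
The late-day arithmetic above can be sketched in C as follows. This is only an illustration: the function names are hypothetical, and the rule that every started 24-hour period counts as one late day is an assumption inferred from the two examples (20 minutes and 7 hours each cost one day); the policy text itself does not spell out what happens beyond 24 hours.

```c
#include <assert.h>

#define TOTAL_LATE_DAYS 3   /* from the policy: 3 late days in total */
#define MAX_PER_HOMEWORK 2  /* from the policy: at most 2 on one homework */

/* Late days charged for a submission that is `minutes_late` minutes late.
   ASSUMPTION: every started 24-hour period counts as one full late day. */
int late_days_charged(long minutes_late) {
    const long minutes_per_day = 24L * 60L;
    if (minutes_late <= 0) {
        return 0; /* on time */
    }
    return (int)((minutes_late + minutes_per_day - 1) / minutes_per_day);
}

/* Returns the remaining late-day budget after the submission,
   or -1 if the submission would exceed the per-homework or total limit. */
int remaining_after(int budget, long minutes_late) {
    assert(budget >= 0 && budget <= TOTAL_LATE_DAYS);
    int charged = late_days_charged(minutes_late);
    if (charged > MAX_PER_HOMEWORK || charged > budget) {
        return -1;
    }
    return budget - charged;
}
```

Under these assumptions, `remaining_after(3, 20)` leaves 2 late days, and a submission more than 48 hours late is rejected regardless of the remaining budget.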

We will be using Moodle for the homeworks.

It may help to look at the homeworks of previous iterations of this course.

Homework | Deadline | Solution
Homework 0 | as soon as possible |
Homework 1 | Th March 5th, 5pm |

Lectures Plan (subject to minor changes)

Date | Content | Other Material
16.02 | Course motivation, overview, organization |
19.02 | Cost analysis and performance |
23.02 | Intel Skylake architecture/microarchitecture, operational intensity | Intel earlier generations (Skylake, Haswell, etc.), Sec. 7, Intel Ice Lake (Tiger Lake), Intel Golden Cove (Alder Lake), Agner Fog's instruction tables (up to Tiger Lake, and Zen 4), see also uops (up to Alder Lake, and Zen 4). For Apple ARM CPUs, check out Dougall Johnson's blog post (M1) and the following semester project (M3).
26.02 | Instruction level parallelism |
02.03 | Compiler limitations, benchmarking |
05.03 | SIMD vector instructions, AVX | Intel intrinsics guide
09.03 | SIMD vector instructions, AVX |
12.03 | Compiler vectorization |
16.03 | Locality, caches, blocking MMM |
19.03 | Roofline model |
23.03 | Linear algebra libraries, BLAS, ATLAS, Fast MMM |
26.03 | Fast MMM continued, register renaming, virtual memory | Comments on working set for TLB
30.03 | Rest of virtual memory and linear algebra, sparse linear algebra, sparse MVM |
13.04 | Discrete/fast Fourier transform |
16.04 | Fast FFT, FFTW |
20.04 | Spiral: DSL-based program generation for performance |