# Advanced Systems Lab - Spring 2023

# Basic Information

- READ: Course description, prerequisites, goals, integrity
- Read the slides of the first lecture
- FAQs
- Course number: 263-0007, 8 credits
- Lectures: M 10:15-12:00, HG F3; Th 9:15-10:00, HG G3; occasional substitute lectures: W 14:15-16:00 HG E5
- The lectures are not streamed but recorded. Login info is different from nethz and will be sent by email.
**Instructor**: Markus Püschel (CAB H69.3, pueschel at inf)**Head TA**:- Joao Rivera (JR)

**TAs**:- Tommaso Pegolotti (TP)
- Theodoros Theodoridis (TT)
- Mikhail Khalilov (MK)
- Yann Girsberger (YG)

**Mailing lists**:- For technical questions: fastcode@lists.inf.ethz.ch (emails to this address go to the lecturer and all TAs)
- Forum to find project partner: fastcode-forum@lists.inf.ethz.ch (emails go to all students who have no partner yet and to Head TA)

**Office Hours**(zoom link was communicated by email):- Mon 12:30-14:00: Theodoros (zoom)
- Tue 13:30-15:00: Tommaso (CAB J71.6)
- Wed 13:00-14:30: Yann (zoom)
- Fri 14:00-15:30: Joao (CAB J71.6)

# Time Line

This list can be subject to minor changes, which would be announced in a timely manner.

Fr 10.03. | Project team and project registered in the project system; start project anytime now |

Th 09.03. | HW1 due |

Th 16.03. | HW2 due |

Th 30.03 | HW3 due |

Th 20.04. | HW4 due |

Wed 26.04. | Midterm |

week of 01.05 | 1st one-on-one project meeting (minimal milestone: base implementation, tested, performance plot, initial optimization plan) |

week of 22.05. | 2nd one-on-one project meeting |

week of 05.06. | Project presentations |

Fr 23.06. | Project report due |

# Grading

**40% research project**- Topic: Very fast, ideally adaptive implementation and associated performance analysis for a numerical problem
- Team up in groups of four: register in the project system
- March 10th: find team, find a problem (tip: look at prior courses)
- Finding a problem: either pick from the below list (
**max nine teams per topic**) or suggest a project to MP for approval (email MP with paper containing algorithm). If you pick from this list, the decision is final and cannot be changed. - Once project is fixed: add it in the project system to your team
- Complete “milestones” during the semester and enter them in the project system
- Later in semester: One or two 1 hour one-on-one meetings with a project supervisor
- Give short presentation end of semester
- Write 8 page standard conference paper (template is provided below)
**30% midterm****30% homework**- Exercises on code analysis
- Implementation exercises
- study the effect of program optimizations, compilers, special instructions, etc.
- write and submit C code & create runtime/performance plots

- All homeworks are single-student homeworks (read integrity rules)

**There is no final exam**

# Research Project

- All projects have to be registered in our project system. This site contains a rough structure for your project and is also used later for updates.
- How it works:
**Weeks without homeworks should be used to work on the project**- You select a numerical algorithm and create a correct (tested) implementation in C
- You determine the arithmetic cost, measure the runtime and performance
- You profile the implementation to find the parts in which most the runtime spent
- Focusing on these you apply various optimization techniques from this class
- You repeat the previous steps to create various versions with (hopefully) continuously better runtime
- You use (exclusively) a repository that we provide to you
- You analyze and reason about the performance behavior
- You give a presentation and write a short paper about your work

- Paper:
**Maximal 8 pages (hard limit) without references**, conference style, template and instructions below- Everybody reads this: report.pdf
- Latex source: report.zip
- Due date: 23.06 (in your git repository)
- Name: (Team ID) + _report.pdf, e.g. 07_report.pdf

- Presentation
- Week of 05.06 (information)
- Template (the use is totally optional) and some guidelines: presentation-template.pptx, presentation-template.pdf

- Some tips on profiling tools
- Rough timeline
- Start project work: any time, the earlier the better
- Assignment project advisor: around mid April
- One-on-one project meetings: two in May, see above

# Predefined Topics Team IDs 1 Volume estimation 15, 16, 18, 21, 23, 25, 29, 37, 39 2 Image quilting 8, 12, 14, 17, 19, 22, 24, 31, 36 3 Floating-point expansions 4, 6, 7, 10, 13, 20, 27, 35, 41 4 Model of social force 3, 9, 26, 30, 38, 40, 43, 45, 47 5 Matrix exponential 1, 11, 28, 51 6 Worst-case optimal joins 2, 5, 33, 34, 42, 48, 49 ID Proposed Topics 32 Position Based Dynamics 44 Gaussian Belief Propagation. 46 Fast Polynomial-Evaluation MAC 50 The Viterbi Algorithm. 52 Bounding Volume Hierarchy for Ray Tracing. 53 ECDSA on NIST P-256.

# Midterm

Wed, 26.04., 14:15-16:00

**Previous exams**:

- 2015: without solution, with solution
- 2016: without solution, with solution
- 2017: without solution, with solution
- 2019: without solution, with solution
- 2020: without solution, with solution
- 2021: without solution, with solution
- 2022: without solution, with solution

# Homework

**Late policy**: No deadline extensions, but you have 3 late days. You can use at most 2 on one homework. For example, submitting 20 minutes or 7 hours late costs one late day.

We will be using Moodle for the homeworks.

It may help to look at the homeworks of previous iterations of this course.

Homework | Deadline | Solution |
---|---|---|

Homework 0 | as soon as possible | |

Homework 1 | Th March 9th, 5pm | Homework 1 |

Homework 2 | Th March 16th, 5pm | Homework 2 |

Homework 3 | Th March 30th, 5pm | programming exercise auto-assessed |

# Lectures Plan

Date | Content | Other Material |
---|---|---|

20.02 | Course motivation, overview, organization | |

23.02 | Cost analysis and performance | |

27.02 | Intel Skylake architecture/microarchitecture, operational intensity | Intel optimization manual, Section 2.2, Agner Fog’s instruction tables, see also the recent uops |

02.03 | Instruction level parallelism | |

06.03 | Compiler limitations, benchmarking | |

09.03 | SIMD vector instructions, AVX | Intel intrinsics guide |

13.03 | SIMD vector instructions, AVX | |

16.03 | Compiler vectorization | |

20.03 | Locality, caches, blocking MMM | |

23.03 | Roofline model | |

27.03 | Linear algebra libraries, BLAS, ATLAS, Fast MMM | |

30.03 | cancelled | |

03.04 | Fast MMM continued, register renaming, virtual memory | |

06.04 | Sparse linear algebra, sparse MVM | |

17.04 | Discrete/fast Fourier transform | |

24.04 | Fast FFT, FFTW | |

01.05 | Spiral: DSL-based program generation for performance |