Krzysztof Onak > Teaching > Algorithmic Techniques for Taming Big Data (DS-563, Fall 2021)

Algorithmic Techniques for Taming Big Data (DS-563/CS-543, Fall 2021)

Course Description: Growing amounts of available data lead to significant challenges in processing them efficiently. In many cases, it is no longer possible to design feasible algorithms that can freely access the entire data set. Instead of that, we often have to resort to techniques that allow for reducing the amount of data such as sampling, sketching, dimensionality reduction, and core sets. Apart from these approaches, the course will also explore scenarios in which large data sets are distributed across several machines, or even geographical locations, and the goal is to design efficient communication protocols or MapReduce algorithms.

The course will include a final project and programming assignments in which we will explore the performance of our techniques when applied to publicly available data sets. Throughout the course, we will explore various strategies for implementing techniques that have theoretical guarantees in practice.

Instructor: Krzysztof Onak (konak@bu.edu)

Prerequisites

Self–assessment questionnaire: Since advanced courses offered through the Faculty of Computing & Data Sciences are meant to be open to students from various disciplines, we provide an on-line questionnaire to assist students with self–assessment and placement. Please follow this link to access it.

Tentative Syllabus

Course Requirements

Apart from active participation, the class will require solving homework problem sets, two experimental programming assignments, and a final project. The overall grade will be based on the following factors:

Programming Assignments: The course will feature two programming assignments in which students will implement algorithms covered in class and apply them to data sets of their choice. Collaboration here is not allowed (except for discussing high–level ideas), i.e., students are required to implement algorithms and run experiments on their own.

Final Project: Possible final projects ideas include but are not limited to

The outcome of a project will be a short technical report, describing obtained results and conclusions. As opposed to programming assignments, students are allowed to work in teams of two or three. A list of potential projects topics will be provided, but students are encouraged to develop their own ideas. These projects have to be approved by the instructor.

Materials

There is no textbook. A good list of resources—including books, surveys, lectures notes, and presentations—on many topics covered in this class can be found at https://sublinear.info/index.php?title=Resources.