Course Description: Growing amounts of available data make it increasingly challenging to process them efficiently. In many cases, it is no longer feasible to design algorithms that can freely access the entire data set. Instead, we often have to resort to techniques that reduce the amount of data, such as sampling, sketching, dimensionality reduction, and core sets. Beyond these approaches, the course will also explore scenarios in which large data sets are distributed across several machines, or even geographic locations, and the goal is to design efficient communication protocols or MapReduce algorithms.
The course will include a final project and programming assignments in which we will evaluate the performance of our techniques on publicly available data sets. Throughout the course, we will explore various strategies for implementing, in practice, techniques that come with theoretical guarantees.
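The course description lists sampling as one way to reduce the amount of data. The Python sketch below illustrates that idea. It is not taken from the course materials. It implements reservoir sampling (Algorithm R), which maintains a uniform random sample of k items from a stream of unknown length using only O(k) memory. The function name `reservoir_sample` and its parameters are hypothetical, chosen for this illustration.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Hypothetical helper: uniform sample of k items from a stream (Algorithm R).

    Makes a single pass over the stream and stores at most k items.
    If the stream has fewer than k items, all of them are returned.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    if rng is None:
        rng = random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-indexed) enters the sample with probability k / (i + 1)
            # and replaces a uniformly chosen slot.
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Sample 5 items from a million-element stream with constant extra memory.
    print(reservoir_sample(range(1_000_000), 5, random.Random(42)))
```

Passing an explicit `random.Random` instance makes runs reproducible. Each item of the stream ends up in the final sample with probability k/n, where n is the stream length.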
Syllabus: [pdf]
Instructor: Krzysztof Onak (konak@bu.edu)
Office Hours: Mondays 3–5pm, CCDS 1443 (or the adjacent common area)
Teaching Assistant: Rathin Desai (rathin@bu.edu)
Office Hours: Fridays 1–3pm, CCDS 15th floor, yellow southwest corner
Lecture: Tuesday/Thursday 3:30–4:45pm, CDS 264
Discussion sections:
• Wednesday 1:25–2:15pm, IEC B07
• Wednesday 2:30–3:20pm, IEC B07
Piazza (announcements and discussions): https://piazza.com/bu/spring2024/ds563cs543
Gradescope code for submitting homework: 6G4V6G
Resources:
• https://sublinear.info/index.php?title=Resources
• https://www.sketchingbigdata.org