Programming for Data Science (DS-210, Spring 2022)

Instructor: Krzysztof Onak (konak@bu.edu)
Office Hours: Wednesday 4:30–6:30pm, MCS 138N (or an adjacent common area)

Teaching Fellow: Vedaant Tiwari (vedaant@bu.edu)
Office Hours: Monday 3:45–5:45pm, MCS B51

Lecture: Monday/Wednesday/Friday 12:20–1:10pm, MCS B37
Discussion sections: Wednesday 2:30–3:20pm & 3:35–4:25pm, CGS 111B

Piazza (announcements and discussions): https://piazza.com/bu/spring2022/ds210/home
Gradescope code for submitting homework: 3Y85PZ

Important dates

Assignments

Schedule

1/21
Class overview. Survey. Basics of data analysis.

(Simplified) slides: [pdf]
Source/all materials: [tar.xz] [zip]
1/24
Select data science tools important for this class. Supervised vs. unsupervised learning. Decision trees.

(Simplified) slides: [pdf]
Source/all materials: [tar.xz] [zip]
1/26
Decision trees.

(Simplified) slides: [pdf]
Source/all materials: [tar.xz] [zip]

Assignments: HW 1 out
1/28
Sample predictive data analysis pipeline.

(Simplified) slides: [pdf]
Source/all materials: [tar.xz] [zip]
1/31
Classification vs. regression. pandas dtypes. Ethics of data processing.

(Simplified) slides: [pdf]
Source/all materials: [tar.xz] [zip]
2/02
Ethics of data processing (continued). Final project discussion. SciPy. Interpolation in SciPy.

(Simplified) slides: [pdf]
Source/all materials: [tar.xz] [zip]

Assignments: HW 1 due, HW 2 out
2/04
Clustering. $k$-means with SciPy.

(Simplified) slides: [pdf]
Source/all materials: [tar.xz] [zip]
2/07
LECTURE CANCELLED
2/09
Optimization. Linear programming.

(Simplified) slides: [pdf]
Source/all materials: [tar.xz] [zip]

Assignments: HW 2 due, HW 3 out
2/11
Linear regression and its generalizations.

(Simplified) slides: [pdf]
Source/all materials: [tar.xz] [zip]
2/14
Loss functions. Measuring errors for regression.

(Simplified) slides: [pdf]
Source/all materials: [tar.xz] [zip]
2/16
Overfitting and underfitting. Bias and variance.

(Simplified) slides: [pdf]
Source/all materials: [tar.xz] [zip]

Assignments: HW 3 due, HW 4 out
2/18
Cross-validation. Hyperparameter tuning.

(Simplified) slides: [pdf]
Source/all materials: [tar.xz] [zip]
Tuesday 2/22 (Monday schedule)
Documentation generation in Python. Version control.

(Simplified) slides: [pdf]
Source/all materials: [tar.xz] [zip]
2/23
Various features of programming languages.

(Simplified) slides: [pdf]
Source/all materials: [tar.xz] [zip]

Assignments: HW 4 due, HW 5 out
2/25
Basics of Rust: variables, data types, and compiling.

(Simplified) slides: [pdf]
Source/all materials: [tar.xz] [zip]
2/28
MIDTERM
3/02
Rust: project manager cargo, functions, flow control, and arrays.

(Simplified) slides: [pdf]
Source/all materials: [tar.xz] [zip]

Assignments: HW 5 due, HW 6 out
3/04
Rust: flow control (continued), tuples, enums, and algebraic data types.

(Simplified) slides: [pdf]
Source/all materials: [tar.xz] [zip]
3/07–3/11
SPRING BREAK
3/14
Structs. Memory management. Stack vs. heap.

(Simplified) slides: [pdf]
Source/all materials: [tar.xz] [zip]
3/16
Pitfalls of manual heap management. The Rust way: ownership and borrowing. Methods in Rust.

(Simplified) slides: [pdf]
Source/all materials: [tar.xz] [zip]

Assignments: HW 6 due, HW 7 out
3/18
Rust: Copying instead of moving. Multiple references. Generics and generic data types.

(Simplified) slides: [pdf]
Source/all materials: [tar.xz] [zip]
3/21
Rust: Useful predefined generic data types. Traits.

(Simplified) slides: [pdf]
Source/all materials: [tar.xz] [zip]

Related reading materials: [6.1] [10.1] [10.2]
3/23
Rust: Collections. Vectors.

(Simplified) slides: [pdf]
Source/all materials: [tar.xz] [zip]

Related reading materials: [8.1]

Assignments: HW 7 due, HW 8 out
3/25
Memory management in vectors. Amortization. Rust: hash maps.

(Simplified) slides: [pdf]
Source/all materials: [tar.xz] [zip]

Related reading materials: [8.1] [8.3]

Assignments: final project proposal due
3/28
Implementing hash maps. Rust: hash maps with custom types and hash sets. Graph representations.

(Simplified) slides: [pdf]
Source/all materials: [tar.xz] [zip]
3/30
Implementing graph representations in Rust. Simple graph algorithms. Rust: modules.

(Simplified) slides: [pdf]
Source/all materials: [tar.xz] [zip]

Related reading materials: [7]

Assignments: HW 8 due, HW 9 out
4/01
Rust: modules.

(Simplified) slides: [pdf]
Source/all materials: [tar.xz] [zip]

Related reading materials: [7]
4/04
Rust: splitting modules into separate files and using external crates. Generating random numbers with rand.

(Simplified) slides: [pdf]
Source/all materials: [tar.xz] [zip]

Related reading materials: [7.5] [RBE: File hierarchy] [RBE: Dependencies]
4/06
External crate for parsing CSV in Rust. Stacks and queues. Graph exploration.

(Simplified) slides: [pdf]
Source/all materials: [tar.xz] [zip]
4/08
Graph exploration: breadth-first search and depth-first search.

(Simplified) slides: [pdf]
Source/all materials: [tar.xz] [zip]
4/11
Code formatting (rustfmt). Priority queues. Binary heap.

(Simplified) slides: [pdf]
Source/all materials: [tar.xz] [zip]
4/13
Applications of priority queues: sorting and shortest paths (Dijkstra's algorithm). Rust: slices.

(Simplified) slides: [pdf]
Source/all materials: [tar.xz] [zip]

Related reading materials: [4.3]

Assignments: HW 9 due, HW 10 out
4/15
Rust: String and &str, lifetimes.

(Simplified) slides: [pdf]
Source/all materials: [tar.xz] [zip]

Related reading materials: [8.2] [10.3] [RBE: Strings]
Wednesday 4/20 (Monday schedule)
Rust: closures (anonymous functions) and iterators.

(Simplified) slides: [pdf]
Source/all materials: [tar.xz] [zip]

Related reading materials: [13]
4/22
Binary search trees. Rust: BTreeSet and BTreeMap.

(Simplified) slides: [pdf]
Source/all materials: [tar.xz] [zip]

Assignments: HW 10 due
4/25
Error handling in Rust. Dynamic programming.

(Simplified) slides: [pdf]
Source/all materials: [tar.xz] [zip]
4/27
Greedy algorithms. Divide and conquer.

(Simplified) slides: [pdf]
Source/all materials: [tar.xz] [zip]
4/29
Multithreading and its challenges. Crate rayon.

(Simplified) slides: [pdf]
Source/all materials: [tar.xz] [zip]
5/02
Calling Rust code from Python.

(Simplified) slides: [pdf]
Source/all materials: [tar.xz] [zip]
5/04
Compiling Rust to WebAssembly. Fast Fibonacci computation via exponentiation by squaring.

(Simplified) slides: [pdf]
Source/all materials: [tar.xz] [zip]
5/13
FINAL (12–2PM)

Materials

Internal:

External: