04 Data Preparation

Preparation

Chapter 2, pages 60-87.

Material

Lecture 4 - Empty

Lecture 4

Python | Visualize missing values

Examples.ipynb

Bag of Words.ipynb

jokes.json

Preparing data.ipynb

Session Description

This lecture covers the basics of preparing a dataset for machine learning.

Learning Objectives

  • Know how to approach features and when to drop and/or engineer them for your specific purposes
  • Prepare a dataset for ML. Specifically, you should be able to explain and perform each of the operations below:
      • Handle missing values/NaN values in appropriate ways
      • Identify and handle outliers, including by using a boxplot
      • Create dummy variables
      • Scale/normalize variables
      • Create a bag-of-words representation for text data
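
The operations listed above can be sketched on a toy example. This is a minimal illustration, not the course's own solution: the DataFrame, its column names (age, city) and the two example sentences are invented for this sketch, and the specific choices (median imputation, the 1.5 * IQR rule that boxplot whiskers are commonly based on, min-max scaling) are assumptions that may not suit every dataset.

```python
from collections import Counter

import numpy as np
import pandas as pd

# Hypothetical toy data, invented for illustration only.
df = pd.DataFrame({
    "age": [25.0, 32.0, np.nan, 41.0, 29.0, 120.0],
    "city": ["Aarhus", "Odense", "Aarhus", None, "Odense", "Aarhus"],
})

# 1) Missing values: fill the numeric column with its median and give
#    the categorical column an explicit "unknown" label.
df["age"] = df["age"].fillna(df["age"].median())
df["city"] = df["city"].fillna("unknown")

# 2) Outliers: flag values more than 1.5 * IQR outside the quartiles
#    (the rule a boxplot typically uses for its whiskers), then drop them.
q1, q3 = df["age"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (df["age"] < q1 - 1.5 * iqr) | (df["age"] > q3 + 1.5 * iqr)
df = df[~is_outlier].reset_index(drop=True)

# 3) Dummy variables: one 0/1 column per category of "city".
df = pd.get_dummies(df, columns=["city"], dtype=int)

# 4) Scaling: min-max normalization of "age" to the range [0, 1].
age_min, age_max = df["age"].min(), df["age"].max()
df["age_scaled"] = (df["age"] - age_min) / (age_max - age_min)

# 5) Bag of words: count how often each vocabulary word occurs per document.
docs = ["the cat sat", "the dog sat on the cat"]
counts = [Counter(doc.lower().split()) for doc in docs]
vocab = sorted(set().union(*counts))
bow = pd.DataFrame([[c[word] for word in vocab] for c in counts], columns=vocab)

print(df)
print(bow)
```

Calling df.boxplot(column="age") before step 2 (this requires matplotlib) shows the same outlier visually. In practice, scikit-learn's MinMaxScaler, StandardScaler and CountVectorizer are common alternatives to the hand-written steps 4 and 5.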