04 Data Preparation
Preparation
Chapter 2, pages 60-87.
Material
Python | Visualize missing values
Session Description
This lecture covers the basics of preparing a dataset for machine learning.
Learning Objectives
- Know how to approach features and when to drop and/or engineer them for your specific purposes
- Prepare a dataset for ML. Specifically, you should be able to explain and perform each of the operations below:
  - Handle missing (NaN) values appropriately
  - Identify and handle outliers, including by using a boxplot
  - Create dummy variables
  - Scale/normalize variables
  - Create a bag-of-words representation for text data
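Handling missing values can be sketched with pandas. This is a minimal illustration that assumes pandas as the tool (the section itself names no library). The DataFrame and its columns `age`, `income` and `notes` are hypothetical, and the 50% threshold for dropping a column is an arbitrary choice made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset: column names and values are made up for illustration.
df = pd.DataFrame({
    "age": [25, np.nan, 47, 31, np.nan],
    "income": [32000, 45000, np.nan, 51000, 38000],
    "notes": [np.nan, np.nan, np.nan, "vip", np.nan],
})

# Count NaN values per column to decide how to treat each one.
print(df.isna().sum())

# Option 1: drop every row containing a NaN (can throw away most of the data).
dropped_rows = df.dropna()

# Option 2: drop columns that are mostly empty.
# thresh = minimum number of non-NaN values a column needs to be kept
# (here at least 50% of the rows, an arbitrary cut-off for this example).
df_reduced = df.dropna(axis=1, thresh=int(0.5 * len(df)))

# Option 3: impute the remaining numeric NaNs with each column's median,
# which is less sensitive to outliers than the mean.
df_imputed = df_reduced.fillna(df_reduced.median(numeric_only=True))
print(df_imputed)
```

Here `dropna()` keeps only one of five rows, which shows why dropping rows is often too aggressive. scikit-learn's `SimpleImputer` supports the same strategies (`"mean"`, `"median"`, `"most_frequent"`, `"constant"`) and stores the fitted values so they can be reapplied to test data.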
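Identifying outliers can be sketched with the interquartile range (IQR) rule, which is also the rule a default matplotlib/pandas boxplot uses to decide which points to draw beyond its whiskers. This is a sketch under assumptions: the series `income_k` is hypothetical, and the factor 1.5 is the common convention, not something the section prescribes.

```python
import pandas as pd

# Hypothetical incomes (in thousands) with one suspiciously large value.
s = pd.Series([32, 38, 41, 45, 47, 51, 55, 400], name="income_k")

# First and third quartiles and the interquartile range.
q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1

# Points outside [q1 - 1.5*IQR, q3 + 1.5*IQR] are what a default boxplot
# draws as individual outlier markers beyond the whiskers.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
is_outlier = (s < lower) | (s > upper)

print(s[is_outlier])  # inspect the flagged values before acting on them

# Two common treatments: remove the rows, or cap (clip) values at the fences.
s_removed = s[~is_outlier]
s_capped = s.clip(lower=lower, upper=upper)
```

To see the same result visually, `s.plot.box()` (requires matplotlib) shows 400 as a point above the upper whisker. Whether to remove, cap, or keep an outlier depends on whether it is a data error or a genuine extreme value.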
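Creating dummy variables (one-hot encoding) can be sketched with `pd.get_dummies`. The `color` and `size` columns are hypothetical, and `drop_first=True` is one possible choice, not a requirement.

```python
import pandas as pd

# Hypothetical dataset with one categorical and one numeric feature.
df = pd.DataFrame({
    "color": ["red", "green", "red", "blue"],
    "size": [2, 3, 1, 4],
})

# One 0/1 column per category. drop_first=True drops the first category
# ("blue", in sorted order) because its value is implied when all other
# dummy columns are 0; this avoids perfectly collinear columns in linear models.
dummies = pd.get_dummies(df, columns=["color"], drop_first=True, dtype=int)
print(dummies)
```

`pd.get_dummies` only sees the categories present in the data it is given. When the same encoding must be applied to a separate test set, scikit-learn's `OneHotEncoder` (for example with `handle_unknown="ignore"`) remembers the categories learned from the training data.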
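Scaling and normalizing can be sketched directly with pandas arithmetic, showing the two most common variants: min-max normalization and standardization (z-scores). The `age` and `income` columns are hypothetical.

```python
import pandas as pd

# Hypothetical numeric features on very different scales.
df = pd.DataFrame({
    "age": [20, 30, 40, 50],
    "income": [20000, 40000, 60000, 100000],
})

# Min-max normalization: rescale each column to the range [0, 1].
minmax = (df - df.min()) / (df.max() - df.min())

# Standardization: subtract the column mean and divide by the standard deviation,
# giving mean 0 and standard deviation 1 per column. ddof=0 uses the population
# standard deviation, matching scikit-learn's StandardScaler.
standard = (df - df.mean()) / df.std(ddof=0)

print(minmax)
print(standard)
```

Min-max scaling is sensitive to outliers, because one extreme value compresses all the others. In a real workflow, compute the min, max, mean and standard deviation on the training set only and reuse them on the test set. scikit-learn's `MinMaxScaler` and `StandardScaler` do this through `fit` and `transform`.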
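A bag-of-words representation can be sketched in plain Python (standard library only) to show the mechanics: build a vocabulary, then count each vocabulary word per document. The three-document corpus is hypothetical, and the lowercase-and-split tokenizer is a deliberately simple assumption.

```python
import re
from collections import Counter

# Hypothetical mini-corpus.
docs = [
    "The cat sat on the mat",
    "The dog sat",
    "Dogs and cats",
]

def tokenize(text):
    # Lowercase, then extract runs of word characters (a deliberately simple tokenizer).
    return re.findall(r"\w+", text.lower())

# Vocabulary: every distinct token in the corpus, sorted so column order is stable.
vocab = sorted({tok for doc in docs for tok in tokenize(doc)})

# Document-term matrix: one row per document, one column per vocabulary word,
# each cell holding how often that word occurs in that document.
bow = []
for doc in docs:
    counts = Counter(tokenize(doc))
    bow.append([counts[word] for word in vocab])

print(vocab)
for row in bow:
    print(row)
```

The representation discards word order, and without stemming "cat" and "cats" become separate columns. In practice scikit-learn's `CountVectorizer` builds the same kind of document-term matrix (as a sparse matrix). Its default tokenizer differs in details; for example, it ignores one-character tokens.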