4.64 out of 5
1952 reviews on Udemy

Feature Engineering for Machine Learning

Transform the variables in your data and build better performing machine learning models
Instructor: Soledad Galli
12,641 students enrolled
English [Auto]
Learn multiple techniques for missing data imputation
Transform categorical variables into numbers while capturing meaningful information
Learn how to deal with infrequent, rare and unseen categories
Transform skewed variables into Gaussian
Convert numerical variables into discrete
Remove outliers from your variables
Extract meaningful features from dates and time variables
Learn techniques used in organisations worldwide and in data competitions
Increase your repertoire of techniques to preprocess data and build more powerful machine learning models

Welcome to Feature Engineering for Machine Learning, the most comprehensive course on feature engineering available online.

In this course, you will learn how to engineer features and build more powerful machine learning models.

Who is this course for?

So, you’ve taken your first steps into data science, you know the most commonly used prediction models, and you have probably built a linear regression or a classification tree model. At this stage you’re probably starting to encounter some challenges: you realize that your data set is dirty, lots of values are missing, some variables contain labels instead of numbers, others do not meet the assumptions of the models, and on top of everything you wonder whether this is the right way to code things up. To make things more complicated, you can’t find many consolidated resources about feature engineering, maybe only blogs. So you may start to wonder: how are things really done in tech companies?

This course will help you! This is the most comprehensive online course on variable engineering. You will learn a wide variety of engineering techniques, used worldwide in different organizations and in data science competitions, to clean and transform your data and variables.

What will you learn?

I have put together a fantastic collection of feature engineering techniques, based on scientific articles, white papers, data science competitions, and of course my own experience as a data scientist.

Specifically, you will learn:

  • How to impute your missing data

  • How to encode your categorical variables

  • How to transform your numerical variables so they meet ML model assumptions

  • How to convert your numerical variables into discrete intervals

  • How to remove outliers

  • How to handle date and time variables

  • How to work with different time zones

  • How to handle mixed variables which contain strings and numbers
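
To make the date, time and time zone items in the list above more concrete, here is a minimal sketch in plain Python. The timestamps and the `date_features` helper are made up for illustration; the course itself works with pandas, so treat this only as a picture of the idea: bring every timestamp to a common time zone first, then derive features from it.

```python
from datetime import datetime, timezone

# Hypothetical timestamps recorded with different UTC offsets (ISO 8601 strings).
raw = [
    "2019-03-10T22:30:00+01:00",
    "2019-03-10T16:45:00-05:00",
]


def date_features(stamp: str) -> dict:
    """Parse one timestamp, align it to UTC and derive simple features."""
    local = datetime.fromisoformat(stamp)   # keeps the original offset
    utc = local.astimezone(timezone.utc)    # common time zone for all rows
    return {
        "year": utc.year,
        "month": utc.month,
        "day_of_week": utc.weekday(),       # Monday = 0 ... Sunday = 6
        "hour": utc.hour,
        "is_weekend": utc.weekday() >= 5,
    }


features = [date_features(s) for s in raw]
for row in features:
    print(row)
```

Once both rows are expressed in UTC they turn out to be only 15 minutes apart, which is not obvious from the raw strings.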

Throughout the course, you are going to learn multiple techniques for each of the mentioned tasks, and you will learn to implement these techniques in an elegant, efficient, and professional manner, using Python, NumPy, Scikit-learn, pandas and a special open-source package that I created especially for this course: Feature-engine.

At the end of the course, you will be able to implement all your feature engineering steps in a single and elegant pipeline, which will allow you to put your predictive models into production with maximum efficiency.

Want to know more? Read on…

In this course, you will first become acquainted with the most widely used techniques for variable engineering, followed by more advanced and tailored techniques, which capture information while encoding or transforming your variables. You will also find detailed explanations of the various techniques, their advantages, limitations and underlying assumptions, and the best programming practices to implement them in Python.

This comprehensive feature engineering course includes over 100 lectures spanning about 10 hours of video, and ALL topics include hands-on Python code examples which you can use for reference and for practice, and re-use in your own projects.

In addition, the code is updated regularly to keep up with new trends and new Python library releases.

So what are you waiting for? Enroll today, embrace the power of feature engineering and build better machine learning models.

Introduction

1
Introduction

Testing

2
Course curriculum overview
3
Course requirements
4
How to approach this course
5
Setting up your computer
6
Course Material
7
Download Jupyter notebooks
8
Download datasets
9
Download course presentations
10
Moving Forward
11
FAQ: Data Science, Python programming, datasets, presentations and more...

Variable Types

1
Variables | Intro
2
Numerical variables
3
Categorical variables
4
Date and time variables
5
Mixed variables
6
Quiz about variable types

Variable Characteristics

1
Variable characteristics
2
Missing data
3
Cardinality - categorical variables
4
Rare Labels - categorical variables
5
Linear model assumptions
6
Linear model assumptions - additional reading resources (optional)
7
Variable distribution
8
Outliers
9
Variable magnitude
10
Bonus: Machine learning algorithms overview

Table illustrating the advantages and disadvantages of different machine learning algorithms, as well as their requirements in terms of feature engineering, and common applications. 

11
Bonus: Additional reading resources
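
As an illustration of the outliers topic in this section, the sketch below flags values with the widely used inter-quartile range (IQR) proximity rule. The data, the `iqr_limits` helper and the factor of 1.5 are assumptions chosen for the example, not necessarily the exact procedure shown in the lectures.

```python
import statistics


def iqr_limits(values, factor=1.5):
    """Return (lower, upper) limits: quartiles widened by factor * IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - factor * iqr, q3 + factor * iqr


# Hypothetical variable with one extreme value.
ages = [22, 25, 27, 29, 31, 33, 35, 38, 41, 120]

lower, upper = iqr_limits(ages)
outliers = [v for v in ages if v < lower or v > upper]
trimmed = [v for v in ages if lower <= v <= upper]

print(lower, upper)
print(outliers)
```

Here the observations outside the limits are dropped (`trimmed`); capping them at the limits instead is a common alternative when rows cannot be discarded.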

Missing Data Imputation

1
Introduction to missing data imputation
2
Complete Case Analysis

In this lecture, I describe complete case analysis: what it is, what assumptions it makes, and the implications and consequences of handling missing values with this method.

3
Mean or median imputation

In this lecture, I describe what I mean by replacing missing values with the mean or median of the variable, the assumptions behind the method, its advantages and disadvantages, and how it may affect the performance of machine learning algorithms.

4
Arbitrary value imputation
5
End of distribution imputation
6
Frequent category imputation
7
Missing category imputation
8
Random sample imputation

In this lecture, I describe what random sample imputation is, its advantages, and the care that should be taken if this method is implemented in a business setting.

9
Adding a missing indicator

Here I describe the process of adding one additional binary variable to capture those observations where data is missing. 

10
Mean or median imputation with Scikit-learn
11
Arbitrary value imputation with Scikit-learn
12
Frequent category imputation with Scikit-learn
13
Missing category imputation with Scikit-learn
14
Adding a missing indicator with Scikit-learn
15
Automatic determination of imputation method with Sklearn
16
Introduction to Feature-engine
17
Mean or median imputation with Feature-engine
18
Arbitrary value imputation with Feature-engine
19
End of distribution imputation with Feature-engine
20
Frequent category imputation with Feature-engine
21
Missing category imputation with Feature-engine
22
Random sample imputation with Feature-engine

Continues from the previous lecture: in this lecture, I describe what random sample imputation is, its advantages, and the care that should be taken if this method is implemented in a business setting.

23
Adding a missing indicator with Feature-engine
24
Overview of missing value imputation methods
25
Conclusion: when to use each missing data imputation method
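
Two of the techniques in this section, median imputation and adding a missing indicator, can be sketched in a few lines of plain Python. The data and helper names below are hypothetical; the lectures use Scikit-learn and Feature-engine instead. The point of the sketch is that the median is learned from the training set only and then reused for new data.

```python
import statistics


def fit_median(train_column):
    """Learn the median from the observed (non-missing) training values."""
    observed = [v for v in train_column if v is not None]
    return statistics.median(observed)


def impute_with_indicator(column, fill_value):
    """Replace None by fill_value and return a 0/1 flag marking where data was missing."""
    imputed = [fill_value if v is None else v for v in column]
    indicator = [1 if v is None else 0 for v in column]
    return imputed, indicator


# Hypothetical "age" variable; None marks a missing value.
train_age = [22, None, 35, 58, None, 41]
test_age = [None, 30]

median_age = fit_median(train_age)  # learned on the training data only
train_imputed, train_missing = impute_with_indicator(train_age, median_age)
test_imputed, test_missing = impute_with_indicator(test_age, median_age)

print(median_age)
print(train_imputed, train_missing)
print(test_imputed, test_missing)
```

Keeping the indicator next to the imputed variable lets a model distinguish real values from filled-in ones.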

Multivariate Missing Data Imputation

1
Multivariate Imputation
2
KNN Impute
3
KNN Impute - Demo
4
MICE
5
missForest
6
MICE and missForest - Demo
7
Additional Reading resources (Optional)
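
The idea behind KNN imputation can be sketched as follows: a missing value is replaced by the average of that variable over the k most similar observations. This is a simplified, plain-Python illustration with made-up data and a hypothetical `knn_impute` helper; the distance used here (root mean squared difference over the variables both rows have observed) is an assumption of the sketch, and library implementations such as Scikit-learn's `KNNImputer` differ in the details.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that variable over the k nearest rows."""
    completed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue  # a neighbour must have variable j observed
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                # Average over shared variables so rows with different
                # numbers of observed variables remain comparable.
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                completed[i][j] = sum(nearest) / len(nearest)
    return completed


# Hypothetical data: the last row is missing its third variable.
data = [
    [1.0, 2.0, 10.0],
    [1.1, 2.1, 11.0],
    [5.0, 6.0, 50.0],
    [1.05, 2.05, None],
]
filled = knn_impute(data, k=2)
print(filled[3])
```

Because the method relies on distances, variables are usually brought to a similar scale before imputing.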

Categorical Variable Encoding

1
Categorical encoding | Introduction
2
One hot encoding
3
Important: Feature-engine version 1.0.0
4
One-hot-encoding: Demo
5
One hot encoding of top categories
6
One hot encoding of top categories | Demo
7
Ordinal encoding | Label encoding
8
Ordinal encoding | Demo
9
Count or frequency encoding
10
Count encoding | Demo
11
Target guided ordinal encoding
12
Target guided ordinal encoding | Demo
13
Mean encoding
14
Mean encoding | Demo
15
Probability ratio encoding
16
Weight of evidence (WoE)
17
Weight of Evidence | Demo
18
Comparison of categorical variable encoding
19
Rare label encoding

In this lecture I will describe and compare 2 methods commonly used to replace rare labels. Rare labels are those categories within a categorical variable that contain very few observations, and may therefore affect the performance of tree-based machine learning algorithms.

In this lecture I will focus on variables with one predominant category.

20
Rare label encoding | Demo

In this lecture I will describe and compare 2 methods commonly used to replace rare labels. Rare labels are those categories within a categorical variable that contain very few observations, and may therefore affect the performance of tree-based machine learning algorithms.

In this lecture I will focus on variables with few categories.

21
Binary encoding and feature hashing
22
Summary table of encoding techniques
23
Bonus: Additional reading resources
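
As a rough illustration of two encodings from this section, the sketch below first groups rare labels under a single "Rare" category and then applies mean (target) encoding. The data, the 20% threshold and the helper names are assumptions for the example; the demos in the course rely on pandas and Feature-engine instead.

```python
from collections import Counter, defaultdict


def fit_rare_grouper(train_values, threshold=0.2, rare_name="Rare"):
    """Learn which labels are frequent; everything else maps to rare_name."""
    counts = Counter(train_values)
    total = len(train_values)
    frequent = {label for label, n in counts.items() if n / total >= threshold}

    def transform(value):
        return value if value in frequent else rare_name

    return transform


def fit_mean_encoding(values, targets):
    """Map each category to the mean of the target observed for it."""
    sums, counts = defaultdict(float), Counter()
    for v, t in zip(values, targets):
        sums[v] += t
        counts[v] += 1
    return {v: sums[v] / counts[v] for v in counts}


# Hypothetical training data: a categorical variable and a binary target.
cities = ["London"] * 5 + ["Paris"] * 4 + ["Oslo"]
target = [1, 1, 0, 1, 0, 0, 0, 1, 0, 1]

group = fit_rare_grouper(cities, threshold=0.2)
grouped = [group(c) for c in cities]
encoding = fit_mean_encoding(grouped, target)

print(encoding)
print(group("Rome"))  # a category not seen during training also becomes "Rare"
```

Grouping first means the encoder never has to estimate a target mean from a single observation of its own category, and unseen categories have a place to go.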

Variable Transformation

1
Variable Transformation | Introduction
2
Variable Transformation with NumPy and SciPy
3
Variable Transformation with Scikit-learn
4
Variable transformation with Feature-engine
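
To show what a variable transformation does to a skewed variable, here is a minimal plain-Python sketch that compares the skewness of a made-up variable before and after a logarithmic transformation. The data and the `skewness` helper are assumptions for illustration; the lectures use NumPy, SciPy, Scikit-learn and Feature-engine.

```python
import math
import statistics


def skewness(values):
    """Population skewness: mean of the cubed standardised values."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)


# Hypothetical right-skewed variable (strictly positive, as the log requires).
incomes = [12, 15, 18, 20, 22, 25, 30, 45, 80, 400]
log_incomes = [math.log(v) for v in incomes]

print(round(skewness(incomes), 2))
print(round(skewness(log_incomes), 2))
```

The logarithm compresses the long right tail, so the transformed variable is closer to symmetric, although in this small example it is still far from Gaussian.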

Discretisation

1
Discretisation | Introduction
2
Equal-width discretisation
3
Important: Feature-engine v 1.0.0
4
Equal-width discretisation | Demo
5
Equal-frequency discretisation
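
The difference between equal-width and equal-frequency discretisation is easiest to see on a small skewed variable. The sketch below is plain Python with made-up data and hypothetical helper names, not the Feature-engine code used in the demos.

```python
from bisect import bisect_right


def equal_width_edges(values, n_bins):
    """Split the value range into n_bins intervals of the same width."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    return [low + i * width for i in range(n_bins + 1)]


def equal_frequency_edges(values, n_bins):
    """Place the edges so each interval holds roughly the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    inner = [ordered[(i * n) // n_bins] for i in range(1, n_bins)]
    return [ordered[0]] + inner + [ordered[-1]]


def assign_bin(value, edges):
    """Index of the interval containing value; intervals are [left, right)."""
    return bisect_right(edges[1:-1], value)


# Hypothetical skewed variable.
ages = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]

width_edges = equal_width_edges(ages, 2)
freq_edges = equal_frequency_edges(ages, 2)
width_bins = [assign_bin(v, width_edges) for v in ages]
freq_bins = [assign_bin(v, freq_edges) for v in ages]

print(width_edges, width_bins)
print(freq_edges, freq_bins)
```

With equal width, the single large value pushes almost every observation into the first interval; with equal frequency, the observations are spread evenly across the intervals.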

Detailed Rating

5 stars: 1155
4 stars: 599
3 stars: 154
2 stars: 23
1 star: 20
30-Day Money-Back Guarantee

Includes

10 hours on-demand video
21 articles
Full lifetime access
Access on mobile and TV
Certificate of Completion
Feature Engineering for Machine Learning
Price:
$218.98 $169
