Welcome to Feature Engineering for Machine Learning, the most comprehensive course on feature engineering available online.
In this course, you will learn how to engineer features and build more powerful machine learning models.
Who is this course for?
So, you’ve taken your first steps into data science. You know the most commonly used prediction models, and you have probably built a linear regression or a classification tree model. At this stage you’re likely starting to encounter some challenges: your data set is dirty, lots of values are missing, some variables contain labels instead of numbers, others do not meet the assumptions of the models, and on top of everything you wonder whether you are coding things up the right way. To make things more complicated, there are few consolidated resources on feature engineering, mostly scattered blog posts. So you may start to wonder: how are things really done in tech companies?
This course will help you! This is the most comprehensive online course on feature engineering. You will learn a huge variety of engineering techniques, used worldwide in different organizations and in data science competitions, to clean and transform your data and variables.
What will you learn?
I have put together a fantastic collection of feature engineering techniques, based on scientific articles, white papers, data science competitions, and of course my own experience as a data scientist.
Specifically, you will learn:
How to impute your missing data
How to encode your categorical variables
How to transform your numerical variables so they meet ML model assumptions
How to convert your numerical variables into discrete intervals
How to remove outliers
How to handle date and time variables
How to work with different time zones
How to handle mixed variables which contain strings and numbers
Throughout the course, you are going to learn multiple techniques for each of the mentioned tasks, and you will learn to implement these techniques in an elegant, efficient, and professional manner, using Python, NumPy, Scikit-learn, pandas and a special open-source package that I created especially for this course: Feature-engine.
At the end of the course, you will be able to implement all your feature engineering steps in a single, elegant pipeline, which will allow you to put your predictive models into production with maximum efficiency.
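To give you a flavour, here is a minimal sketch of what such a pipeline might look like. The column names ("age", "income", "city") are hypothetical, and the sketch assumes Feature-engine's 1.x imputation and encoding modules combined with a scikit-learn Pipeline:

```python
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from feature_engine.imputation import MeanMedianImputer, CategoricalImputer
from feature_engine.encoding import RareLabelEncoder, OneHotEncoder

# Hypothetical columns: "age" and "income" are numerical, "city" is categorical.
pipe = Pipeline([
    # replace missing numbers with the median learned from the train set
    ("impute_num", MeanMedianImputer(imputation_method="median",
                                     variables=["age", "income"])),
    # replace missing categories with the string "Missing"
    ("impute_cat", CategoricalImputer(variables=["city"])),
    # group infrequent city labels into a single "Rare" category
    ("rare", RareLabelEncoder(tol=0.05, n_categories=2, variables=["city"])),
    # one-hot encode the categorical variable
    ("encode", OneHotEncoder(variables=["city"])),
    # any scikit-learn estimator can sit at the end of the pipeline
    ("model", LogisticRegression()),
])

# pipe.fit(X_train, y_train); pipe.predict(X_test)
```

Because every step is a scikit-learn-compatible transformer, the whole pipeline can be fit once on the training data and then serialized and deployed as a single object.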
Want to know more? Read on…
In this course, you will first become acquainted with the most widely used feature engineering techniques, and then move on to more advanced and tailored techniques that capture information while encoding or transforming your variables. You will also find detailed explanations of the various techniques, their advantages, limitations, and underlying assumptions, as well as the best programming practices for implementing them in Python.
This comprehensive feature engineering course includes over 100 lectures spanning about 10 hours of video, and ALL topics include hands-on Python code examples that you can use for reference and practice, and re-use in your own projects.
In addition, the code is updated regularly to keep up with new trends and new Python library releases.
So what are you waiting for? Enroll today, embrace the power of feature engineering and build better machine learning models.
A table illustrating the advantages and disadvantages of different machine learning algorithms, their feature engineering requirements, and common applications.
In this lecture, I describe complete case analysis: what it is, what assumptions it makes, and the implications and consequences of handling missing values with this method.
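As a quick illustration (not the lecture's own code), complete case analysis in pandas amounts to dropping every row that contains a missing value; the data below is made up:

```python
import numpy as np
import pandas as pd

# Toy data frame with some missing values.
df = pd.DataFrame({
    "age": [25, np.nan, 47, 31],
    "income": [2500.0, 3200.0, np.nan, 2800.0],
})

# Complete case analysis: keep only the fully observed rows.
# Reasonable only if data is missing completely at random and the
# discarded fraction is small.
complete_cases = df.dropna()
print(f"retained {len(complete_cases) / len(df):.0%} of the rows")
```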
In this lecture, I describe what it means to replace missing values with the mean or median of the variable, the assumptions, advantages, and disadvantages of this approach, and how it may affect the performance of machine learning algorithms.
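For illustration only, here is a minimal sketch of median imputation with scikit-learn's SimpleImputer (the variable name is hypothetical; in practice you would fit on the train set only and reuse the learned median on the test set):

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy data with a missing value in a hypothetical "age" variable.
df = pd.DataFrame({"age": [25.0, 30.0, np.nan, 40.0]})

# Learn the median during fit, then fill missing values with it.
imputer = SimpleImputer(strategy="median")
df[["age"]] = imputer.fit_transform(df[["age"]])
print(df)  # the NaN is replaced by 30.0, the median of the observed values
```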
In this lecture, I describe what random sample imputation is, its advantages, and the precautions that should be taken if this method were implemented in a business setting.
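As a rough sketch (not the lecture's code, using made-up data), random sample imputation can be done in plain pandas by drawing donor values from the observed entries. Fixing the random seed is one precaution that matters in a business setting, so the same input is always imputed the same way:

```python
import numpy as np
import pandas as pd

# Toy variable with missing values.
s = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0])

# Draw random donor values from the observed entries; a fixed seed
# keeps the imputation reproducible across runs.
rng = np.random.default_rng(42)
fill = pd.Series(
    rng.choice(s.dropna().to_numpy(), size=s.isna().sum()),
    index=s.index[s.isna()],
)
print(s.fillna(fill))
```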
Here I describe the process of adding an additional binary variable to capture those observations where data is missing.
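A minimal sketch of such a missing indicator in pandas, with a hypothetical variable name:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"income": [2500.0, np.nan, 3100.0]})

# Binary flag: 1 where the original value was missing, 0 otherwise.
# It preserves the information that the value was absent, which can
# itself be predictive, regardless of how the gap is later filled.
df["income_na"] = df["income"].isna().astype(int)
print(df)
```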
This lecture continues from the previous one: I describe what random sample imputation is, its advantages, and the precautions that should be taken if this method were implemented in a business setting.
In this lecture I will describe and compare two methods commonly used to replace rare labels. Rare labels are the categories within a categorical variable that contain very few observations, and may therefore affect the performance of tree-based machine learning algorithms.
In this lecture I will focus on variables with one predominant category.
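As a rough sketch of one common approach (not necessarily one of the lecture's two methods, and with made-up category values), infrequent labels can be grouped under a new "Rare" category based on a frequency threshold:

```python
import pandas as pd

# Hypothetical variable with one predominant category.
s = pd.Series(["London"] * 95 + ["Paris", "Paris", "Rome", "Oslo", "Kiev"])

# Group every category seen in less than 5% of the observations
# into a single "Rare" label.
freq = s.value_counts(normalize=True)
rare = freq[freq < 0.05].index
print(s.where(~s.isin(rare), "Rare").value_counts())
```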
In this lecture I will describe and compare two methods commonly used to replace rare labels. Rare labels are the categories within a categorical variable that contain very few observations, and may therefore affect the performance of tree-based machine learning algorithms.
In this lecture I will focus on variables with few categories.
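A sketch of another common approach (again with made-up values, and not necessarily one of the lecture's two methods): instead of creating a new group, rare labels are replaced with the most frequent category:

```python
import pandas as pd

# Hypothetical variable with few categories, one of them rare.
s = pd.Series(["red"] * 60 + ["blue"] * 37 + ["green"] * 3)

# Replace categories below a 5% frequency threshold with the mode.
freq = s.value_counts(normalize=True)
rare = freq[freq < 0.05].index
print(s.where(~s.isin(rare), s.mode()[0]).value_counts())
```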