This video covers feature selection and dimensionality reduction. There are many ways to do this, but here I demonstrate one of the quickest, suitable for beginners as well as data scientists and machine learning experts.
The approach combines data inspection with feature selection from a pairplot (made with Seaborn) and from a heatmap (Seaborn and Matplotlib).
To implement this solution you need the following Python modules installed:
- NumPy
- Pandas
- Matplotlib
- Seaborn
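A quick way to check that the four modules are available is sketched below. This is only a convenience check, not something shown in the video; the names `required` and `status` are made up for this example.

```python
import importlib

# The four modules listed above, by their import names
required = ["numpy", "pandas", "matplotlib", "seaborn"]

status = {}
for name in required:
    try:
        module = importlib.import_module(name)
        # Each of these packages exposes a __version__ string
        status[name] = module.__version__
    except ImportError:
        status[name] = None

for name, version in status.items():
    if version is None:
        print(f"{name} is missing; try: pip install {name}")
    else:
        print(f"{name} {version}")
```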
The content of the video:
0:09 - Introduction and some theory.
1:59 - CODING PART BEGINS. Preparing Python modules.
2:14 - Reading the dataset with Pandas.
Step #1.
2:41 - Inspecting the imported DataFrame (features).
Step #1.1
2:49 - Selecting numerical and dummy (if any exist) variables from the dataset.
Step #1.2
3:21 - Generating a pairplot with Seaborn.
Step #2 and Step #2.1
3:42 - Variable selection from the covariance matrix. Scaling features from the raw dataset.
Step #2.2
4:05 - Generating the covariance matrix heatmap with Matplotlib and Seaborn.
5:08 - Selecting the cmap (colormap) value for the heatmap from the official Seaborn documentation.
6:04 - Result. Covariance matrix showing correlation coefficients between the selected features.
Step #3.
6:16 - Constructing a Pandas DataFrame from the selected most important features.
6:45 - The result. Pandas DataFrame constructed from the most important features.
--------
Selecting Seaborn and Matplotlib colormap:
This video demonstrates one idea for implementing feature engineering for feature selection and dimensionality reduction, using a very simple dataset.
In real-world projects, please pay close attention to data pre-processing and data cleaning!
I hope this is useful for data scientists, data analysts, and everyone who works with data.
Best wishes! - Vytautas.