Data Preprocessing using Scikit Learn

What is data preprocessing?

Data preprocessing is the process of preparing raw data and making it suitable for a machine learning model. It is the first and most crucial step when creating a machine learning model. Before performing any operation on the data, it must be cleaned and put into a well-formatted form.

There are many preprocessing methods, but we will mainly focus on the following:

(1) Encoding the Data

(2) Normalization

(3) Standardization

(4) Imputing the Missing Values

(5) Discretization

The dataset is about Credit Card Approval Prediction. Credit score cards are a common risk-control method in the financial industry. They use personal information and data submitted by credit card applicants to predict the probability of future defaults and credit card borrowings, so the bank can decide whether to issue a credit card to the applicant. Credit scores can objectively quantify the magnitude of risk. The dataset contains 19 columns.

Encoding :

Encoding is a required pre-processing step when working with categorical data for machine learning algorithms. There are two types of encoders we will discuss here.

1. LabelEncoder : Label Encoder converts each categorical label into an integer value. Here, the Female and Male labels will be converted into 0 and 1.

process of LabelEncoder
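A minimal sketch of the step above, using an illustrative gender list rather than the actual dataset column:

```python
from sklearn.preprocessing import LabelEncoder

# Illustrative values; the real data would come from the gender column
genders = ["Female", "Male", "Male", "Female"]

le = LabelEncoder()
encoded = le.fit_transform(genders)
# Classes are numbered in sorted order, so Female -> 0 and Male -> 1
print(list(encoded))      # [0, 1, 1, 0]
print(list(le.classes_))  # ['Female', 'Male']
```

Note that the 0/1 assignment follows the alphabetical order of the labels, not the order they appear in the data.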

2. OneHotEncoder : One hot encoder does the same job but in a different way. Label Encoder assigns a particular number to each category, while one hot encoder assigns a whole new binary column to each category.

OneHotEncoder
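A sketch of the same idea with OneHotEncoder, again on an illustrative column (the category names are assumptions, not taken from the dataset):

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Illustrative income-type column with three distinct categories
income_type = np.array([["Working"], ["Pensioner"], ["Working"], ["Student"]])

ohe = OneHotEncoder()
# fit_transform returns a sparse matrix; toarray() makes it dense for inspection
onehot = ohe.fit_transform(income_type).toarray()
print(onehot.shape)  # (4, 3): one new column per category
```

Each row now has exactly one 1, in the column belonging to its category, which is why this is preferred over Label Encoder for categories that have no natural ordering.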

Normalization : Normalization is a scaling technique in which feature values are shifted and rescaled so that they end up within a common range, typically between 0 and 1. Its main purpose is to stop features with large numeric ranges (like income) from dominating features with small ranges (like number of children) during model training.

After Normalization
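The scaling above can be sketched with `MinMaxScaler`, using illustrative income values rather than the real column:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Illustrative annual incomes; a 2-D array because scalers expect samples x features
incomes = np.array([[27000.0], [112500.0], [450000.0], [67500.0]])

scaler = MinMaxScaler()  # rescales each feature to the [0, 1] range
normalized = scaler.fit_transform(incomes)
print(normalized.min(), normalized.max())  # 0.0 1.0
```

After fitting, the smallest value maps to 0, the largest to 1, and everything else falls proportionally in between.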

Standardization : Standardization is another scaling technique where the values are centered around the mean with a unit standard deviation. This means that the mean of the attribute becomes zero and the resultant distribution has a unit standard deviation.

scaling data using standardization
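A minimal sketch of standardization with `StandardScaler`, on an illustrative age column:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

ages = np.array([[23.0], [35.0], [47.0], [59.0]])  # illustrative values

scaler = StandardScaler()  # subtracts the mean, divides by the standard deviation
standardized = scaler.fit_transform(ages)
# After scaling: mean ~ 0 and standard deviation ~ 1
print(standardized.mean(), standardized.std())
```

Unlike normalization, the result is not bounded to [0, 1]; outliers simply become large positive or negative z-scores.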

Imputing the missing values : In statistics, imputation is the process of replacing missing data with substituted values. When one or more values are missing for a case, most statistical packages default to discarding the entire case, which may introduce bias or affect the representativeness of the results. Imputation lets us keep those cases by filling in plausible values instead.

Remove the Year Employed Column
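As an alternative to dropping the column, missing values can be filled in with `SimpleImputer`; the sketch below uses an illustrative years-employed column with gaps:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Illustrative column; NaN marks the missing entries
years_employed = np.array([[3.0], [np.nan], [7.0], [np.nan], [5.0]])

imputer = SimpleImputer(strategy="mean")  # replace NaNs with the column mean
filled = imputer.fit_transform(years_employed)
print(filled.ravel())  # the NaNs become 5.0, the mean of 3, 7, and 5
```

Other strategies such as `"median"` or `"most_frequent"` are often preferable when the column contains outliers.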

Discretization : Discretization is the process of putting values into buckets so that there are a limited number of possible states. The buckets themselves are treated as ordered, discrete values. You can discretize both numeric and string columns, and there are several methods you can use to do it.

Discretization using Uniform
Discretization using Kmeans
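Both bucketing strategies named above can be sketched with `KBinsDiscretizer`, on an illustrative age column (the bin counts and values are assumptions):

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

ages = np.array([[22.0], [25.0], [30.0], [45.0], [60.0], [63.0]])  # illustrative

# "uniform": bins of equal width across the value range
uniform = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="uniform")
u_bins = uniform.fit_transform(ages).ravel()

# "kmeans": bin edges placed by 1-D k-means clustering of the values
kmeans = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="kmeans")
k_bins = kmeans.fit_transform(ages).ravel()

print(u_bins)  # [0. 0. 0. 1. 2. 2.] -- equal-width edges at ~35.7 and ~49.3
print(k_bins)  # bin labels driven by the natural clusters in the data
```

The uniform strategy ignores how the data is distributed, while the k-means strategy adapts the bin edges to where the values actually cluster.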

Thank you for reading my blog!
