Feature Scaling in Machine Learning

Feature scaling is a technique for bringing the independent features in a dataset onto a comparable scale or into a fixed range. It is performed during data pre-processing to handle features whose magnitudes, values, or units vary widely. Without feature scaling, many machine learning algorithms (for example, those based on distances or gradient descent) tend to give more weight to features with larger numeric values and less weight to features with smaller values, regardless of the units involved.

Asad Mujeeb
3 min read · Apr 12, 2024
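To make the point about magnitudes concrete, here is a minimal sketch of how an unscaled feature can dominate a Euclidean distance. The age and income numbers are hypothetical and chosen only for illustration; the scaling applied is a plain min-max rescaling of each column.

```python
import numpy as np

# Hypothetical data: column 0 is age in years, column 1 is income in dollars.
X = np.array([
    [25.0, 50_000.0],
    [55.0, 51_000.0],
    [26.0, 60_000.0],
])

def euclidean(a, b):
    """Euclidean distance between two 1-D points."""
    return float(np.sqrt(np.sum((a - b) ** 2)))

# Distances on the raw data: the income column decides almost everything.
d01_raw = euclidean(X[0], X[1])  # 30-year age gap, 1,000 income gap
d02_raw = euclidean(X[0], X[2])  # 1-year age gap, 10,000 income gap

# Rescale each column to [0, 1] and measure the same distances again.
X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
d01_scaled = euclidean(X_scaled[0], X_scaled[1])
d02_scaled = euclidean(X_scaled[0], X_scaled[2])

print("raw:   ", d01_raw, d02_raw)
print("scaled:", d01_scaled, d02_scaled)
```

On the raw data the second distance is roughly ten times the first, purely because income is measured in larger numbers. After rescaling, the two distances are nearly equal, so the large age gap counts as much as the large income gap.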

This article covers four feature scaling methods:

i- MinMax Scaler

ii- Standard Scaler

iii- Robust Scaler

iv- Logarithmic Scaler

i- MinMax Scaler :

The MinMax scaler rescales each value into the range [0, 1]. The formula is :

x’ = (x - min(x)) / (max(x) - min(x))

# import the libraries
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# create some dummy data
data = {'values' : [10,20,30,40,50]}

# convert it into a dataframe
df = pd.DataFrame(data)

# create an instance of MinMaxScaler
scaler = MinMaxScaler()

# fit the scaler on the column (reshaped to 2-D) and store the result in a new column
df["scaled_value"] = scaler.fit_transform(df["values"].values.reshape(-1,1))

print(df)
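As a quick check of the min-max formula, the sketch below applies it by hand with NumPy to the same dummy values and compares the result with `MinMaxScaler`. The variable names are illustrative only.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

values = np.array([10, 20, 30, 40, 50], dtype=float)

# Min-max formula applied by hand: (x - min) / (max - min)
manual = (values - values.min()) / (values.max() - values.min())

# Same data through scikit-learn (expects a 2-D array, so reshape first)
from_sklearn = MinMaxScaler().fit_transform(values.reshape(-1, 1)).ravel()

print(manual)
print(np.allclose(manual, from_sklearn))
```

The smallest value maps to 0, the largest to 1, and everything else falls proportionally in between.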

ii- Standard Scaler :

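x’ = (x - mean(x)) / std(x)

Here std(x) is the standard deviation of the feature. After scaling, the feature has a mean of 0 and a standard deviation of 1.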

# import the libraries
import pandas as pd
from sklearn.preprocessing import StandardScaler

# create some dummy data
data = {'values' : [10,20,30,40,50]}

# convert it into a dataframe
df = pd.DataFrame(data)

# create an instance of StandardScaler
scaler = StandardScaler()

# fit the scaler on the column (reshaped to 2-D) and store the result in a new column
df["scaled_value"] = scaler.fit_transform(df["values"].values.reshape(-1,1))

print(df)
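The standardization step can also be reproduced by hand, which helps clarify which standard deviation is involved. This is a minimal sketch on the same dummy values; note that `StandardScaler` uses the population standard deviation (`ddof=0`), whereas pandas' `Series.std()` defaults to the sample standard deviation (`ddof=1`), so the two can differ slightly on small datasets.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

values = np.array([10, 20, 30, 40, 50], dtype=float)

# Standardization by hand; np.std defaults to ddof=0 (population std)
manual = (values - values.mean()) / values.std()

# Same data through scikit-learn
from_sklearn = StandardScaler().fit_transform(values.reshape(-1, 1)).ravel()

# Using the sample standard deviation (ddof=1) gives different numbers
sample_std_version = (values - values.mean()) / values.std(ddof=1)

print(manual)
print(np.allclose(manual, from_sklearn))
print(np.allclose(sample_std_version, from_sklearn))
```

The first comparison agrees with scikit-learn and the second does not, which is worth keeping in mind when verifying results against a pandas calculation.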

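iii- Robust Scaler :

This method uses two statistical measures of the data, both of which are less sensitive to outliers than the mean, minimum, or maximum :

i- Median

ii- Interquartile Range (IQR), the difference between the 75th and 25th percentiles

Formula :

x’ = (x - median(x)) / IQR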

# import the libraries
import pandas as pd
from sklearn.preprocessing import RobustScaler

# create some dummy data
data = {'values' : [10,20,30,40,50]}

# convert it into a dataframe
df = pd.DataFrame(data)

# create an instance of RobustScaler
scaler = RobustScaler()

# fit the scaler on the column (reshaped to 2-D) and store the result in a new column
df["scaled_value"] = scaler.fit_transform(df["values"].values.reshape(-1,1))

print(df)
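The dummy data above contains no outliers, so it does not show why the median and IQR matter. The following sketch replaces the last value with a hypothetical outlier (1000) and compares a hand-computed robust scaling with `RobustScaler` (assuming its default 25th to 75th percentile range) and with `MinMaxScaler`.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, RobustScaler

# Hypothetical data: the dummy values with the last one replaced by an outlier
values = np.array([10, 20, 30, 40, 1000], dtype=float)

# Robust scaling by hand: (x - median) / IQR
median = np.median(values)
q1, q3 = np.percentile(values, [25, 75])
robust_manual = (values - median) / (q3 - q1)

# Same data through scikit-learn's RobustScaler and MinMaxScaler
robust_sklearn = RobustScaler().fit_transform(values.reshape(-1, 1)).ravel()
minmax = MinMaxScaler().fit_transform(values.reshape(-1, 1)).ravel()

print("robust :", robust_manual)
print("min-max:", minmax)
```

With robust scaling, the four ordinary values keep a sensible spread because the median and IQR are unaffected by the outlier. With min-max scaling, the same four values are squeezed into a narrow band near 0 because the outlier defines the maximum.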

iv- Logarithmic Scaler :

# import the libraries
import pandas as pd
import numpy as np

# create some dummy data
data = {'values' : [10,20,30,40,50]}

# convert it into a dataframe
df = pd.DataFrame(data)

# create new columns with the natural, base-2, and base-10 logarithms
df["log"] = np.log(df["values"])
df["log2"] = np.log2(df["values"])
df["log10"] = np.log10(df["values"])

print(df)
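Two practical points about logarithmic scaling are worth a short sketch. First, the logarithm is only defined for positive values, so data containing zeros is often transformed with `np.log1p`, which computes log(1 + x). Second, the choice of base only changes the result by a constant factor. The values below are hypothetical and were picked so the output is easy to read.

```python
import numpy as np

# Hypothetical skewed data that includes a zero
values = np.array([0, 9, 99, 999, 9999], dtype=float)

# log1p computes log(1 + x), so a zero input maps to zero instead of -inf
log1p_values = np.log1p(values)

# expm1 is the inverse of log1p and recovers the original values
restored = np.expm1(log1p_values)

# The same idea in base 10: log10(1 + x)
log10_1p = np.log10(1 + values)

# Changing the base only rescales the result by a constant factor
positive = np.array([10, 20, 30, 40, 50], dtype=float)
ratio = np.log(positive) / np.log10(positive)

print(log10_1p)
print(ratio)
```

Every entry of `ratio` equals ln(10), so the natural, base-2, and base-10 versions of a log-scaled feature carry the same information up to a multiplicative constant.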
