Feature Scaling in Machine Learning
Feature scaling is a technique for bringing the independent features in a dataset into a common, fixed range. It is performed during data pre-processing to handle features whose magnitudes, values, or units vary widely. Without feature scaling, a machine learning algorithm can give more weight to features with larger numeric values and less weight to features with smaller values, regardless of the units those values are measured in.
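To make the effect of unscaled features concrete, here is a minimal sketch using made-up data (the age and income columns and the `euclidean` helper are assumptions for illustration, not part of the original examples). It compares Euclidean distances before and after a simple min-max rescaling of each column.

```python
import numpy as np

# Hypothetical samples: column 0 is age in years, column 1 is income in dollars.
X = np.array([
    [25.0, 50_000.0],
    [60.0, 51_000.0],
    [26.0, 58_000.0],
])

def euclidean(a, b):
    """Plain Euclidean distance between two 1-D vectors."""
    return float(np.sqrt(np.sum((a - b) ** 2)))

# Raw distances from sample 0: the income column dominates because its
# numbers are thousands of times larger than the age column's.
raw_d01 = euclidean(X[0], X[1])
raw_d02 = euclidean(X[0], X[2])

# Rescale every column to [0, 1] so both features contribute comparably.
X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
scaled_d01 = euclidean(X_scaled[0], X_scaled[1])
scaled_d02 = euclidean(X_scaled[0], X_scaled[2])

print("raw:   ", raw_d01, raw_d02)
print("scaled:", scaled_d01, scaled_d02)
```

On the raw data, the 35-year age gap between samples 0 and 1 barely changes the distance, while the income gap decides everything. After scaling, both columns have a comparable influence on the result.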
There are four common methods of feature scaling:
i- MinMax Scaler
ii- Standard Scaler
iii- Robust Scaler
iv- Logarithmic Scaler
i- MinMax Scaler
This scaler maps each value into the range [0, 1].
Formula :
x' = (x - min(x)) / (max(x) - min(x))

# import the libraries
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
# create some dummy data
data = {'values' : [10,20,30,40,50]}
# convert it into a dataframe
df = pd.DataFrame(data)
# create an instance of MinMaxScaler
scaler = MinMaxScaler()
# fit the scaler and store the scaled values in a new column
# (scikit-learn expects a 2-D array, hence the reshape)
df["scaled_value"] = scaler.fit_transform(df["values"].values.reshape(-1,1))
print(df)
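As a cross-check, the min-max formula can also be applied by hand. The sketch below uses only NumPy on the same dummy values; the variable names are illustrative, and the result should agree with what MinMaxScaler produces with its default settings.

```python
import numpy as np

values = np.array([10, 20, 30, 40, 50], dtype=float)

# Apply the formula directly: (x - min) / (max - min)
manual = (values - values.min()) / (values.max() - values.min())

# The smallest value maps to 0, the largest to 1,
# and everything else falls proportionally in between.
print(manual)
```

For these five evenly spaced values the result is 0, 0.25, 0.5, 0.75, and 1.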
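ii- Standard Scaler
This scaler standardizes the values so that they have a mean of 0 and a standard deviation of 1.
Formula :
x' = (x - mean(x)) / std(x)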
# import the libraries
import pandas as pd
from sklearn.preprocessing import StandardScaler
# create some dummy data
data = {'values' : [10,20,30,40,50]}
# convert it into a dataframe
df = pd.DataFrame(data)
# create an instance of StandardScaler
scaler = StandardScaler()
# fit the scaler and store the scaled values in a new column
# (scikit-learn expects a 2-D array, hence the reshape)
df["scaled_value"] = scaler.fit_transform(df["values"].values.reshape(-1,1))
print(df)
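The standardization formula can be checked by hand as well. This is a minimal NumPy-only sketch on the same dummy values, with illustrative variable names. One detail worth knowing: StandardScaler divides by the population standard deviation (ddof=0), which is NumPy's default, whereas pandas' `Series.std()` defaults to the sample standard deviation (ddof=1) and would give slightly different numbers.

```python
import numpy as np

values = np.array([10, 20, 30, 40, 50], dtype=float)

mean = values.mean()        # 30.0
std = values.std(ddof=0)    # population standard deviation, sqrt(200)

# Apply the formula directly: (x - mean) / std
z = (values - mean) / std

print(z)
print("mean of z:", z.mean(), "std of z:", z.std())
```

After the transformation, the values are centred on 0 and have a standard deviation of 1, up to floating-point rounding.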

iii- Robust Scaler
In this method, we use two statistical measures of the data:
i- Median
ii- Interquartile Range (IQR), the difference between the third and first quartiles (Q3 - Q1)
Formula :
x' = (x - median(x)) / IQR
# import the libraries
import pandas as pd
from sklearn.preprocessing import RobustScaler
# create some dummy data
data = {'values' : [10,20,30,40,50]}
# convert it into a dataframe
df = pd.DataFrame(data)
# create an instance of RobustScaler
scaler = RobustScaler()
# fit the scaler and store the scaled values in a new column
# (scikit-learn expects a 2-D array, hence the reshape)
df["scaled_value"] = scaler.fit_transform(df["values"].values.reshape(-1,1))
print(df)
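The sketch below applies the robust formula by hand with NumPy and then repeats it on a made-up variant of the data that contains one outlier. The `robust_scale` helper and the outlier value are assumptions for illustration. It suggests why the median and IQR are used: unlike the minimum and maximum, they are barely affected by a single extreme value.

```python
import numpy as np

def robust_scale(x):
    """Hypothetical helper: (x - median) / IQR, with IQR = Q3 - Q1."""
    q1, q3 = np.percentile(x, [25, 75])
    return (x - np.median(x)) / (q3 - q1)

values = np.array([10, 20, 30, 40, 50], dtype=float)
scaled = robust_scale(values)

# Same data with the last value replaced by an outlier (made up for this sketch).
with_outlier = np.array([10, 20, 30, 40, 1000], dtype=float)
robust = robust_scale(with_outlier)

# Min-max scaling of the same data, for comparison.
minmax = (with_outlier - with_outlier.min()) / (with_outlier.max() - with_outlier.min())

print(scaled)
print(robust)
print(minmax)
```

With the outlier present, the robust version keeps the four ordinary values spread between -1 and 0.5, exactly as before, while min-max scaling squeezes them into a narrow band just above 0.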

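iv- Logarithmic Scaler
This method replaces each value with its logarithm, which compresses large values much more than small ones. The values must be strictly positive.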
# import the libraries
import pandas as pd
import numpy as np
# create some dummy data
data = {'values' : [10,20,30,40,50]}
# convert it into a dataframe
df = pd.DataFrame(data)
# create new columns holding the natural, base-2 and base-10 logarithms
df["log"] = np.log(df["values"])
df["log2"] = np.log2(df["values"])
df["log10"] = np.log10(df["values"])
print(df)
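Two properties of the log transform are easy to check with a short NumPy sketch. The skewed sample below is made up for illustration. First, values spanning several orders of magnitude are compressed into a narrow range. Second, the three bases differ only by a constant factor, so the choice of base changes the scale of the result but not its shape.

```python
import numpy as np

# Hypothetical skewed data spanning four orders of magnitude.
skewed = np.array([1, 10, 100, 1000, 10000], dtype=float)
compressed = np.log10(skewed)   # roughly 0, 1, 2, 3, 4

values = np.array([10, 20, 30, 40, 50], dtype=float)
ln = np.log(values)
log2 = np.log2(values)
log10 = np.log10(values)

# Change of base: log_b(x) = ln(x) / ln(b)
same_shape_2 = np.allclose(log2, ln / np.log(2))
same_shape_10 = np.allclose(log10, ln / np.log(10))

# The natural log is undone by the exponential function.
restored = np.exp(ln)

print(compressed)
print(same_shape_2, same_shape_10)
print(restored)
```

Because the transform is undone by exponentiation, scaled values can be mapped back to the original units when needed.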
