Feature Encoding in Machine Learning
Most machine learning models work only with numerical values, so the categorical values of the relevant features must be transformed into numbers before training. This process is called feature encoding. Some tools, such as data frame analytics, perform feature encoding automatically. Three common approaches are:
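To illustrate why encoding is needed, here is a minimal sketch using scikit-learn's LinearRegression, an estimator chosen only for illustration. Passing raw color strings makes it raise a ValueError, while the same colors encoded by hand as 0/1 columns are accepted. The target values in y are made-up numbers for the demo.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: one categorical feature and made-up numeric targets
colors = np.array([["red"], ["green"], ["blue"], ["red"]])
y = np.array([1.0, 2.0, 3.0, 1.0])

# Raw strings: the estimator rejects non-numeric input with a ValueError
try:
    LinearRegression().fit(colors, y)
    rejected = False
except ValueError:
    rejected = True

# Encode by hand as one 0/1 column per color (blue, green, red)
categories = np.array(["blue", "green", "red"])
X_encoded = (colors == categories).astype(float)  # broadcasts (4, 1) vs (3,) to (4, 3)
model = LinearRegression().fit(X_encoded, y)

print(rejected)           # True
print(X_encoded.tolist())  # [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
```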
i- One-Hot Encoding
import pandas as pd
data = {"colors": ["red", "green", "blue", "red"]}
df = pd.DataFrame(data)
# One new 0/1 column per category: colors_blue, colors_green, colors_red
encoded_data = pd.get_dummies(df, columns=["colors"], dtype=int)
print(encoded_data)
ii- Label Encoding
import pandas as pd
from sklearn.preprocessing import LabelEncoder
data = {"Animals": ["dog", "cow", "lion", "crow", "sparrow"]}
df = pd.DataFrame(data)
label_encoder = LabelEncoder()
# Each distinct label gets an integer, assigned in alphabetical order
df["Animals_encoded"] = label_encoder.fit_transform(df["Animals"])
print(df)
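The integers LabelEncoder assigns follow the alphabetical order of the labels, which the fitted encoder exposes as classes_, and inverse_transform maps codes back to labels. Note that scikit-learn documents LabelEncoder as intended for target values (y) rather than input features, and its alphabetical codes imply an order (cow < crow < dog) that means nothing for these animals. A small sketch:

```python
from sklearn.preprocessing import LabelEncoder

animals = ["dog", "cow", "lion", "crow", "sparrow"]

label_encoder = LabelEncoder()
codes = label_encoder.fit_transform(animals)

# classes_ holds the sorted labels; a label's code is its position here
classes = label_encoder.classes_.tolist()
# inverse_transform maps integer codes back to the original labels
decoded = label_encoder.inverse_transform([3, 4]).tolist()

print(classes)         # ['cow', 'crow', 'dog', 'lion', 'sparrow']
print(codes.tolist())  # [2, 0, 3, 1, 4]
print(decoded)         # ['lion', 'sparrow']
```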
iii- Ordinal Encoding
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder
# Sample data
data = {"Size": ["Small", "Medium", "Large", "Medium"]}
df = pd.DataFrame(data)
# Ordinal encoding with an explicit order: Small -> 0, Medium -> 1, Large -> 2
ordinal_encoder = OrdinalEncoder(categories=[["Small", "Medium", "Large"]])
# fit_transform returns a 2-D array; ravel() flattens it to a single column
df["Size_encoded"] = ordinal_encoder.fit_transform(df[["Size"]]).ravel()
print(df)
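Passing categories matters: without it, OrdinalEncoder sorts the categories alphabetically, giving Large -> 0, Medium -> 1, Small -> 2 and losing the size order. The sketch below shows that default next to a plain-pandas alternative, Series.map with a hand-written dictionary (the size_order name is illustrative).

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

df = pd.DataFrame({"Size": ["Small", "Medium", "Large", "Medium"]})

# Default: categories sorted alphabetically (Large, Medium, Small),
# so the codes do not follow the intended size order
default_codes = OrdinalEncoder().fit_transform(df[["Size"]]).ravel()

# Plain-pandas alternative with an explicit, hand-written order
size_order = {"Small": 0, "Medium": 1, "Large": 2}
mapped_codes = df["Size"].map(size_order)

print(default_codes.tolist())  # [2.0, 1.0, 0.0, 1.0]
print(mapped_codes.tolist())   # [0, 1, 2, 1]
```

The dictionary approach needs no extra library, but any value missing from size_order becomes NaN, so it suits small, fixed sets of categories.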