Data Pre-processing using Scikit-learn

  1. Standardization
  2. Normalization
  3. Encoding
  4. Discretization
  5. Imputation of missing values

import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Standardization: rescale each numeric column of `data` (a pandas DataFrame) to zero mean and unit variance
numeric_columns = [c for c in data.columns if data[c].dtype != np.dtype('O')]
temp_data = data[numeric_columns]
standard_scaler = StandardScaler()
standardized_data = standard_scaler.fit_transform(temp_data)
pd.DataFrame(standardized_data, columns=temp_data.columns)
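
As a quick sanity check on what StandardScaler does, the sketch below applies it to a small made-up table and compares the result with the formula z = (x - mean) / std computed by hand. The column names and values are invented for illustration and are not taken from the dataset used above.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy data: names and values are invented purely for illustration
toy = pd.DataFrame({"height_cm": [150.0, 160.0, 170.0, 180.0],
                    "weight_kg": [50.0, 60.0, 80.0, 90.0]})

scaler = StandardScaler()
scaled = scaler.fit_transform(toy)  # returns a NumPy array, one column per feature

# StandardScaler divides by the population standard deviation (ddof=0)
manual = (toy - toy.mean()) / toy.std(ddof=0)

print(np.allclose(scaled, manual.to_numpy()))
print(scaled.mean(axis=0).round(6), scaled.std(axis=0).round(6))
```

After scaling, every column should have a mean of 0 and a standard deviation of 1, which is what the last line prints.
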

from sklearn.preprocessing import MinMaxScaler

# Normalization: rescale each numeric column to the [0, 1] range
normalizer = MinMaxScaler()
normalized_data = normalizer.fit_transform(temp_data)
pd.DataFrame(normalized_data, columns=temp_data.columns)

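
To see what MinMaxScaler computes, here is a minimal sketch on invented numbers. It checks the scaler's output against (x - min) / (max - min) calculated by hand. The toy column names and values are assumptions made for the example only.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Toy data: names and values are invented purely for illustration
toy = pd.DataFrame({"math": [40.0, 55.0, 70.0, 100.0],
                    "reading": [10.0, 20.0, 30.0, 50.0]})

normalizer = MinMaxScaler()  # default feature_range is (0, 1)
normalized = normalizer.fit_transform(toy)

# Same rescaling written out by hand, column by column
manual = (toy - toy.min()) / (toy.max() - toy.min())

print(np.allclose(normalized, manual.to_numpy()))
print(normalized.min(axis=0), normalized.max(axis=0))
```

Each column's smallest value maps to 0 and its largest to 1. Because the range depends on the extremes, a single outlier can squeeze the remaining values into a narrow band.
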
  • Label Encoding
  • One-Hot Encoding
  • Dummy Encoding
  • Effect Encoding
  • Binary Encoding
  • BaseN Encoding
  • Hash Encoding
  • Target Encoding
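
One-hot and dummy encoding are easy to confuse, so here is a small sketch of the difference. It uses pandas' `get_dummies` rather than scikit-learn, purely for brevity, and the colour values are invented for illustration.

```python
import pandas as pd

# Toy column; the category names are invented for illustration
colors = pd.Series(["red", "green", "blue", "green"], name="color")

# One-hot encoding: one indicator column per category
one_hot_df = pd.get_dummies(colors, dtype=int)

# Dummy encoding: drop the first category, which becomes the implicit baseline
dummy_df = pd.get_dummies(colors, drop_first=True, dtype=int)

print(one_hot_df)
print(dummy_df)
```

With three categories, one-hot encoding produces three columns and dummy encoding produces two. A row belonging to the dropped category is represented by zeros in every remaining column.
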

from sklearn.preprocessing import LabelEncoder

# Label encoding: map each distinct value in the column to an integer code
le = LabelEncoder()
data['Status'] = le.fit_transform(data['Status'])
data['Status'].value_counts()
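
The integer codes are only useful if you can map them back to the original labels. The sketch below shows how, using hypothetical status values that are not taken from the dataset above.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical status values, invented for illustration
status = ["approved", "rejected", "pending", "approved"]

le = LabelEncoder()
codes = le.fit_transform(status)

# classes_ holds the distinct labels in sorted order; a label's position is its code
print(list(le.classes_))
print(codes)

# inverse_transform maps the integer codes back to the original labels
restored = list(le.inverse_transform(codes))
print(restored)
```

The codes follow sorted label order, so they do not express any meaningful ranking. Note that scikit-learn's documentation describes LabelEncoder as intended for target labels, with OrdinalEncoder as the counterpart for feature columns.
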
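
from sklearn.preprocessing import OneHotEncoder

# One-hot encoding: one indicator column per category
one_hot = OneHotEncoder()
transformed_data = one_hot.fit_transform(data['race/ethnicity'].values.reshape(-1, 1)).toarray()
one_hot.categories_
# Label the columns with the fitted categories so the count always matches
transformed_data = pd.DataFrame(transformed_data, columns=one_hot.categories_[0])
transformed_data.head()
transformed_data.iloc[90]
data['race/ethnicity'][90]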
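
The overview at the top also lists discretization and imputation of missing values. As a rough sketch of both, the example below fills a missing entry with the column mean and then cuts the column into equal-width bins. It assumes scikit-learn's `SimpleImputer` and `KBinsDiscretizer` are suitable choices, since the text does not say which tools it has in mind, and the numbers are invented for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import KBinsDiscretizer

# Toy column with one missing value; numbers are invented for illustration
scores = np.array([[35.0], [50.0], [np.nan], [65.0], [90.0]])

# Imputation: replace the missing entry with the mean of the observed values
imputer = SimpleImputer(strategy="mean")
filled = imputer.fit_transform(scores)

# Discretization: cut the filled column into 3 equal-width bins and
# return the bin index of each value
binner = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="uniform")
bins = binner.fit_transform(filled)

print(filled.ravel())
print(bins.ravel())
```

Imputation runs first here because the discretizer cannot bin a missing value. Other strategies, such as `"median"` for the imputer or `"quantile"` for the binning, may suit skewed data better.
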
