Standardization vs Normalization
What’s the Difference Between Standardization and Normalization?
▶️ Normalization means adjusting values measured on different scales to a common scale, and it is required prior to training many machine learning models.
▶️ Standardization is one way of performing normalization, where we subtract the mean of the variable from each observation and then divide the difference by the standard deviation. The aim of standardization is to produce variables with a mean of 0 and a variance of 1.
▶️ Standardization is also called z-score normalization. The z-score is the result of (x − mean(x)) / std(x), and it indicates how many standard deviations an observation lies from the mean.
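To make the formula concrete, here is a minimal sketch in Python; the toy numbers are invented for illustration:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

x = np.array([[50.0], [60.0], [70.0], [80.0], [90.0]])  # toy variable

# Manual z-score: subtract the mean, divide by the standard deviation.
z_manual = (x - x.mean()) / x.std()

# Same result with Scikit-learn's StandardScaler.
z_sklearn = StandardScaler().fit_transform(x)

print(z_manual.ravel())                  # mean 0, variance 1
print(np.allclose(z_manual, z_sklearn))  # True
```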
▶️ Min-Max Scaling is another way of normalizing variables, where we subtract the minimum value from each observation and divide the result by the value range. This re-scales the variable values between 0 and 1.
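And the equivalent sketch for min-max scaling, again on invented numbers:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

x = np.array([[50.0], [60.0], [70.0], [80.0], [90.0]])  # toy variable

# Manual min-max: subtract the minimum, divide by the value range.
mm_manual = (x - x.min()) / (x.max() - x.min())

# Same result with Scikit-learn's MinMaxScaler.
mm_sklearn = MinMaxScaler().fit_transform(x)

print(mm_manual.ravel())  # values re-scaled between 0 and 1
```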
⚠️ Min-max scaling is sometimes simply called "normalization", which causes ambiguity; I’ve seen this in many blogs on the internet. Strictly speaking, min-max scaling is just one way of performing normalization. There are more.
▶️ Standardization and Min-Max scaling are the most commonly used normalization procedures in machine learning. But there are others, for example (sketched in code after the list below):
📏 Mean normalization
📏 Maximum absolute scaling
📏 Normalization using the median and interquartile-range
📏 Scaling to the vector norm
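For reference, here is a hedged sketch of how each of the four methods above could be computed by hand with NumPy, on an invented toy variable (Scikit-learn also ships MaxAbsScaler, RobustScaler, and Normalizer for the last three):

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0, 100.0])  # toy variable with one large value

# Mean normalization: centre on the mean, divide by the value range.
mean_norm = (x - x.mean()) / (x.max() - x.min())

# Maximum absolute scaling: divide by the largest absolute value.
max_abs = x / np.abs(x).max()

# Median / interquartile-range scaling (what RobustScaler does).
q1, q3 = np.percentile(x, [25, 75])
robust = (x - np.median(x)) / (q3 - q1)

# Scaling to the vector norm: divide by the Euclidean (L2) norm.
unit_norm = x / np.linalg.norm(x)
```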
❓ Standardization or Min-Max scaling? There isn’t a clear-cut answer; it depends on the application, the algorithm, and the distribution of the variables.
▶️ Most people use standardization. I’d argue that this method is particularly useful when the variables are normally distributed. When a feature’s distribution is very different from the normal distribution, an alternative scaling method might be more helpful.
▶️ If the variable consists of integers, min-max scaling is a better choice. For very skewed distributions, I’d also use min-max scaling.
▶️ If you want to keep the zero values as zeros (for example, if you are using sparse matrices), you can instead use maximum absolute scaling, supported by the MaxAbsScaler from Scikit-learn.
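A short sketch, on made-up data, showing that MaxAbsScaler leaves zeros untouched and therefore works on sparse matrices:

```python
from scipy.sparse import csr_matrix
from sklearn.preprocessing import MaxAbsScaler

X = csr_matrix([[0.0, 2.0], [0.0, -4.0], [3.0, 0.0]])  # toy sparse matrix

# Each column is divided by its maximum absolute value; zeros stay zeros.
X_scaled = MaxAbsScaler().fit_transform(X)
print(X_scaled.toarray())
```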
▶️ If the data has outliers, Scikit-learn suggests using the median and the interquartile range through the RobustScaler.
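A minimal sketch of RobustScaler on invented data with one extreme value; the outlier barely affects the centring and the scale:

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

x = np.array([[1.0], [2.0], [3.0], [4.0], [1000.0]])  # 1000 is an outlier

# Centred on the median and scaled by the interquartile range.
x_robust = RobustScaler().fit_transform(x)
print(x_robust.ravel())
```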
⚠️ Note: Most scaling methods DO NOT CHANGE the overall shape of the variable distribution. If you want to change the variable’s distribution shape, you need to try variance stabilizing transformations.
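As a quick illustration, here is a sketch applying two common variance stabilizing transformations, the logarithm and Box-Cox, to an invented right-skewed sample (PowerTransformer is Scikit-learn’s implementation of the Box-Cox and Yeo-Johnson transformations):

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

# Invented right-skewed, strictly positive sample.
x = np.random.exponential(scale=2.0, size=(1000, 1)) + 0.01

# Log transformation: requires strictly positive values.
x_log = np.log(x)

# Box-Cox, fitted so the result is as close to Gaussian as possible.
x_boxcox = PowerTransformer(method="box-cox").fit_transform(x)
```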
You can find more about scaling in these resources:
👉 Explanations and code in our most recent article: https://buff.ly/3YQXGOZ
👉 More information on when to use each method can be found in this article: https://buff.ly/33kFeBa
👉 This discussion on Stack Overflow: https://buff.ly/3R27weT
👉 Our course Feature Engineering for Machine Learning
👉 Our Python Feature Engineering Cookbook
👉 Our article on variance stabilizing transformations, for more details