You’ve probably heard the term “feature importance” in the context of decision trees or random forests: it’s a value obtained from these models that tells us how much a feature influences the model’s output.
➡️ Features with high importance are crucial for the model’s accuracy.
That’s why, to build simpler models, we select the features with the highest importance.
It turns out that there are many ways to determine feature importance, or in other words, to rank the influence of a feature on the model’s output.
▶️ We can assign importance through the F-statistic, the chi-square statistic, or a correlation coefficient, together with their associated p-values. The lower the p-value, or the greater the statistic, the more important the feature is.
▶️ Linear and logistic regression assign importance through their coefficients. The higher the coefficient magnitude, the greater the contribution of the feature to the model output, provided the features are on a comparable scale.
▶️ Decision tree-based models, like random forests and gradient boosting machines, assign importance based on how many times a feature is used to split across the various trees and how much each split reduces impurity.
▶️ We can infer feature importance by randomly shuffling a feature’s values and measuring how much the model’s performance degrades. The greater the drop in performance, the more important the feature is.
▶️️ Similarly, we can remove a feature, retrain the model, and take the resulting drop in performance as the feature’s importance.
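The methods above can be sketched with scikit-learn’s built-in tools. This is a minimal illustration on a synthetic dataset; the dataset parameters and variable names are my own, not from any specific example:

```python
# Sketch: three ways to rank features with scikit-learn,
# using a synthetic classification dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import f_classif
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10,
                           n_informative=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 1) Univariate statistic: higher F-statistic, lower p-value
#    -> more important feature.
f_scores, p_values = f_classif(X_train, y_train)

# 2) Impurity-based importance from a tree ensemble
#    (normalized to sum to 1).
rf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
impurity_importance = rf.feature_importances_

# 3) Permutation importance: shuffle each feature and measure
#    the drop in performance on held-out data.
perm = permutation_importance(rf, X_test, y_test,
                              n_repeats=5, random_state=0)
perm_importance = perm.importances_mean
```

Each of the three arrays holds one score per feature, so they can be used directly to rank the features.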
How can we select features based on their importance?
After obtaining the feature importance, we can:
➡️ Select the top X ranking features, or the features in the top X percentile, where X is an arbitrary value.
➡️ Select features whose importance is greater than a threshold.
➡️ If we introduced random variables (probes), we’d select those features whose importance is greater than that of the probes.
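The first two selection strategies can be sketched with scikit-learn. This is a minimal example on a synthetic dataset, assuming an arbitrary k and a mean-importance threshold; neither choice is prescriptive:

```python
# Sketch: turning importance scores into a feature selection.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel, SelectKBest, f_classif

X, y = make_classification(n_samples=500, n_features=10,
                           n_informative=3, random_state=0)

# Keep the top k features ranked by the F-statistic.
top_k = SelectKBest(score_func=f_classif, k=3).fit(X, y)
X_top_k = top_k.transform(X)

# Keep the features whose tree-based importance exceeds a
# threshold (here, the mean importance across all features).
selector = SelectFromModel(RandomForestClassifier(random_state=0),
                           threshold="mean").fit(X, y)
X_selected = selector.transform(X)
```

Both selectors expose `get_support()` if you need the boolean mask of the retained features rather than the transformed matrix.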
So in a nutshell, when selecting features, we need to find a way to analyze the influence of the feature on the model’s output, and with that, we rank the features and then select those in the top ranks.
And as always, the various methods for determining feature importance have advantages and shortcomings, which I’ll leave for a future post.
To learn more about ways to assign feature importance for selection, check out:
Ready to enhance your skills?
Our specializations, courses and books are here to assist you:
Advanced Machine Learning (specialization)
Forecasting with Machine Learning (course)