Feature Importance and Feature Selection: Two Sides of the Same Coin
Understanding Their Roles in Model Performance
Feature importance and feature selection are crucial for building simple, interpretable, yet accurate machine learning models.
➡️ Feature importance indicates the degree of influence of a feature on the output of a predictive model. It quantifies the contribution of the feature to the predictive power of the algorithm.
➡️ Feature selection consists of selecting a subset of features that simplifies the model without incurring a significant loss of performance.
By reducing the number of features used in a machine learning model, feature selection improves computational efficiency, helps mitigate overfitting, and makes the model easier to interpret.
It turns out that we need feature importance for feature selection.
Feature importance guides the feature selection process by providing insights into which features have the greatest influence on the target variable. In fact, most feature selection algorithms involve assigning a value of importance to each feature first, then ranking the features, and finally selecting the top-ranking features.
How can we derive feature importance?
▶️ We can use statistical tests like chi-square, ANOVA and correlation. Statistical tests assign importance through their p-values: the smaller the p-value, the stronger the association between the feature and the target.
▶️ The feature variance is commonly used as a rudimentary importance metric.
▶️ Linear and logistic regression assign importance through their coefficients. The larger the coefficient magnitude, the greater the contribution of the feature to the model output (provided the features are on a similar scale).
▶️ Decision tree-based models, like random forests and gradient boosting machines, assign importance based on how often a feature is used to split the data across the trees and how much those splits reduce the impurity.
▶️ For models that do not assign importance natively, we can infer it by randomly shuffling or removing each feature in turn and measuring the resulting performance degradation. The greater the degradation, the more influential the feature.
▶️ Training single-feature classifiers or regression models and then evaluating them with a performance metric like the ROC-AUC or the mean squared error is an alternative way of inferring how important a feature is for predicting a certain outcome (the code sketches after this list illustrate several of these approaches).
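To make the statistical-test and variance approaches concrete, here is a minimal sketch with scikit-learn. The breast cancer demo dataset and the ANOVA F-test are illustrative choices, not prescriptions:

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import f_classif

# Demo dataset: 30 numerical features, binary target.
X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# ANOVA F-test: one F statistic and one p-value per feature.
# The smaller the p-value, the stronger the association with the target.
f_stat, p_values = f_classif(X, y)
print(pd.Series(p_values, index=X.columns).sort_values().head())

# Variance as a rudimentary importance metric: low-variance features
# carry little information and are often the first to be dropped.
print(X.var().sort_values().head())
```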
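Model-derived importance can be sketched in a similar way, using logistic regression coefficients and a random forest's impurity-based importances. The scaling step is added so that coefficient magnitudes are comparable across features:

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# Coefficient-based importance: scale the features first so that
# the coefficient magnitudes are comparable across features.
logit = LogisticRegression(max_iter=1000).fit(StandardScaler().fit_transform(X), y)
coef_importance = pd.Series(abs(logit.coef_[0]), index=X.columns)
print(coef_importance.sort_values(ascending=False).head())

# Impurity-based importance from a tree ensemble: how much the splits
# on each feature reduce the impurity, averaged across the trees.
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(pd.Series(rf.feature_importances_, index=X.columns)
        .sort_values(ascending=False).head())
```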
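Finally, a sketch of the model-agnostic approaches: permutation importance (shuffling) and single-feature models scored with the ROC-AUC. The train/test split and the choice of estimators are assumptions made for the example:

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Permutation importance: shuffle each feature and measure the drop in
# performance; the bigger the drop, the more important the feature.
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
perm = permutation_importance(
    rf, X_test, y_test, scoring="roc_auc", n_repeats=5, random_state=0
)
print(pd.Series(perm.importances_mean, index=X.columns)
        .sort_values(ascending=False).head())

# Single-feature models: train one model per feature and score it;
# the higher the ROC-AUC, the more predictive the feature on its own.
single_feature_auc = {}
for feature in X.columns:
    model = LogisticRegression(max_iter=1000).fit(X_train[[feature]], y_train)
    preds = model.predict_proba(X_test[[feature]])[:, 1]
    single_feature_auc[feature] = roc_auc_score(y_test, preds)

print(pd.Series(single_feature_auc).sort_values(ascending=False).head())
```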
How can we select features based on their importance?
After obtaining the feature importance, whether it is a p-value, an importance derived from a model via coefficients or impurity reduction, the performance degradation after shuffling, or any other metric, a selection algorithm ranks the features and then selects the top-ranking ones.
There are 2 main ways to select the top-ranking features:
➡️ We can select the X top-ranking features, or the features in the top X percentile, where X is an arbitrary value that we determine.
➡️ We can select features whose importance exceeds a threshold (or, for p-values, falls below one), where the threshold is again arbitrarily decided. The code sketch below shows both options.
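Here is a rough illustration of both options with scikit-learn: SelectKBest for the top-X approach and SelectFromModel with a threshold for the second. The dataset, k=10 and the "mean" threshold are arbitrary choices for the example:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel, SelectKBest, f_classif

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# Option 1: keep the X top-ranking features (here, the 10 features
# with the highest ANOVA F statistic).
top_k = SelectKBest(score_func=f_classif, k=10).fit(X, y)
print(top_k.get_feature_names_out())

# Option 2: keep the features whose model-derived importance exceeds a
# threshold (here, the mean importance across all features).
sfm = SelectFromModel(
    RandomForestClassifier(n_estimators=100, random_state=0),
    threshold="mean",
).fit(X, y)
print(sfm.get_feature_names_out())
```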
Wrapping up
Feature importance and feature selection are crucial components of machine learning that improve model performance and interpretability. By understanding the relevance of each feature and selecting the most important ones, we can optimize our models and achieve better results.
With the help of Python libraries like Scikit-learn, Feature-engine, and MLXtend, implementing feature selection and calculating feature importance is more accessible than ever.
To learn more about ways to assign feature importance for selection check out:
🎓 Our course: Feature Selection for Machine Learning
📘 Our book: Feature Selection in Machine Learning
📋 Feature-engine’s selection module
Ready to enhance your data science skills?
Stop aimless internet browsing. Start learning today with meticulously crafted courses offering a robust curriculum, fostering skill development with steadfast focus and efficiency.
Forecasting specialization (course)
Interpreting Machine Learning Models (course)