Feature Selection In The Breast Cancer Dataset

In the following cells, we will select a group of variables, the most predictive ones, to build our machine learning models. Feature selection is the process of automatically or manually selecting the features that contribute most to the prediction variable or output you are interested in. Irrelevant features in your data can decrease the accuracy of the models and make the model learn from irrelevant patterns. Selecting the relevant features also benefits from the right domain knowledge.

In this part we will select features with different methods:

- Feature selection with correlation
- Univariate feature selection (SelectKBest)
- Recursive feature elimination (RFE)
- Recursive feature elimination with cross-validation (RFECV)
- Tree-based feature selection

At the end, we will also try feature extraction with principal component analysis (PCA).

We will use random forest classification to train our model and make predictions.
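As a minimal sketch of this setup, assuming the copy of the dataset that ships with scikit-learn; the variable names, seeds, and the 70/30 split ratio are my own choices, not the post's:

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
x = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target  # 0 = malignant, 1 = benign

# Hold out 30% of the samples for testing (split ratio assumed)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=42)

clf_rf = RandomForestClassifier(random_state=43)
```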


After dropping the correlated features, the correlation matrix shows that there are no more highly correlated pairs. Actually, one pair with a correlation of 0.9 remains, but let's see together what happens if we do not drop it.
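A hedged sketch of the correlation-based drop, reusing the `x_train` DataFrame from the setup above; the 0.9 threshold matches the correlation value mentioned in the text, and the strict `>` comparison is why a pair at exactly 0.9 survives:

```python
import numpy as np

# Absolute pairwise correlations; keep the upper triangle so each pair appears once
corr = x_train.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Drop one feature of every pair correlated above the threshold
# (a pair at exactly 0.9 is kept, as noted above)
threshold = 0.9
to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
x_train_sel = x_train.drop(columns=to_drop)
x_test_sel = x_test.drop(columns=to_drop)
```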

Well, we chose our features, but did we choose them correctly? Let's use a random forest and measure the accuracy with the chosen features.
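A sketch of that check, reusing `clf_rf` and the reduced feature sets from above; `accuracy_score` and `confusion_matrix` are the standard scikit-learn metrics:

```python
from sklearn.metrics import accuracy_score, confusion_matrix

clf_rf.fit(x_train_sel, y_train)
y_pred = clf_rf.predict(x_test_sel)

print("Accuracy:", accuracy_score(y_test, y_pred))  # roughly 0.95 on this split
print(confusion_matrix(y_test, y_pred))
```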

Accuracy is almost 95%, and as the confusion matrix shows, we make only a few wrong predictions. Now let's look at other feature selection methods to find better results.

In univariate feature selection, we will use SelectKBest, which removes all but the k highest-scoring features.

In this method we need to choose how many features to use. For example, should k (the number of features) be 5, 10, or 15? The only way to answer is by trial or intuition. I do not try all combinations here; I simply choose k = 10 and find the best 10 features.
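A sketch with SelectKBest; the post does not show its scoring function, so the chi-squared score used here is an assumption (a common choice for this dataset, since all features are non-negative):

```python
from sklearn.feature_selection import SelectKBest, chi2

# Keep the k = 10 highest-scoring features
select = SelectKBest(score_func=chi2, k=10)
x_train_k = select.fit_transform(x_train, y_train)
x_test_k = select.transform(x_test)

print("Best 10 features:", x_train.columns[select.get_support()].tolist())
```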

Basically, it uses one of the classification methods (random forest in our example) and assigns a weight to each feature. Features whose absolute weights are the smallest are pruned from the current feature set. The procedure is repeated recursively on the pruned set until the desired number of features remains.

Like the previous method, we will use 10 features. However, which 10 features should we use? We will choose them with the RFE method.
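A sketch of RFE with the random forest as the underlying estimator, eliminating one feature per step until 10 remain (the seed and step size are assumptions):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

rfe = RFE(estimator=RandomForestClassifier(random_state=43),
          n_features_to_select=10, step=1)
rfe.fit(x_train, y_train)

print("Chosen 10 features:", x_train.columns[rfe.support_].tolist())
```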

Now we will not only find the best features, but also how many features we need for the best accuracy.
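RFECV wraps the same elimination loop in cross-validation so the number of features is chosen from the data; the `cv=5` folds and accuracy scoring here are assumptions:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV

rfecv = RFECV(estimator=RandomForestClassifier(random_state=43),
              step=1, cv=5, scoring='accuracy')
rfecv.fit(x_train, y_train)

print("Optimal number of features:", rfecv.n_features_)
print("Best features:", x_train.columns[rfecv.support_].tolist())
```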

[Figure: feature importances plotted in decreasing order]

Let's look at what we did up to this point. Admittedly, this dataset is very easy to classify. However, our primary purpose is not reaching a high accuracy; it is learning how to perform feature selection and how to understand the data.

As you can see in the plot above, feature importance drops off after the 6 best features. Therefore we can focus on these 6 features.
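A sketch of the tree-based ranking behind that plot, fitting on the full feature set and sorting the forest's impurity-based importances (the plotting details are assumptions):

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier

clf_rf_full = RandomForestClassifier(random_state=43)
clf_rf_full.fit(x_train, y_train)

# Rank features by the forest's importances, highest first
importances = pd.Series(clf_rf_full.feature_importances_,
                        index=x_train.columns).sort_values(ascending=False)

importances.plot(kind='bar')
plt.ylabel('feature importance')
plt.tight_layout()
plt.show()

print("Top 6 features:", importances.index[:6].tolist())
```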


We will use principal component analysis (PCA) for feature extraction. Before applying PCA, we need to normalize the data so that PCA performs better.

[Figure: PCA explained variance ratio per component]

According to the variance ratio, 3 components can be chosen.
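A sketch of the PCA step; the post does not show its normalization, so the StandardScaler used here is an assumption:

```python
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Normalize so every feature contributes on a comparable scale
x_train_norm = StandardScaler().fit_transform(x_train)

pca = PCA()
pca.fit(x_train_norm)

# Cumulative explained variance ratio: pick the smallest number of
# components that captures enough of the variance
print(pca.explained_variance_ratio_.cumsum())
```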
