Machine Learning
K Nearest Neighbors
Supervised Machine Learning
KNN Classifier
K Nearest Neighbors (KNN) is a simple, easy-to-understand, and versatile algorithm, and one of the most widely used in supervised machine learning.
KNN can be used for both regression and classification problems.
KNN is used in a variety of applications such as image recognition, finance, healthcare, handwriting detection, political science, and video recognition. In credit rating, financial institutions predict the creditworthiness of customers.
In loan disbursement, banks predict whether a loan is safe or risky. In political science, potential voters are classified into two classes: will vote or won't vote.
The KNN algorithm is based on a feature-similarity approach: a new point is labelled according to how closely its features resemble those of known points.
What happens inside KNN?
In K Nearest Neighbors, if we drop the K, what remains is simply the nearest neighbor.
In other words, the predictions of a KNN classifier depend on the nearest neighbors of a point.
As an example, suppose we have a dataset as given below.
If we train an ML model on this dataset with the KNN algorithm, it treats the data as shown below.
This is how a KNN classifier works: it predicts the label for a given set of features by searching for the nearest neighbors. Suppose we add more training data, modified as below, and we need the predicted label for a point with weight 138 and colour red (x). We can see it in the chart below.
As the chart shows, KNN finds the nearest neighbor to the point whose label we need to predict.
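The idea above can be sketched in code. The fruit data here (weight in grams, colour encoded as 0 = red, 1 = yellow) is made up for illustration, not the article's actual dataset, and `nearest_neighbor` is a hypothetical helper name:

```python
def nearest_neighbor(train, query):
    """Return the label of the training point closest to `query`.

    `train` is a list of ((weight, colour), label) pairs.
    """
    def dist(a, b):
        # Squared Euclidean distance over the feature tuple.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    features, label = min(train, key=lambda row: dist(row[0], query))
    return label

# (weight, colour) -> fruit label; colour: 0 = red, 1 = yellow
train = [
    ((150, 0), "apple"),
    ((145, 0), "apple"),
    ((110, 1), "banana"),
    ((115, 1), "banana"),
]

print(nearest_neighbor(train, (138, 0)))  # closest point is (145, 0) -> "apple"
```

With this toy data, the query point (weight 138, red) sits closest to an apple, so the predicted label is "apple".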
Now we know about the nearest neighbors. But what is K?
In KNN, K is the number of nearest neighbors to consider, and it is the core deciding factor. K is generally chosen as an odd number when the number of classes is 2. When K = 1, the algorithm is known as the nearest neighbor algorithm; this is the simplest case. Suppose P1 is the point whose label we need to predict. We first find the single closest point to P1 and then assign the label of that nearest point to P1.
Now we know what exactly KNN does. We can pick out the nearest neighbor by looking at the chart, but how does the algorithm do it?
Suppose P1 is the point whose label we need to predict. First, we find the K closest points to P1, then classify P1 by a majority vote among those K neighbors: the class with the most votes is taken as the prediction.
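The find-K-closest-then-vote procedure just described can be sketched as follows (a minimal illustration with made-up 2-D points; `knn_predict` is a hypothetical name, not a library function):

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs.
    """
    # Sort training points by Euclidean distance to the query.
    by_distance = sorted(train, key=lambda row: math.dist(row[0], query))
    # Take the labels of the k closest points.
    k_labels = [label for _, label in by_distance[:k]]
    # Majority vote: the most common label wins.
    return Counter(k_labels).most_common(1)[0][0]

train = [((1, 1), "A"), ((2, 1), "A"), ((2, 2), "A"),
         ((8, 8), "B"), ((9, 8), "B")]

print(knn_predict(train, (2, 2), k=3))  # three nearest are all "A" -> "A"
```

Sorting every training point by distance is O(n log n) per query; real implementations (e.g. scikit-learn) speed this up with structures such as KD-trees, but the logic is the same.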
To find the closest points, we compute the distance between points using distance measures such as Euclidean distance, Manhattan distance, and Minkowski distance.
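The three distance measures mentioned can be written as small helper functions (a sketch; the function names are chosen for illustration):

```python
def euclidean(a, b):
    # Straight-line distance: square root of the summed squared differences.
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def manhattan(a, b):
    # City-block distance: sum of the absolute differences.
    return sum(abs(x - y) for x, y in zip(a, b))

def minkowski(a, b, p=2):
    # Generalisation: p = 1 gives Manhattan, p = 2 gives Euclidean.
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

print(euclidean((0, 0), (3, 4)))       # 5.0
print(manhattan((0, 0), (3, 4)))       # 7
print(minkowski((0, 0), (3, 4), p=2))  # 5.0
```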
Euclidean Distance
Euclidean Distance for Three Dimensions
Euclidean Distance for n Dimensions
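The headings above refer to the Euclidean distance formulas; their standard definitions, for two points p and q, are:

```latex
% Two dimensions
d(p, q) = \sqrt{(q_1 - p_1)^2 + (q_2 - p_2)^2}

% Three dimensions
d(p, q) = \sqrt{(q_1 - p_1)^2 + (q_2 - p_2)^2 + (q_3 - p_3)^2}

% n dimensions
d(p, q) = \sqrt{\sum_{i=1}^{n} (q_i - p_i)^2}
```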
How to Decide the K Value
Research has shown that no single number of neighbors is optimal for all kinds of datasets; each dataset has its own requirements.
With a small number of neighbors, noise has a higher influence on the result, while a large number of neighbors makes prediction computationally expensive.
Research has also shown that a small number of neighbors gives the most flexible fit, with low bias but high variance, while a large number of neighbors produces a smoother decision boundary, which means lower variance but higher bias.
Generally, data scientists choose K as an odd number when the number of classes is even.
We can also choose K by building the model with different values of K and comparing their performance.
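One simple way to compare K values is to score each candidate K on a held-out test set and keep the best. The sketch below uses a tiny made-up dataset and hypothetical helper names; in practice cross-validation over a larger dataset would give a more reliable estimate:

```python
from collections import Counter
import math

def predict(train, query, k):
    """Majority-vote KNN prediction for a single query point."""
    neighbors = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

def accuracy(train, test, k):
    """Fraction of test points whose predicted label matches the true label."""
    correct = sum(predict(train, x, k) == y for x, y in test)
    return correct / len(test)

train = [((1, 1), "A"), ((2, 2), "A"), ((1, 2), "A"),
         ((8, 8), "B"), ((9, 9), "B"), ((8, 9), "B")]
test = [((2, 1), "A"), ((9, 8), "B")]

# Evaluate a few odd values of K and report each score.
for k in (1, 3, 5):
    print(k, accuracy(train, test, k))
```

On this cleanly separated toy data every K scores perfectly; on noisier real data the scores would differ, and the K with the best held-out score is the one to keep.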
In the next session, we’ll talk about the implementation of KNN Algorithm.
