Big Data Algorithms: The K-Nearest Neighbour Classification Algorithm Explained
The K-Nearest Neighbour classification algorithm (abbreviated to k-NN) is one of the most basic and frequently used algorithms for classification purposes. The k-NN classifier is applied to databases in which the variables (the columns) include a known target class for each record, so that a new data point can be assigned to one of these target classes.
k-NN is a non-parametric algorithm, which means that the classifier does not make any assumptions about the underlying data distribution. More simply stated, the model structure is determined by the data itself, which allows the k-NN algorithm to be used in a wide variety of situations.
The k-NN algorithm looks at the k nearest neighbours of a new data point (where k is specified by the user) and classifies the new data point according to the classes of those neighbours. To determine which data points are “nearest,” the k-NN algorithm calculates the distance from the new data point to every other data point and selects the k points with the smallest distances. The new data point is then assigned the class that is most common among those k neighbours.
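The neighbour search described above can be sketched in Python as follows. This is a minimal illustration rather than a definitive implementation: it assumes Euclidean distance as the distance measure and a simple majority vote among the k neighbours, and the names euclidean_distance and knn_classify, as well as the sample data, are hypothetical and do not come from any particular library.

```python
import math
from collections import Counter


def euclidean_distance(a, b):
    # Straight-line distance between two points of equal dimension.
    # Euclidean distance is an assumption; other measures can be swapped in.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(training_points, training_labels, new_point, k=3):
    # Pair each training point's distance to the new point with its label.
    distances = [
        (euclidean_distance(point, new_point), label)
        for point, label in zip(training_points, training_labels)
    ]
    # Sort by distance only and keep the k closest neighbours.
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # Majority vote: the most frequent label among the k neighbours wins.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical sample data: two well-separated classes in two dimensions.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.5), (6.5, 6.0)]
labels = ["A", "A", "A", "B", "B", "B"]

print(knn_classify(points, labels, (1.8, 1.4), k=3))  # prints "A"
```

In this sketch every training point is compared with the new data point, so the cost of a single classification grows with the size of the dataset. Choosing an odd value for k avoids tied votes when there are only two classes.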