Project Description

Big Data Algorithms: The K-Nearest Neighbour Classification Algorithm Explained

By Jan-Willem Middelburg

The K-Nearest Neighbour classification algorithm (abbreviated to k-NN) is one of the most basic and frequently used algorithms for classification purposes. The k-NN classifier is applied to datasets in which certain variables (the columns) indicate specific target classes, so that a new data point can be assigned to one of these target classes.

k-NN is a non-parametric algorithm, which means that the classifier does not make any assumptions on the underlying data distribution or database. More simply stated, the model structure is determined by the underlying data, which ensures that the k-NN algorithm can be used in a wide variety of situations.

The k-NN algorithm looks at the k nearest neighbours (whereby k is specified by the user) and classifies the new data point according to the neighbours that are closest to that data point. To determine which data points are "nearest," the k-NN algorithm calculates the distance from the new data point to every other data point and selects the k points with the smallest distances.
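The procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation; it assumes Euclidean distance as the distance measure and a simple majority vote among the k neighbours, and the function name knn_classify is chosen for illustration only.

```python
from collections import Counter
import math

def knn_classify(training_points, training_labels, new_point, k=3):
    """Classify new_point by majority vote among its k nearest neighbours."""
    # Compute the Euclidean distance from the new point to every training point.
    distances = [
        (math.dist(new_point, point), label)
        for point, label in zip(training_points, training_labels)
    ]
    # Sort by distance and keep the k closest neighbours.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Assign the most common class among those k neighbours.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Example: two well-separated clusters, labelled "A" and "B".
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["A", "A", "A", "B", "B", "B"]
print(knn_classify(points, labels, (0.5, 0.5), k=3))  # nearest neighbours are all "A"
```

Because the model keeps the whole training set and defers all computation to classification time, this style of algorithm is sometimes called "lazy learning" — which is also why no assumptions about the data distribution are needed.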

About the Speaker

Jan-Willem Middelburg

Jan-Willem Middelburg is the author of the Enterprise Big Data Framework publications. He has over a decade of experience in the design and application of Big Data, Machine Learning and AI algorithms.

A pioneer and advocate for professionalization in Automation and Big Data, he is a frequent keynote speaker and moderator at universities and technology conferences around the world. Jan-Willem holds a Bachelor’s degree in Industrial Engineering, a Master’s in Supply Chain Management from the Rotterdam School of Management, and is currently pursuing a second Master’s degree in Computer and Information Technology at the University of Pennsylvania.