Yuma 4×4

Media and Communications

How kNN algorithm works

How kNN algorithm works

My name is Thales Sehn Körting and I will present very breafly how the kNN algorithm
works kNN means k nearest neighbors
It’s a very simple algorithm, and given N training vectors, suppose we have all these
‘a’ and ‘o’ letters as training vectors in this bidimensional feature space, the kNN
algorithm identifies the k nearest neighbors of ‘c’
‘c’ is another feature vector that we want to estimate its class
In this case it identifies the nearest neighbors regardless of labels
So, suppose this example we have k equal to 3, and we have the classes ‘a’ and ‘o’
And the aim of the algorithm is to find the class for ‘c’
If k is 3 we have to find the 3 nearest neighbors of ‘c’
So, we can see that in this case the 3 nearest neighbors of ‘c’ are these 3 elements here
We have 1 nearest neighbor of class ‘a’, we have 2 elements of the class ‘o’ which are
near to ‘c’ We have 2 votes for ‘o’ and 1 vote for ‘a’
In this case, the class of the element ‘c’ is going to be ‘o’
This is very simple how the algorithm k nearest neighbors works
Now, this is a special case of the kNN algorithm, is that when k is equal to 1
So, we must try to find the nearest neighbor of the element that will define the class
And to represent this feature space, each training vector will define a region in this
feature space here And a property that we have is that each region
is defined by this equation We have a distance between each element x
and x_i, that have to be smaller than the same distance for each other element
In this case it will define a Voronoi partition of the space, and can be defined, for example,
this element ‘c’ and these elements ‘b’, ‘e’ and ‘a’ will define these regions, very specific
regions This is a property of the kNN algorithm when
k is equal to 1 We define regions 1, 2, 3 and 4, based on
the nearest neighbor rule Each element that is inside this area will
be classified as ‘a’, as well as each element inside this area will be classified as ‘c’
And the same for the region 2 and region 3, for classes ‘e’ and ‘b’ as well
Now I have just some remarks about the kNN We have to chose and odd value of k if you
have a 2-class problem This happens because when we have a 2-class
and if we set k equal to 2, for example, we can have a tie
What will be the class? The majority class inside the nearest neighbors?
So, we have always to set odd values for a 2-class problem
And also the value of k must not be a multiple of the number of classes, it is also to avoid
ties And we have to remember that the main drawback
of this algorithm is the complexity in searching the nearest neighbors for each sample
The complexity is a problem because we have lots of elements, in the case of a big dataset
we will have lots of elements And we will have to search the distance between
each element to the element that we want to classify
So, for a large dataset, this can be a problem Anyhow, this kNN algorithm produces good results
So, this is the reference I have used to prepare this presentation
Thanks for your attention, and this is very breafly how the kNN algorithm works

100 thoughts on “How kNN algorithm works

  1. Awesome explanation. Quick, simple, to the point. Exactly what I was needing to do my homework 🙂 Thanks!

  2. Thank you for making this video. It is informative and to the point, as well as being shorter than MIT OpenCourseWare's 49 minute lecture. Subbed and definitely will be watching your other videos to learn more machine learning topics!

  3. Hi Thales,
    Thank you for your tutorial. It was very good instruction to KNN. It may be not a good place to ask technical questions. But do you mind give a direction of the problem I am having now   I am suing KNN to distinguish the numbers. 0 – 9. And the question would be 1. How much training set will be needed normally for each number? 2. I am using the OpenCV to train the number image in the same font. Then for each training item in training set, is it ok to have same font but different szie of image to train ? how each item should be slightly different to draw the circle of one number ?
    Many thanks.

  4. Hey. The video is quite informative.
    However, I was wondering how each training vector defines a region in space when k=1.

  5. Congratulations for the presentation, by your accent you are Brazilian are't you? I will keep following your videos, really enjoying the material. Just as a feedback, you mentioned distances so it would be interesting to explain how to calculate the distance and the type of distance measurements can be used :). Cheers from Florianópolis.

  6. Voronoi partition is clearly explain and visually straightforward. Thanks for sharing such a good presentation

  7. please mr.Thales Sehn Körting l not understand how to k-nearest neighbor smoother work because knn different to kernel estimation by bandwidth .how to knn work? in nonparametric regression

  8. What is this c is all about ???? Can you give some example please, is it something like the c is …X men movie .. we try to identify in which class or category this movie falls into such as romantic or Action movie ??? The romantic and action movies are already classified by the KNN model.

  9. 1:13 – k=# (# = how many neighbours to check, before classifying)
    1:29 – visually depicted how it works
    3:48 – Remarks (no examples – no clear to understand)

  10. I studied kNN in school (multiple years ago), and worked on a team project (in school) where we were trying to predict home prices based on several numerical predictors. The problem is I don't recall how we did it. I have saved the write-up which says we used: the original listing price, # of bathrooms, is the house in zipcode A B or C, and was it built after 1980. The write-up also says we had a "K of 3" and a mean prediction error of $16,460. Can you connect the dots for me here? How do you predict home prices using kNN? would every price point be it own classification. Regression made more sense to me for this task.

  11. In your example, when you find out that 'C' element belongs to 'O' class, if you would like to clasify another element 'D' , is 'C' taken in account to predict the label for 'D' or just the original samples are used to clasify it?


  12. So what about when k=1 and we have 2 classes A and B. We want to to identify a new element x in one of this classes and the distances are equal. Then the algorithm in which class puts the element x?

  13. Thank you very much for your video.It is very help full to me.But i have a some confusion,that is when k is multiples of classes only the problem is finding the element belonging class.Is that right?

  14. Aren't different metrics, like Manhattan distance, EMD, or euclid helping with the drawback of kNN algorithm? At least that's what I used in my bachelor's thesis in Facial Pattern Recognition via LBP + kNN classification (+k means)

  15. here a nice related post i found on the same topic http://www.mien.in/2017/09/25/implementing-knn-from-scratch-in-python/

  16. thanks for the great work..could u make a video on selecting apt K Values for differemt data sets and how the complexity of calc nearest neighbours can be minimised?I'll be highly obliged.

  17. Thank you for making this video. Could you please make another video to explain how to use KNN in solving regression problem?

  18. Great work. Keep it up. I have been trying to learn and write about kNN. Would be great if you can check it out and comment. In the post, I built a model and iterated it until i found an optimal test accuracy for prediction: https://the-ml-blogs.blogspot.com/2018/12/4-deeper-into-knn.html

  19. As simple as that? I was studying KNN from Introduction To Statistical Learning and I felt it so much tough to understand.

    I am trying to learn Data Science, I have completed like 4 courses of Datacamp's path to Data Scientist With Python. I am feeling like the path is less concept oriented and more based on how to use python for data science.

    I am totally confused with how to learn Data Science, I have 20 semester end holidays left. I want to use the time. I have tried the book Introduction to statistical learning, the book applies the methods using R and I am already learning python(using Datacamp).

    I have tried few statistics courses which were complete theory. I have tried Andrew NG and everything went above my head.

    I am in search for a course that includes both, the practical as well as the theoretical part.

  20. Nice video, thanks! But I do have a question: in k=1 case why do we have to do Voronoi partition? Instead can't we choose the class of the new sample by the shortest distance to representative points of the training dataset classes? I guess it is the same at the end of the day but isn't it easier to calculate a sample's shortest distances than calculating for the whole Euclidean Space?

  21. You should have explained how KNN is trained first. How the centroid approach works. Without training explained, you are testing the data point.

  22. Great video but I have one unanswered question:

    Given we have three classes A B C and k = 5. Let's assume the nearest neighbours are A A B B C, how is determined whether A or B should be returned as a result?

  23. I have understood more about k-NN and 1-NN in these few minutes than the hours of reading papers or thesis or wiki or endless webpages of explaining nothing at all! When even a dummy like me understands it, it is explained very well! Most of the reading on ML does nothing except providing you with more abbreviations, acronyms or terms, in which are already a separate paper itself, in which you will find even more abbreviations, acronyms or terms. Thank you for explaining in English!

Leave comment

Your email address will not be published. Required fields are marked with *.