Classification is a supervised machine learning technique that predicts the class of a data point from its available features.
For example, as we saw in the introduction lesson, we can predict whether a student will pass or fail an exam based on their number of class attendances, home assignment marks, and number of completed projects.
Student Number | Number of class attendances | Home assignment marks | Number of completed projects | Examination Result (Pass/Fail) |
1 | 78 | 9 | 3 | Pass |
2 | 56 | 6 | 1 | Fail |
3 | 88 | 8 | 3 | Pass |
4 | 72 | 7 | 3 | Pass |
5 | 86 | 9 | 5 | Pass |
6 | 60 | 5 | 1 | ? |
Here the result for student 6 is unknown, and the task is to predict it from the five labelled rows. This type of classification is called binary classification because there are only two possible classes (Pass/Fail). If we were classifying the data points into three or more classes (for example, Pass/Fail/Absent), the correct term would be multi-class classification.
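To make the question mark in the table concrete, here is a minimal, dependency-free sketch that classifies student 6 with a nearest-neighbour rule, a simplified form of the K-Nearest Neighbours technique. The function name `predict_nearest`, the choice of Euclidean distance on unscaled features, and the value of `k` are assumptions made for illustration, not a recommended model.

```python
import math
from collections import Counter

# Labelled rows from the table:
# (class attendances, home assignment marks, completed projects) -> result
training_data = [
    ((78, 9, 3), "Pass"),
    ((56, 6, 1), "Fail"),
    ((88, 8, 3), "Pass"),
    ((72, 7, 3), "Pass"),
    ((86, 9, 5), "Pass"),
]


def predict_nearest(features, data, k=1):
    """Return the majority label among the k closest training rows.

    Hypothetical helper: uses Euclidean distance on the raw features,
    so the attendance column (largest values) dominates the distance.
    """
    by_distance = sorted(data, key=lambda row: math.dist(features, row[0]))
    labels = [label for _, label in by_distance[:k]]
    return Counter(labels).most_common(1)[0][0]


student_6 = (60, 5, 1)
print(predict_nearest(student_6, training_data, k=1))
```

With `k=1` the closest row is student 2, so the sketch outputs Fail; with `k=3` the majority of the three closest rows is Pass. Five rows are far too few to trust either answer, which is why real problems need more data and proper model evaluation.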
Classification techniques
As with regression, Machine Learning offers several classification techniques, and the choice among them depends on the types of the available variables and the distribution of the data.
We will discuss some of the major classification algorithms: Logistic Regression, K-Nearest Neighbours, Naive Bayes, Decision Trees, Random Forest, Support Vector Machines, and Stochastic Gradient Descent classifiers.
Each classification technique has its own strengths and its own conditions under which it works best. A good data scientist typically applies several classification algorithms to the same problem and picks the best one through model evaluation.
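The idea of trying several algorithms and choosing one through model evaluation can be sketched as follows. This example assumes scikit-learn is installed and uses a synthetic dataset from `make_classification` in place of real data. The default hyperparameters, 5-fold cross-validation, and accuracy as the metric are illustrative choices, not the only reasonable ones.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary dataset (an assumption standing in for real data)
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=4, n_redundant=2, random_state=0
)

# Scale-sensitive models are wrapped in a pipeline with StandardScaler
candidates = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "K-Nearest Neighbours": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Support Vector Machine": make_pipeline(StandardScaler(), SVC()),
    "SGD Classifier": make_pipeline(StandardScaler(), SGDClassifier(random_state=0)),
}

# Mean accuracy over 5 cross-validation folds for each candidate
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}

for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.3f}")

best_name = max(scores, key=scores.get)
print("Best by mean accuracy:", best_name)
```

Which model comes out on top depends on the dataset and the metric, so the ranking printed here says nothing general about these algorithms.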
We will cover these techniques and their implementation in Python one by one in the upcoming lessons.