
Support Vector Machines for Classification


An enthusiast about continuously broadening my horizons in the dynamic world of technology.


Support Vector Machines (SVMs) are supervised learning algorithms. They can be used for classification or regression, and they can solve both linear and non-linear problems. In this part, we look at SVMs for classification. (In this article, I focus on laying the foundation for further optimization; hyperparameter tuning is left for future work.)

How SVMs Work

Support Vector Machines (SVMs) are machine learning algorithms that look for an optimal hyperplane in an n-dimensional feature space to separate data into distinct classes. "Optimal" means the hyperplane with the largest margin, that is, the largest distance to the nearest training points of each class; those nearest points are called support vectors. The hyperplane always has one dimension fewer than the feature space: with two features it is a line in a 2D plane, and with three features it is a 2D plane in 3D space. Beyond three features the hyperplane can no longer be visualized, but the same idea applies. When the classes cannot be separated by a flat hyperplane, kernel SVMs implicitly map the data into a higher-dimensional space where a separating hyperplane may exist.
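To make the geometry concrete, the sketch below fits a linear SVM on a hypothetical six-point toy dataset with two features (not the breast cancer data). It assumes scikit-learn's SVC with kernel="linear" and a large C (1000, an arbitrary choice) to approximate a hard margin. For a linear kernel, coef_ holds the normal vector w and intercept_ the offset b of the hyperplane w·x + b = 0, and the margin width is 2/||w||.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two features per sample, two well-separated classes
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
              [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A linear kernel with a large C approximates a hard-margin SVM
clf = SVC(kernel="linear", C=1000.0)
clf.fit(X, y)

w = clf.coef_[0]       # normal vector of the hyperplane, one weight per feature
b = clf.intercept_[0]  # offset of the hyperplane
print("w:", w, "b:", b)

# With two features, the hyperplane w[0]*x1 + w[1]*x2 + b = 0 is a line
print("margin width:", 2 / np.linalg.norm(w))

# The training points that lie on the margin boundaries
print("support vectors:\n", clf.support_vectors_)
```

On this toy data the separating line comes out close to x1 + x2 = 5.5, and the support vectors are the three points nearest to it: (2, 1), (1, 2) and (4, 4).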

Types of SVMs

  1. Linear SVM

  2. Kernel SVM

  3. Support Vector Regression
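The three types map onto scikit-learn estimators roughly as follows. This is a sketch on hypothetical random data, assuming LinearSVC for a linear SVM, SVC with a non-linear kernel (here kernel="rbf") for a kernel SVM, and SVR for support vector regression.

```python
import numpy as np
from sklearn.svm import LinearSVC, SVC, SVR

# Hypothetical random data: 100 samples, 2 features
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y_class = (X[:, 0] + X[:, 1] > 0).astype(int)                        # class labels
y_reg = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)  # continuous target

# 1. Linear SVM: a straight decision boundary
linear_svm = LinearSVC().fit(X, y_class)

# 2. Kernel SVM: the RBF kernel allows curved decision boundaries
kernel_svm = SVC(kernel="rbf").fit(X, y_class)

# 3. Support Vector Regression: predicts a continuous value instead of a class
svr = SVR(kernel="linear").fit(X, y_reg)

print("Linear SVM training accuracy:", linear_svm.score(X, y_class))
print("Kernel SVM training accuracy:", kernel_svm.score(X, y_class))
print("SVR training R^2:", svr.score(X, y_reg))
```

For classifiers, score() returns accuracy; for SVR it returns the coefficient of determination (R^2).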

Code

The steps for building the SVM model are as follows:

  • Importing the required libraries.
import numpy as np 
import pandas as pd 
from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn.metrics import accuracy_score
  • Reading the dataset
df=pd.read_csv('/kaggle/input/breast-cancer-wisconsin-data/data.csv')
  • Performing an initial check on the dataset.
df.head() # shows the first five rows of the dataset
df.shape # gives the number of rows and columns in the dataset
df = df.dropna(axis=1, how='all') # drops columns that are entirely empty
df = df.dropna() # drops rows with null values; dropna() returns a new DataFrame, so assign the result back
df.dtypes # gives the data type of each column
df.describe() # gives a brief statistical description of the dataset
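dropna() returns a new DataFrame rather than modifying df, so its result has to be assigned back. A second pitfall, sketched here on a hypothetical three-row frame: if one column is entirely empty, a plain row-wise dropna() removes every row. Dropping all-empty columns first with axis=1 and how="all" avoids this.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: one missing radius value and one fully empty column
df = pd.DataFrame({
    "radius_mean": [14.1, 20.6, np.nan],
    "diagnosis": ["M", "B", "B"],
    "empty_col": [np.nan, np.nan, np.nan],
})

df.dropna()               # returns a new DataFrame; df itself is unchanged
print(df.shape)           # (3, 3)

# Every row has a NaN in empty_col, so a plain row-wise dropna() removes all rows
print(df.dropna().shape)  # (0, 3)

# Drop all-empty columns first, then drop rows that still contain nulls
cleaned = df.dropna(axis=1, how="all").dropna()
print(cleaned.shape)      # (2, 2)
```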
  • Checking the values in the diagnosis column, which will be used as the target column.
df['diagnosis'].values
  • The diagnosis column contains M (malignant) or B (benign). Mapping M to 0 and B to 1.
df['diagnosis'] = df['diagnosis'].map({'M':0,'B': 1})
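Series.map() with a dictionary returns NaN for any value that is not one of its keys, for example an unexpected spelling. A quick check, sketched on hypothetical values, is to count the nulls after mapping:

```python
import pandas as pd

# Hypothetical diagnosis values; "b" is a deliberate typo that the mapping does not cover
diagnosis = pd.Series(["M", "B", "B", "M", "b"])
mapped = diagnosis.map({"M": 0, "B": 1})

print(mapped.tolist())      # [0.0, 1.0, 1.0, 0.0, nan]
print(mapped.isna().sum())  # 1 value could not be mapped
```

The NaN also turns the column into floats, so a non-zero count here is worth investigating before training.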
  • Checking for duplicate rows and null values in the dataset.
df.duplicated().sum() # gives the total count of duplicated rows
df.isnull().sum() # gives the total count of null values
  • Defining x and y as the input features and target labels, respectively, for training the model.
x = df[['radius_mean' , 'texture_mean' , 'perimeter_mean' , 'area_mean']]
y = df['diagnosis']
  • Splitting the dataset
x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=0.4)
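The split above has no random_state, so it changes every time the code runs. A minimal sketch of a repeatable, stratified split follows. It assumes scikit-learn's bundled copy of the Wisconsin diagnostic data (load_breast_cancer) as a stand-in for the Kaggle CSV; its column names differ ("mean radius" instead of "radius_mean"), and its target already uses 0 = malignant, 1 = benign. The random_state value 42 is arbitrary.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Assumption: bundled copy of the Wisconsin diagnostic dataset instead of the Kaggle CSV
data = load_breast_cancer(as_frame=True)
x = data.data[["mean radius", "mean texture", "mean perimeter", "mean area"]]
y = data.target  # 0 = malignant, 1 = benign

# random_state makes the split repeatable; stratify=y keeps the class ratio the same in both parts
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.4, random_state=42, stratify=y
)

print("Rows (train, test):", len(x_train), len(x_test))
print(f"Benign share - all: {y.mean():.3f}, train: {y_train.mean():.3f}, test: {y_test.mean():.3f}")
```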

Classification

  • SVC stands for support vector classification.

  • Creating an instance with the default hyperparameters (an RBF kernel) and then fitting it on x_train and y_train.

clf = svm.SVC()
clf.fit(x_train, y_train)
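Because svm.SVC() is called without arguments, it uses scikit-learn's defaults. A minimal sketch, assuming a recent scikit-learn and using the bundled load_breast_cancer copy of the dataset as a stand-in for the Kaggle CSV, shows those defaults and the support vectors the model stores after fitting:

```python
from sklearn import svm
from sklearn.datasets import load_breast_cancer

clf = svm.SVC()
params = clf.get_params()
print(params["kernel"], params["C"], params["gamma"])  # rbf 1.0 scale

# Assumption: bundled Wisconsin data with the same four "mean" features
data = load_breast_cancer(as_frame=True)
x = data.data[["mean radius", "mean texture", "mean perimeter", "mean area"]]
y = data.target

clf.fit(x, y)

# After fitting, the support vectors and their count per class are stored on the model
print("Support vectors per class:", clf.n_support_)
print("Total support vectors:", len(clf.support_vectors_))
```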
  • Next, predicting labels for x_test.
pred=clf.predict(x_test)
  • Flattening the y_test and pred values so that they can be used to calculate accuracy.

(Here, we use scikit-learn's built-in accuracy_score function instead of computing accuracy manually.)

  • ravel() is a NumPy method that flattens a multidimensional array into a 1D array.

  • The code first checks whether y_test has a values attribute, which indicates a pandas DataFrame or Series. If it does, it flattens the underlying array with ravel(). Otherwise, it assumes y_test is already a 1D array and uses it as-is. The same kind of check is applied to pred.

y_test_flat = y_test.values.ravel() if hasattr(y_test, 'values') else y_test
pred_flat = pred.ravel() if hasattr(pred, 'ravel') else pred
accuracy = accuracy_score(y_test_flat, pred_flat)
print(f"Accuracy: {accuracy * 100:.2f}%")
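Accuracy is the fraction of predictions that match the true labels. The sketch below, on hypothetical labels, shows that accuracy_score accepts a pandas Series and a NumPy array directly and agrees with the manual calculation, so for a 1D target the ravel() step is a safeguard rather than a requirement.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score

# Hypothetical true labels (as a Series) and predictions (as an array)
y_true = pd.Series([0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 0, 0, 1])

print(accuracy_score(y_true, y_pred))        # 0.8
print((y_true.to_numpy() == y_pred).mean())  # 0.8, i.e. 4 of 5 correct
```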

Multi-Class Classification

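  • The decision_function_shape parameter controls the shape of the values returned by decision_function(): 'ovr' (one-vs-rest, the default) or 'ovo' (one-vs-one). Internally, SVC always trains one-vs-one classifiers for multi-class problems.

  • Because the diagnosis column has only two classes, this parameter has no effect here: decision_function() returns one value per sample either way.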

clf = svm.SVC(decision_function_shape='ovo')
  • Fitting the model
clf.fit(x_train,y_train)
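  • Storing the decision function values for the training samples in dec. For a binary problem, the sign of each value shows which side of the hyperplane the sample falls on (positive means class 1, benign).
dec = clf.decision_function(x_train)
print("Decision function values for the training samples:", dec)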
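The effect of decision_function_shape only shows up with more than two classes. This sketch uses hypothetical synthetic data from make_blobs with four classes, where 'ovo' returns one column per pair of classes and 'ovr' one column per class, and then a two-class case for comparison.

```python
from sklearn import svm
from sklearn.datasets import make_blobs

# Hypothetical synthetic data with 4 classes
X, y = make_blobs(n_samples=200, centers=4, random_state=0)

ovo = svm.SVC(decision_function_shape="ovo").fit(X, y)
ovr = svm.SVC(decision_function_shape="ovr").fit(X, y)

# ovo: one column per pair of classes, 4 * 3 / 2 = 6
print(ovo.decision_function(X).shape)  # (200, 6)
# ovr: one column per class
print(ovr.decision_function(X).shape)  # (200, 4)

# With only two classes (as with the diagnosis column) the output is 1D either way
Xb, yb = make_blobs(n_samples=100, centers=2, random_state=0)
binary = svm.SVC(decision_function_shape="ovo").fit(Xb, yb)
print(binary.decision_function(Xb).shape)  # (100,)
```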

Linear Classification

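  • LinearSVC is similar to SVC with kernel='linear' but is implemented with liblinear, which scales better to large numbers of samples. dual="auto" lets scikit-learn choose between the primal and dual optimization problems.
linear_clf = svm.LinearSVC(dual="auto")
linear_clf.fit(x_train, y_train) # fit the LinearSVC itself, not the earlier clf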
dec = linear_clf.decision_function(x_train)
dec
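For a linear SVM, the decision value is simply w·x + b, using the learned coef_ and intercept_. The sketch below checks this on the bundled load_breast_cancer data (an assumption standing in for the Kaggle CSV; its first four columns are mean radius, mean texture, mean perimeter and mean area). Standardizing the features is also an assumption, added here to help the solver converge.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Assumption: bundled Wisconsin data, first four "mean" features
X, y = load_breast_cancer(return_X_y=True)
# Assumption: standardized features to help the solver converge
X = StandardScaler().fit_transform(X[:, :4])

linear_clf = LinearSVC().fit(X, y)
dec = linear_clf.decision_function(X)

# For a linear SVM the decision value is w . x + b
manual = X @ linear_clf.coef_.ravel() + linear_clf.intercept_[0]
print(np.allclose(dec, manual))  # True

# A positive value predicts class 1 (benign), a negative value class 0 (malignant)
print(np.array_equal((dec > 0).astype(int), linear_clf.predict(X)))  # True
```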

Accuracy

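The accuracy of the SVC classifier on the test set was 87.72%. Because train_test_split was called without a random_state, this figure can change from run to run.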
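The 87.72% figure comes from a single train/test split. As a sketch of a steadier estimate, 5-fold cross-validation trains and scores the same default SVC on five different splits and averages the results. It assumes the bundled load_breast_cancer data as a stand-in for the Kaggle CSV, so its numbers will not exactly match the figure above.

```python
from sklearn import svm
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score

# Assumption: bundled Wisconsin data with the same four "mean" features
data = load_breast_cancer(as_frame=True)
x = data.data[["mean radius", "mean texture", "mean perimeter", "mean area"]]
y = data.target

# For a classifier, cv=5 uses stratified 5-fold splitting
scores = cross_val_score(svm.SVC(), x, y, cv=5, scoring="accuracy")

print(scores)
print(f"Mean accuracy: {scores.mean() * 100:.2f}% (std {scores.std() * 100:.2f}%)")
```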

Link to GitHub repository - https://github.com/ishaj72/ML-Practise-Model/blob/main/model6.ipynb

Readers can view the code along with its output in the repository, which gives a fuller picture of how the code works and what it produces.

Conclusion

In conclusion, the article explored using an SVM to classify tumors as malignant or benign in the Breast Cancer Wisconsin dataset. We started by reading the dataset and performing a basic EDA on it. Then we mapped the diagnosis column to numerical values.

Next, we split the dataset into training and testing sets and created an SVM classifier with the scikit-learn library. The model was trained on the training data and used to make predictions on the testing data. We then calculated the model's accuracy with the accuracy_score function, which gives a simple measure of its performance.

We also looked at the decision_function_shape option of SVC, which matters for multi-class problems, and at LinearSVC for linear classification, showing how SVMs adapt to different classification tasks.


Isha Jain's blog
