Build Logistic Regression Algorithm From Scratch and Apply It on Data set
by rishit.dagli in Circuits > Software
![Large14.jpg](/proxy/?url=https://content.instructables.com/FEB/A27G/JYWVOS27/FEBA27GJYWVOS27.jpg&filename=Large14.jpg)
Predict whether a breast tumor is malignant or benign using the Breast Cancer data set
Data set - Breast Cancer Wisconsin (Original) Data Set
This tutorial and its code demonstrate logistic regression on the data set, using gradient descent to minimize the binary cross entropy (BCE) loss.
Prerequisites
![breast cancer represent.jpg](/proxy/?url=https://content.instructables.com/FO8/5AME/JYWVOPTV/FO85AMEJYWVOPTV.jpg&filename=breast cancer represent.jpg)
- Knowledge of Python
- Familiarity with linear regression and gradient descent
- The following libraries installed:
  - numpy
  - pandas
  - seaborn
  - random (ships with the Python standard library)
I have also included a link to the code on GitHub at the end.
About the Data Set
![breast cancer description.PNG](/proxy/?url=https://content.instructables.com/FAX/J1WN/JYWVOPTQ/FAXJ1WNJYWVOPTQ.png&filename=breast cancer description.PNG)
- Sample code number: id number
- Clump Thickness: 1 - 10
- Uniformity of Cell Size: 1 - 10
- Uniformity of Cell Shape: 1 - 10
- Marginal Adhesion: 1 - 10
- Single Epithelial Cell Size: 1 - 10
- Bare Nuclei: 1 - 10
- Bland Chromatin: 1 - 10
- Normal Nucleoli: 1 - 10
- Mitoses: 1 - 10
- Class: (2 for benign, 4 for malignant)
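The cleaning steps used later (treat `?` as missing and drop those rows, then remap the class labels 2 and 4 to 0 and 1) can be sketched as below. The column names and the four inline sample rows are my own illustration; the tutorial itself loads the full UCI file.

```python
import numpy as np
import pandas as pd

# Tiny inline sample in the UCI column order; the real file has 699 rows.
# '?' marks a missing Bare Nuclei value.
rows = [
    [1000025, 5, 1, 1, 1, 2, 1, 3, 1, 1, 2],
    [1002945, 5, 4, 4, 5, 7, 10, 3, 2, 1, 2],
    [1057013, 8, 4, 5, 1, 2, '?', 7, 3, 1, 4],
    [1018561, 2, 1, 2, 1, 2, 1, 3, 1, 1, 2],
]
cols = ['id', 'clump_thickness', 'cell_size', 'cell_shape',
        'marginal_adhesion', 'epithelial_size', 'bare_nuclei',
        'bland_chromatin', 'normal_nucleoli', 'mitoses', 'class']
df = pd.DataFrame(rows, columns=cols)

# Treat '?' as missing and drop those rows.
df = df.replace('?', np.nan).dropna()
df['bare_nuclei'] = df['bare_nuclei'].astype(int)

# Remap the labels: 2 (benign) -> 0, 4 (malignant) -> 1.
df['class'] = df['class'].replace({2: 0, 4: 1})
print(df['class'].tolist())  # [0, 0, 0]
```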
Logistic Regression Algorithm
![logistic_regression.gif](/proxy/?url=https://content.instructables.com/FBR/MTYR/JYWVORHJ/FBRMTYRJYWVORHJ.gif&filename=logistic_regression.gif)
- Use the sigmoid activation function: σ(z) = 1 / (1 + e^(-z)), so the prediction is ŷ = σ(θ · x)
- Recall the gradient descent formula for linear regression, θ_j := θ_j - α · ∂E/∂θ_j, where the error E was the mean squared error. We cannot use mean squared error here (with the sigmoid it is no longer convex), so we replace it with some other error E
- Conditions for E:
  - Convex, or as convex as possible
  - A function of ŷ and y
  - Differentiable
- So consider entropy: H = -Σ p · log(p)
- As we cannot involve both ŷ and y with plain entropy, use cross entropy: CE = -y · log(ŷ)
- So add two cross entropies, CE1 = -y · log(ŷ) and CE2 = -(1 - y) · log(1 - ŷ). We get binary cross entropy: BCE = -(1/m) · Σ [y · log(ŷ) + (1 - y) · log(1 - ŷ)]
- So now our update formula becomes θ_j := θ_j - α · ∂BCE/∂θ_j
- Using the simple chain rule we obtain ∂BCE/∂θ_j = (1/m) · Σ (ŷ - y) · x_j
- Now apply gradient descent with this formula
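The chain-rule gradient of the BCE, (1/m) · Σ (ŷ - y) · x_j, can be sanity-checked numerically. This sketch compares the analytic gradient against a central finite-difference approximation on random toy data; all names and data here are hypothetical, not from the tutorial.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(X, y, theta):
    # Binary cross entropy averaged over the m samples.
    pred = sigmoid(X @ theta)
    return -np.mean(y * np.log(pred) + (1 - y) * np.log(1 - pred))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = (rng.random(20) < 0.5).astype(float)
theta = rng.normal(size=3)

# Analytic gradient from the chain rule: (1/m) * X^T (y_hat - y)
analytic = X.T @ (sigmoid(X @ theta) - y) / len(X)

# Central finite-difference approximation of the same gradient.
eps = 1e-6
numeric = np.array([
    (bce(X, y, theta + eps * np.eye(3)[j]) -
     bce(X, y, theta - eps * np.eye(3)[j])) / (2 * eps)
    for j in range(3)
])
print(np.allclose(analytic, numeric, atol=1e-6))  # True
```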
Code
![breast cancer.PNG](/proxy/?url=https://content.instructables.com/FUX/0L03/JYWVOPUL/FUX0L03JYWVOPUL.png&filename=breast cancer.PNG)
Data preprocessing

Load the data and remove empty values. As we are using logistic regression, replace the class labels 2 and 4 with 0 and 1.

```python
sns.pairplot(df)
```

This creates pairwise graphs of the features. Then do principal component analysis (PCA) for simplified learning.
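The tutorial calls for PCA but does not show that code; here is a minimal numpy sketch of one common way to do it (SVD on the centered data). The function and variable names are my own, not the tutorial's.

```python
import numpy as np

def pca(X, k):
    # Center the data, then project onto the top-k right singular vectors.
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 9))   # stand-in for the 9 feature columns
X_reduced = pca(X, 2)
print(X_reduced.shape)  # (100, 2)
```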
```python
full_data = np.matrix(full_data)
x0 = np.ones((full_data.shape[0], 1))
data = np.concatenate((x0, full_data), axis=1)
print(data.shape)
theta = np.zeros((1, data.shape[1] - 1))
print(theta.shape)
print(theta)
```

Convert the data to a matrix and concatenate a column of ones (the bias term) with the complete data matrix. Also make a zero matrix for the initial theta.

```python
test_size = 0.2
X_train = data[:-int(test_size * len(full_data)), :-1]
Y_train = data[:-int(test_size * len(full_data)), -1]
X_test = data[-int(test_size * len(full_data)):, :-1]
Y_test = data[-int(test_size * len(full_data)):, -1]
```
Create the train-test split.

```python
def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

def BCE(X, y, theta):
    pred = sigmoid(np.dot(X, theta.T))
    mcost = -np.array(y) * np.array(np.log(pred)) \
            - np.array(1 - y) * np.array(np.log(1 - pred))
    return mcost.mean()
```
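A quick sanity check on these two functions, with hypothetical toy inputs of my own: sigmoid(0) should be 0.5, and the BCE of an uninformative all-0.5 prediction should be ln 2 ≈ 0.693, a useful baseline to watch the cost fall below during training.

```python
import numpy as np

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

def BCE(X, y, theta):
    pred = sigmoid(np.dot(X, theta.T))
    mcost = -np.array(y) * np.array(np.log(pred)) \
            - np.array(1 - y) * np.array(np.log(1 - pred))
    return mcost.mean()

X = np.array([[1.0, 2.0], [1.0, -1.0]])   # toy rows with a bias column
y = np.array([[1.0], [0.0]])
theta = np.zeros((1, 2))                  # all-zero theta -> all preds 0.5

print(sigmoid(0))                   # 0.5
print(round(BCE(X, y, theta), 3))   # 0.693
```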
Define the sigmoid function as described above, and the BCE.

```python
def grad_descent(X, y, theta, alpha):
    h = sigmoid(X.dot(theta.T))
    loss = h - y
    dj = (loss.T).dot(X)
    theta -= (alpha / len(X)) * dj
    return theta

# Hyperparameters; the original values are not shown in the article,
# so these are representative choices.
alpha = 0.0001
epoch = 15000

cost = BCE(X_train, Y_train, theta)
print("cost before: ", cost)
theta = grad_descent(X_train, Y_train, theta, alpha)
cost = BCE(X_train, Y_train, theta)
print("cost after: ", cost)
```
Define the gradient descent algorithm and the number of epochs, and test gradient descent with one iteration.

```python
def logistic_reg(epoch, X, y, theta, alpha):
    for ep in range(epoch):
        # update theta
        theta = grad_descent(X, y, theta, alpha)
        # calculate and report the new loss every 1000 epochs
        if (ep + 1) % 1000 == 0:
            loss = BCE(X, y, theta)
            print("Cost function ", loss)
    return theta

theta = logistic_reg(epoch, X_train, Y_train, theta, alpha)
```
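On toy data, a training loop like the one above should steadily drive the BCE down. This is a self-contained check of the same update rule; the synthetic data and names are my own, not the breast-cancer set.

```python
import numpy as np

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

def BCE(X, y, theta):
    pred = sigmoid(X @ theta.T)
    return float((-y * np.log(pred) - (1 - y) * np.log(1 - pred)).mean())

def grad_descent(X, y, theta, alpha):
    # One step of theta := theta - (alpha/m) * X^T (y_hat - y)
    h = sigmoid(X @ theta.T)
    theta -= (alpha / len(X)) * ((h - y).T @ X)
    return theta

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])  # bias + 2 features
true_w = np.array([0.5, 2.0, -1.0])
y = (sigmoid(X @ true_w) > rng.random(50)).astype(float).reshape(-1, 1)

theta = np.zeros((1, 3))
before = BCE(X, y, theta)       # ln 2 at the all-zero start
for _ in range(2000):
    theta = grad_descent(X, y, theta, 0.1)
after = BCE(X, y, theta)
print(after < before)  # True
```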
Define the logistic regression with gradient descent code.

```python
print(BCE(X_train, Y_train, theta))
print(BCE(X_test, Y_test, theta))
```

Finally, test the code by printing the train and test BCE. We are now done with the code.
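The tutorial stops at the train/test BCE; if you also want an accuracy number, one option is to threshold the sigmoid output at 0.5. This is a sketch on toy data of my own (the `accuracy` helper is hypothetical, not part of the tutorial's code).

```python
import numpy as np

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

def accuracy(X, y, theta):
    # Predict class 1 whenever the sigmoid output reaches 0.5.
    preds = (sigmoid(np.asarray(X) @ np.asarray(theta).T) >= 0.5).astype(int)
    return float((preds.ravel() == np.asarray(y).ravel()).mean())

# Toy check: a theta that separates these points perfectly gives accuracy 1.0.
X = np.array([[1.0, 3.0], [1.0, -2.0], [1.0, 4.0], [1.0, -1.0]])
y = np.array([1, 0, 1, 0])
theta = np.array([[0.0, 1.0]])
print(accuracy(X, y, theta))  # 1.0
```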
Additional Reading
![breast.png](/proxy/?url=https://content.instructables.com/F7W/4BKC/JYWVOPTR/F7W4BKCJYWVOPTR.png&filename=breast.png)