Titanic Survivors prediction using Logistic Regression with Gradient Descent
Good day!
Please check out this program for predicting Titanic survivors, written in Octave. The model reaches an accuracy of 68%.
The dataset is available at the following link: Titanic survivors dataset. Please remove the “zero” features from the dataset before running the program.
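One way to do this cleanup in Octave itself is sketched below. It assumes that the “zero” features are columns containing nothing but zeros, which is an interpretation and not something the post spells out. The matrix `data` is made-up sample data, not the real file.

```octave
# Hypothetical feature matrix (made-up values); columns 2 and 4 hold only zeros
data = [22 0  7.25 0 1;
        38 0 71.28 0 0;
        26 0  7.92 0 1];

# any(data ~= 0, 1) is true for each column that has at least one non-zero entry
keep = any(data ~= 0, 1);

# Keep only those columns
cleaned = data(:, keep);

disp(size(cleaned));
```

If the columns in your copy of the file are identified by a header name instead, deleting them in a spreadsheet editor before loading works just as well.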
pkg load io;
#{
Program: Predicting Titanic survivors using logistic regression with gradient descent
Accuracy rate: 68%
Author: Shankar Muthusami
Created on: 24/Mar/2019
Training set range: 1 to 1000
Test set range: 1001 to 1310
#}

[An, Tn, Ra, limits] = xlsread("~/ML/ML Practice/dataset/train_and_test2.csv", "Sheet2", "A2:H1000");
# As per the CSV file, columns 1 to 7 are the features. The 8th column is Survived, which is what we are going to predict
X = [An(:, [1:7])];
Y = [An(:, 8)];

# Add a column of ones for the intercept term
X = horzcat(ones(size(X,1), 1), X);

# Initializing theta values as zero for all
theta = zeros(size(X,2),1);

# Initializing learning rate to 0.003
learningRate = 0.003;

# Step 1: Calculate Hypothesis
function g_z = estimateHypothesis(X, theta)
  z = X * theta;
  e_z = exp(-z);
  denominator = 1 + e_z;
  g_z = 1 ./ denominator;
endfunction

# Step 2: Calculate Cost function
function cost = estimateCostFunction(hypothesis, Y)
  log_1 = log(hypothesis);
  log_2 = log(1 - hypothesis);
  y1 = Y;
  term_1 = y1 .* log_1;
  y2 = 1 - Y;
  term_2 = y2 .* log_2;
  cost = term_1 + term_2;
  cost = sum(cost);
  # Number of rows
  m = size(Y, 1);
  cost = -1 * (cost / m);
endfunction

# Step 3: Updating theta values
function updatedTheta = updateThetaValues(_X, _Y, _theta, _hypothesis, learningRate)
  s1 = _hypothesis - _Y;
  s2 = s1 .* _X;
  s3 = sum(s2);
  # Number of rows
  m = size(_Y, 1);
  s4 = (learningRate * s3) / m;
  updatedTheta = _theta - s4';
endfunction

costVector = [];
iterationVector = [];
for i = 1:200000
  # Step 1
  hypothesis = estimateHypothesis(X, theta);
  # Step 2
  cost = estimateCostFunction(hypothesis, Y);
  costVector = vertcat(costVector, cost);
  # Step 3: Updating theta values
  theta = updateThetaValues(X, Y, theta, hypothesis, learningRate);
  iterationVector = vertcat(iterationVector, i);
endfor

function plotGraph(iterationVector, costVector)
  plot(iterationVector, costVector);
  ylabel('Cost Function');
endfunction

plotGraph(iterationVector, costVector);
# We have estimated the parameters. Now let's check our test data, i.e., rows 1001 to 1310, to predict whether the passengers survived or not.
function evaluateTestdata(theta)
  [Test, Tn, Ra, limits] = xlsread("~/ML/ML Practice/dataset/train_and_test2.csv", "Sheet2", "A1001:H1310");
  # As per the CSV file, columns 1 to 7 are the features. The 8th column is Survived, which is what we are going to predict
  _X = [Test(:, [1:7])];
  _X = horzcat(ones(size(_X,1), 1), _X);
  _Y = estimateHypothesis(_X, theta);

  # Calculating accuracy percentage of our model
  cnt = 0;
  for i = 1:size(_Y,1)
    val = _Y(i,1);
    # The comparison is missing from the listing; a 0.5 threshold checked against the Survived column is assumed here
    if (val >= 0.5) == Test(i, 8)
      cnt = cnt + 1;
    endif
  endfor
  disp("Count"), disp(cnt);
  percentage = (cnt * 100) / size(_X,1);
  disp("Accuracy Percentage of the Model: "), disp(percentage);
endfunction

evaluateTestdata(theta);
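The accuracy calculation can also be written without a loop. The following is only a minimal sketch: the decision threshold of 0.5 is an assumption, and `probabilities` and `actual` are made-up values standing in for the output of estimateHypothesis and the Survived column of the test rows.

```octave
# Hypothetical predicted probabilities and true labels (made-up values)
probabilities = [0.9; 0.2; 0.6; 0.4];
actual        = [1;   0;   0;   0];

# Assumed decision threshold: 0.5 or above counts as "survived"
predicted = probabilities >= 0.5;

# Share of rows where the prediction matches the label, as a percentage
accuracy = mean(predicted == actual) * 100;
printf("Accuracy: %.1f%%\n", accuracy);
```

Here three of the four made-up rows are classified correctly, so the sketch reports 75%.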
Gradient descent graph:
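The graph plots the cost from Step 2 against the iteration number. For reference, the three steps of the program correspond to the usual logistic regression formulas below. The symbols are introduced here and do not appear in the code: $m$ is the number of training rows, $\alpha$ is learningRate, $x^{(i)}$ is the i-th row of X (including the leading 1), and $y^{(i)}$ is the matching entry of Y.

```latex
\begin{align}
h_\theta(x^{(i)}) &= \frac{1}{1 + e^{-\theta^{T} x^{(i)}}} \\
J(\theta) &= -\frac{1}{m} \sum_{i=1}^{m} \Big[ y^{(i)} \log h_\theta(x^{(i)}) + \big(1 - y^{(i)}\big) \log\big(1 - h_\theta(x^{(i)})\big) \Big] \\
\theta_j &:= \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} \big( h_\theta(x^{(i)}) - y^{(i)} \big)\, x_j^{(i)}
\end{align}
```

In updateThetaValues, s1 is the residual $h_\theta(x^{(i)}) - y^{(i)}$, s2 multiplies it by each feature, s3 sums over the rows, and s4 scales the result by $\alpha / m$.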