Titanic Survivor Prediction Using Logistic Regression with Gradient Descent

Shankar Muthusamy

3 min read · Mar 24, 2019


Hi,

Good day!

Please check out the program below for predicting Titanic survivors, written in Octave. Accuracy of the model: 68%.

You can find the dataset at the following link: Titanic survivors dataset. Please remove the features named "zero" from the dataset before training.
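If you prefer to drop those filler columns programmatically instead of by hand, here is a minimal pandas sketch. The column names below are illustrative stand-ins, not taken from the actual file; only the "zero" naming pattern is assumed:

```python
import pandas as pd

# Tiny stand-in for the real CSV: two feature columns plus filler "zero" columns.
df = pd.DataFrame({
    "Age": [22, 38],
    "Fare": [7.25, 71.28],
    "zero": [0, 0],
    "zero.1": [0, 0],
    "Survived": [0, 1],
})

# Keep every column whose name does not start with "zero".
cleaned = df.loc[:, ~df.columns.str.startswith("zero")]
print(list(cleaned.columns))  # ['Age', 'Fare', 'Survived']
```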

pkg load io;

#{
Program: Predicting Titanic survivors using logistic regression with gradient descent
Accuracy rate: 68%
Author: Shankar Muthusami
Created on: 24/Mar/2019

Training set range: 1 to 1000

Test set range: 1001 to 1310
#}

[An, Tn, Ra, limits] = xlsread("~/ML/ML Practice/dataset/train_and_test2.csv", "Sheet2", "A2:H1000");

# As per the CSV file, columns 1 to 7 are the features; the 8th column is Survived, which is what we are going to predict
X = An(:, 1:7);

Y = An(:, 8);

# Adding the intercept (bias) column of ones
X = horzcat(ones(size(X, 1), 1), X);

# Initializing theta values as zero for all
theta = zeros(size(X,2),1);

# Initializing learning rate to 0.003
learningRate = 0.003;

# Step 1: Calculate Hypothesis
function g_z = estimateHypothesis(X, theta)
  z = X * theta;
  e_z = exp(-z);
  denominator = 1 + e_z;
  g_z = 1 ./ denominator;
endfunction
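For readers following along outside Octave, the same hypothesis can be sketched in Python with NumPy (an illustrative translation, not part of the original program):

```python
import numpy as np

def estimate_hypothesis(X, theta):
    """Logistic hypothesis g(z) = 1 / (1 + e^(-z)) with z = X * theta."""
    z = X @ theta
    return 1.0 / (1.0 + np.exp(-z))

# Sanity check: z = 0 should give a probability of exactly 0.5.
X = np.array([[0.0, 0.0]])
theta = np.zeros((2, 1))
print(estimate_hypothesis(X, theta))  # [[0.5]]
```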

# Step 2: Calculate Cost function
function cost = estimateCostFunction(hypothesis, Y)
  log_1 = log(hypothesis);
  log_2 = log(1 - hypothesis);

  term_1 = Y .* log_1;
  term_2 = (1 - Y) .* log_2;

  # no. of rows
  m = size(Y, 1);
  cost = -sum(term_1 + term_2) / m;
endfunction
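The same cross-entropy cost can be cross-checked in a few lines of NumPy (again an illustrative sketch, not the author's Octave):

```python
import numpy as np

def estimate_cost(hypothesis, Y):
    """Mean cross-entropy: -(1/m) * sum(y*log(h) + (1-y)*log(1-h))."""
    m = Y.shape[0]
    return -np.sum(Y * np.log(hypothesis) + (1 - Y) * np.log(1 - hypothesis)) / m

# With h = 0.5 everywhere, the cost is log(2) regardless of the labels.
h = np.full((4, 1), 0.5)
Y = np.array([[0.0], [1.0], [1.0], [0.0]])
print(estimate_cost(h, Y))  # ~0.6931
```

This log(2) value is exactly what the training loop starts from, since theta is initialized to all zeros and the hypothesis is then 0.5 for every row.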

# Step 3: Updating theta values
function updatedTheta = updateThetaValues(_X, _Y, _theta, _hypothesis, learningRate)
  s1 = _hypothesis - _Y;

  s2 = s1 .* _X;
  s3 = sum(s2);

  # no. of rows
  m = size(_Y, 1);

  s4 = (learningRate * s3) / m;
  updatedTheta = _theta - s4';
endfunction
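The update rule is theta := theta - (lr/m) * X^T (h - y). A small NumPy sketch of the same step, with a toy dataset (made up for this illustration) to confirm that the cost goes down:

```python
import numpy as np

def update_theta(X, Y, theta, hypothesis, learning_rate):
    """One batch gradient-descent step: theta := theta - (lr/m) * X^T (h - y)."""
    m = Y.shape[0]
    gradient = X.T @ (hypothesis - Y) / m
    return theta - learning_rate * gradient

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(X, Y, theta):
    h = sigmoid(X @ theta)
    return -np.mean(Y * np.log(h) + (1 - Y) * np.log(1 - h))

# Toy data: intercept column of ones plus one feature.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
Y = np.array([[0.0], [0.0], [1.0], [1.0]])
theta = np.zeros((2, 1))

before = cost(X, Y, theta)
for _ in range(100):
    theta = update_theta(X, Y, theta, sigmoid(X @ theta), 0.1)
print(cost(X, Y, theta) < before)  # True
```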

costVector = [];
iterationVector = [];
for i = 1:200000
  # Step 1
  hypothesis = estimateHypothesis(X, theta);

  # Step 2
  cost = estimateCostFunction(hypothesis, Y);
  costVector = vertcat(costVector, cost);

  # Step 3 - Updating theta values
  theta = updateThetaValues(X, Y, theta, hypothesis, learningRate);

  iterationVector = vertcat(iterationVector, i);
endfor

function plotGraph(iterationVector, costVector)
  plot(iterationVector, costVector);
  ylabel('Cost Function');
  xlabel('Iteration');
endfunction

plotGraph(iterationVector, costVector);

# We have estimated the parameters. Now, let's check our test data, i.e., rows 1001 to 1310, to predict whether the passengers survived or not.

function evaluateTestdata(theta)
  [Test, Tn, Ra, limits] = xlsread("~/ML/ML Practice/dataset/train_and_test2.csv", "Sheet2", "A1001:H1310");

  # Columns 1 to 7 are the features; the 8th column holds the actual Survived labels
  _X = Test(:, 1:7);
  _YActual = Test(:, 8);

  _X = horzcat(ones(size(_X, 1), 1), _X);

  _Y = estimateHypothesis(_X, theta);

  # Calculating accuracy: count predictions (threshold 0.5) that match the actual labels
  predictions = _Y >= 0.5;
  cnt = sum(predictions == _YActual);

  disp("Count"), disp(cnt);

  percentage = (cnt * 100) / size(_X, 1);
  disp("Accuracy Percentage of the Model: "), disp(percentage);

endfunction

evaluateTestdata(theta);

Gradient descent graph (cost function vs. iteration):

Thanks.
