Titanic Survivors prediction using Logistic Regression with Gradient Descent
Hi,
Good day!
Please check out this program for predicting Titanic survivors, written in the Octave language. Accuracy rate of the model: 68%.
You can find the dataset at the following link: Titanic survivors dataset. Please remove the features named "zero" from the dataset before running the program.
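Before walking through the Octave listing, here is a minimal, self-contained Python sketch of the same technique: a sigmoid hypothesis, the cross-entropy cost, and batch gradient descent. The tiny dataset and all names here are invented for illustration; they are not the Titanic data or the article's actual code.

```python
# Minimal logistic regression trained with batch gradient descent.
# The toy dataset below is illustrative only, not the Titanic CSV.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def hypothesis(row, theta):
    # theta[0] is the intercept term
    z = theta[0] + sum(t * x for t, x in zip(theta[1:], row))
    return sigmoid(z)

def cost(X, Y, theta):
    # Cross-entropy cost, averaged over the m training rows
    m = len(Y)
    total = 0.0
    for row, y in zip(X, Y):
        h = hypothesis(row, theta)
        total += y * math.log(h) + (1 - y) * math.log(1 - h)
    return -total / m

def gradient_step(X, Y, theta, lr):
    # One batch update: theta_j -= lr/m * sum((h - y) * x_j)
    m = len(Y)
    grads = [0.0] * len(theta)
    for row, y in zip(X, Y):
        err = hypothesis(row, theta) - y
        grads[0] += err                      # intercept gradient
        for j, x in enumerate(row):
            grads[j + 1] += err * x
    return [t - lr * g / m for t, g in zip(theta, grads)]

# Toy, linearly separable data: label is 1 when the feature is "large".
X = [[0.1], [0.4], [0.6], [0.9]]
Y = [0, 0, 1, 1]
theta = [0.0, 0.0]
initial = cost(X, Y, theta)
for _ in range(2000):
    theta = gradient_step(X, Y, theta, lr=0.5)
final = cost(X, Y, theta)
print(initial > final)  # True: the cost drops as theta converges
```

The Octave program below does exactly this, just vectorized over the Titanic feature matrix.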
pkg load io;
#{
Program: Predicting Titanic survivors using logistic regression with gradient descent
Accuracy rate: 68%
Author: Shankar Muthusami
Created on: 24/Mar/2019
Training set range: 1 to 1000
Test set range: 1001 to 1310
#}

[An, Tn, Ra, limits] = xlsread("~/ML/ML Practice/dataset/train_and_test2.csv", "Sheet2", "A2:H1000");
# As per the CSV file, we read columns 1 to 7 as features. The 8th column is Survived, which is what we are going to predict
X = [An(:, [1:7])];
Y = [An(:, 8)];
X = horzcat(ones(size(X,1), 1), X);
# Initializing theta values as zero for all
theta = zeros(size(X,2),1);

# Initializing learning rate to 0.003
learningRate = 0.003;

# Step 1: Calculate Hypothesis
function g_z = estimateHypothesis(X, theta)
z = X * theta;
e_z = exp(-z);
denominator = 1.+e_z;
g_z = 1./denominator;
endfunction

# Step 2: Calculate Cost function
function cost = estimateCostFunction(hypothesis, Y)
log_1 = log(hypothesis);
log_2 = log(1.-hypothesis);
y1 = Y;
term_1 = y1.*log_1;
y2 = 1.-Y;
term_2 = y2.*log_2;
cost = term_1 + term_2;
cost = sum(cost);
# no.of.rows
m = size(Y, 1);
cost = -1 * (cost/m);
endfunction

# Step 3: Updating theta values
function updatedTheta = updateThetaValues(_X, _Y, _theta, _hypothesis, learningRate)
s1 = _hypothesis - _Y;
s2 = s1 .* _X;
s3 = sum(s2);
# no.of.rows
m = size(_Y, 1);
s4 = (learningRate * s3)/m;
updatedTheta = _theta - s4';
endfunction

costVector = [];
iterationVector = [];
for i = 1:200000
# Step 1
hypothesis = estimateHypothesis(X, theta);
# Step 2
cost = estimateCostFunction(hypothesis, Y);
costVector = vertcat(costVector, cost);
# Step 3: Updating theta values
theta = updateThetaValues(X, Y, theta, hypothesis, learningRate);
iterationVector = vertcat(iterationVector, i);
endfor

function plotGraph(iterationVector, costVector)
plot(iterationVector, costVector);
ylabel('Cost Function');
xlabel('Iteration');
endfunction

plotGraph(iterationVector, costVector);
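In equation form, each pass of the training loop above evaluates the cross-entropy cost and then updates every parameter, where m is the number of training rows, α is the learning rate (0.003 in the code), and the hypothesis is the sigmoid h_θ(x) = 1/(1 + e^(−θᵀx)):

```latex
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \Big[\, y^{(i)} \log h_\theta\big(x^{(i)}\big) + \big(1 - y^{(i)}\big) \log\big(1 - h_\theta\big(x^{(i)}\big)\big) \,\Big]
```

```latex
\theta_j \leftarrow \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} \Big( h_\theta\big(x^{(i)}\big) - y^{(i)} \Big)\, x_j^{(i)}
```

The subtraction `hypothesis - Y` in the code is exactly the (h_θ(x) − y) term of the gradient, and the division by `m = size(_Y, 1)` is the 1/m averaging.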
# We have estimated the parameters. Now, let's check our test data, i.e., rows 1001 to 1310, to predict whether the passengers survived or not.
function evaluateTestdata(theta)
[Test, Tn, Ra, limits] = xlsread("~/ML/ML Practice/dataset/train_and_test2.csv", "Sheet2", "A1001:H1310");

# As per the CSV file, columns 1 to 7 are the features. The 8th column is Survived, the actual outcome
_X = [Test(:, [1:7])];
_X = horzcat(ones(size(_X,1), 1), _X);
_Y = estimateHypothesis(_X, theta);

# Calculating accuracy percentage of our model: threshold the hypothesis
# at 0.5 and compare against the actual Survived column
actual = Test(:, 8);
cnt = 0;
for i = 1:size(_Y,1)
predicted = (_Y(i,1) >= 0.5);
if(predicted == actual(i,1))
cnt = cnt + 1;
endif
endfor
disp("Count"), disp(cnt);
percentage = (cnt * 100)/ size(_X,1);
disp("Accuracy Percentage of the Model: "), disp(percentage);
endfunction

evaluateTestdata(theta);
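Accuracy here means the fraction of thresholded predictions that match the true Survived labels. A minimal Python sketch of that calculation, with made-up probabilities and labels (not the actual Titanic test rows):

```python
# Threshold probabilities at 0.5, then count matches against true labels.
# Both lists are invented for illustration only.
predicted_probs = [0.12, 0.81, 0.46, 0.67, 0.05]
actual_labels   = [0,    1,    1,    1,    0]

predictions = [1 if p >= 0.5 else 0 for p in predicted_probs]
correct = sum(p == a for p, a in zip(predictions, actual_labels))
accuracy = 100.0 * correct / len(actual_labels)
print(accuracy)  # 80.0 -- 4 of the 5 predictions match
```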
Gradient descent graph:
Thanks.