Predicting Survival in Titanic! Part 2

Part 1 of this series covered:

  1. Feature transformation: bringing skewed features closer to a normal distribution
  2. Feature engineering: adding a few new features from information extracted from existing ones
  3. Cleaning and imputation: dropping a few columns and filling in some missing information

Getting Started — Part 2

Importing the data and taking a look at it.

  1. The ‘Ticket’ column was not dropped earlier, so we drop it now.
  2. The target variable ‘Survived’ sits between other features, so we move it to the right-most position for ease of explanation and readability.
  3. The ‘Pclass’ feature is stored as int64; we convert it to ‘object’ type so that the model treats passenger class as a category rather than a number.
Changing the position of the target variable
Changing the data type of the feature
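
The three changes above can be sketched in pandas as follows. This is a minimal illustration, not the original notebook code: the small `train` frame and its values are made up for the example, and only the column names ‘Ticket’, ‘Survived’ and ‘Pclass’ come from the article.

```python
import pandas as pd

# Tiny stand-in for the Titanic training data (illustrative values only).
train = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Survived": [0, 1, 1],
    "Pclass": [3, 1, 3],
    "Ticket": ["A/5 21171", "PC 17599", "STON/O2. 3101282"],
    "Fare": [7.25, 71.2833, 7.925],
})

# 1. Drop the 'Ticket' column.
train = train.drop(columns=["Ticket"])

# 2. Move the target 'Survived' to the right-most position:
#    pop() removes the column, and assigning it back appends it at the end.
target = train.pop("Survived")
train["Survived"] = target

# 3. Store 'Pclass' as object so it is handled as a category, not a number.
train["Pclass"] = train["Pclass"].astype(object)

print(train.dtypes)
```

Using `pop` followed by re-assignment avoids spelling out the full column order by hand, which helps when the frame has many engineered features.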

Baseline Models

Since we have already established that this is a classification problem, we import only the classifiers from the sklearn library, along with the evaluation metrics.
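
The exact set of classifiers used in the original notebook is not shown here, so the selection below is an assumption: a handful of common scikit-learn classifiers plus the usual classification metrics. The `baseline_models` dictionary is a hypothetical convenience for looping over the models later.

```python
# Classifiers (an assumed selection of common baselines)
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.neighbors import KNeighborsClassifier

# Evaluation metrics for classification
from sklearn.metrics import (
    accuracy_score,
    precision_score,
    recall_score,
    f1_score,
    roc_auc_score,
    confusion_matrix,
)

# Hypothetical registry: model name -> untrained estimator with default-ish settings.
baseline_models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "DecisionTree": DecisionTreeClassifier(random_state=42),
    "RandomForest": RandomForestClassifier(random_state=42),
    "GradientBoosting": GradientBoostingClassifier(random_state=42),
    "KNN": KNeighborsClassifier(),
}
```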

The overall workflow is:

  1. Create validation data from the train data
  2. Feed the train data into the model
  3. Check the performance using validation data
  4. Store all evaluation scores in a data frame for comparison
  5. Tune the hyper-parameters of the best baseline model
  6. Re-evaluate the tuned model to check for improvements in score
  7. Feed the test data into the model for final prediction
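
Steps 1, 5 and 6 of the workflow above might look like the sketch below. Everything in it is an assumption made for illustration: the helper name `make_validation_split`, the 80/20 split, the synthetic data standing in for the Titanic features, the choice of a random forest as the "best" baseline, and the parameter grid.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split


def make_validation_split(data, target="Survived", test_size=0.2, seed=42):
    """Split a labelled frame into train and validation parts (hypothetical helper)."""
    X = data.drop(columns=[target])
    y = data[target]
    # stratify keeps the class ratio the same in both parts
    return train_test_split(X, y, test_size=test_size, random_state=seed, stratify=y)


# Synthetic stand-in for the prepared Titanic training data.
X_raw, y_raw = make_classification(n_samples=200, n_features=5, random_state=0)
data = pd.DataFrame(X_raw, columns=[f"f{i}" for i in range(5)])
data["Survived"] = y_raw

# Step 1: create validation data from the train data.
X_train, X_val, y_train, y_val = make_validation_split(data)

# Step 5: tune hyper-parameters with a small, illustrative grid.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# Step 6: re-evaluate the tuned model on the held-out validation data.
tuned_val_accuracy = search.best_estimator_.score(X_val, y_val)
print(search.best_params_, tuned_val_accuracy)
```

Keeping the validation set out of the grid search means the re-evaluation in step 6 is done on data the tuning never saw.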
An empty DataFrame defined for storing the evaluation scores
An empty DataFrame for storing the predictions
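
A minimal sketch of the two containers follows. The variable names and the score columns are assumptions; the original column set is not shown in this text.

```python
import pandas as pd

# One row per model; the column names are an assumed set of common metrics.
scores = pd.DataFrame(
    columns=["Model", "Accuracy", "Precision", "Recall", "F1", "ROC_AUC"]
)

# One column per model, holding that model's predictions on the test set.
predictions = pd.DataFrame()
```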
For each model, a helper function:

  1. extracts the train and validation sets using the function defined earlier
  2. fits the model on the train set
  3. generates predictions for the validation set as well as the actual test set
  4. calculates the evaluation scores and feeds them into the DataFrame defined for the purpose
  5. plots the AUC_ROC curve
  6. plots feature importance
  7. plots trees for tree-based algorithms
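
The helper described above could be sketched roughly as below. It is not the original function: the name `evaluate_model`, its signature, the score columns and the synthetic data are all assumptions. To stay short it covers steps 1 to 4 and returns the numbers behind the ROC and feature-importance plots (steps 5 and 6) instead of drawing them; the tree plot in step 7 is left out.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score, f1_score, precision_score, recall_score,
    roc_auc_score, roc_curve,
)
from sklearn.model_selection import train_test_split

scores = pd.DataFrame(columns=["Model", "Accuracy", "Precision", "Recall", "F1", "ROC_AUC"])
predictions = pd.DataFrame()


def evaluate_model(name, model, X, y, X_test):
    """Fit one model, record its validation scores and its test predictions."""
    # 1. train / validation split (stratified so class ratios match)
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y
    )
    # 2. fit on the train part
    model.fit(X_train, y_train)
    # 3. predictions for validation data and for the actual test set
    val_pred = model.predict(X_val)
    val_prob = model.predict_proba(X_val)[:, 1]
    predictions[name] = model.predict(X_test)
    # 4. append one row of evaluation scores
    scores.loc[len(scores)] = [
        name,
        accuracy_score(y_val, val_pred),
        precision_score(y_val, val_pred),
        recall_score(y_val, val_pred),
        f1_score(y_val, val_pred),
        roc_auc_score(y_val, val_prob),
    ]
    # 5. points of the ROC curve (ready to be plotted)
    fpr, tpr, _ = roc_curve(y_val, val_prob)
    # 6. importances for tree models, signed coefficients for linear models
    if hasattr(model, "feature_importances_"):
        importance = pd.Series(model.feature_importances_, index=X.columns)
    elif hasattr(model, "coef_"):
        importance = pd.Series(model.coef_[0], index=X.columns)
    else:
        importance = None
    return fpr, tpr, importance


# Synthetic stand-in data: 250 labelled rows plus 50 "test" rows.
X_raw, y_raw = make_classification(n_samples=300, n_features=4, random_state=0)
X_all = pd.DataFrame(X_raw, columns=["f0", "f1", "f2", "f3"])
y_all = pd.Series(y_raw, name="Survived")
X, y, X_test = X_all.iloc[:250], y_all.iloc[:250], X_all.iloc[250:]

fpr, tpr, importance = evaluate_model(
    "LogisticRegression", LogisticRegression(max_iter=1000), X, y, X_test
)
print(scores)
```

Returning `fpr`, `tpr` and `importance` keeps the scoring logic separate from plotting, so the same function works for models that expose neither `feature_importances_` nor `coef_`.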
Matrix for logistic regression
AUC_ROC curve
Visual representation of feature importance


