Predicting Survival in Titanic! Part 2

Photo by Boxed Water Is Better on Unsplash
A quick recap of what Part 1 covered:
  1. Feature transformation: transforming features to bring them closer to a normal distribution
  2. Feature engineering: adding a few new features built from information extracted from existing ones
  3. Cleaning and imputation: dropping a few columns and filling in missing values

Getting Started — Part 2

  1. We missed dropping the ‘Ticket’ column earlier, so we drop it now.
  2. The target variable ‘Survived’ sits in the middle of the other features, so let us move it to the right-most column for ease of explanation and better understanding.
  3. The ‘Pclass’ feature is stored as int64, but it is really categorical (ticket class 1, 2 or 3), so let us convert it to the ‘object’ type so that the model treats it as a category rather than a continuous number.
Changing the position of the target variable
Changing the data type of the feature
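The three preparation steps above might look like this in pandas. The author's code appeared only as screenshots, so this is a minimal sketch rather than the original. The small DataFrame `train` and its made-up rows are hypothetical stand-ins for the Titanic training data after Part 1. The column names are borrowed from the Kaggle Titanic dataset.

```python
import pandas as pd

# Made-up rows standing in for the Titanic train data after Part 1
# (column names follow the Kaggle Titanic dataset; values are hypothetical)
train = pd.DataFrame({
    "Pclass": [3, 1, 3, 2],
    "Survived": [0, 1, 1, 0],
    "Sex": ["male", "female", "female", "male"],
    "Ticket": ["T1", "T2", "T3", "T4"],
    "Fare": [7.25, 71.28, 7.92, 13.00],
})

# 1. Drop the 'Ticket' column that was missed earlier
train = train.drop(columns=["Ticket"])

# 2. Move the target 'Survived' to the right-most position:
#    pop() removes the column, and re-assigning it appends it at the end
train["Survived"] = train.pop("Survived")

# 3. Store 'Pclass' as a categorical ('object') column instead of int64
train["Pclass"] = train["Pclass"].astype("object")

print(train.dtypes)
```

`DataFrame.pop` removes a column and returns it. Assigning it back appends it as the last column, which makes this a compact way to move the target to the right.

The `object` dtype matters if a later step one-hot encodes categoricals (an assumption here). `pd.get_dummies` encodes object columns by default, so `Pclass` would become indicator columns instead of a single ordinal number.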

Baseline Models

The approach for building the baseline models is as follows:
  1. Create a validation set from the train data
  2. Fit the model on the train data
  3. Check the model's performance on the validation data
  4. Store all evaluation scores in a data frame for comparison
  5. Tune the hyper-parameters of the best baseline model
  6. Re-evaluate the tuned model to check whether the score improves
  7. Feed the test data into the tuned model for the final prediction
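Steps 1 to 3 and 5 to 7 of this workflow can be sketched with scikit-learn as below, with step 4 reduced to a two-row score table. This is a sketch under stated assumptions, not the article's implementation:

- `make_classification` generates synthetic numeric data in place of the prepared Titanic features.
- The last 100 rows play the role of the unlabelled test set.
- The random forest stands in for whichever baseline scores best.
- The parameter grid is purely illustrative.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the prepared Titanic features (assumption: already numeric)
X, y = make_classification(n_samples=600, n_features=8, random_state=42)
X_train, y_train = X[:500], y[:500]
X_test = X[500:]  # plays the role of the unlabelled test set

# Step 1: carve a stratified validation set out of the training data
X_tr, X_val, y_tr, y_val = train_test_split(
    X_train, y_train, test_size=0.2, stratify=y_train, random_state=42
)

# Steps 2-3: fit a baseline and score it on the validation set
# (random forest is an assumed "best baseline", not the article's pick)
baseline = RandomForestClassifier(random_state=42).fit(X_tr, y_tr)
baseline_auc = roc_auc_score(y_val, baseline.predict_proba(X_val)[:, 1])

# Step 5: tune hyper-parameters with cross-validation on the train split only
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [100, 200], "max_depth": [4, 8, None]},
    scoring="roc_auc",
    cv=5,
)
search.fit(X_tr, y_tr)

# Step 6: re-evaluate the tuned model on the same validation set
tuned_auc = roc_auc_score(y_val, search.best_estimator_.predict_proba(X_val)[:, 1])
scores = pd.DataFrame({"model": ["RF baseline", "RF tuned"],
                       "roc_auc": [baseline_auc, tuned_auc]})
print(scores)

# Step 7: predict on the test set with the tuned model
test_pred = search.best_estimator_.predict(X_test)
```

The validation set is kept out of `GridSearchCV`. As a result, step 6 compares the baseline and tuned models on data that neither was tuned on.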
An empty dataframe defined for storing the evaluation scores
An empty dataframe for storing the predictions
Next, we define a function that, for a given model:
  1. extracts the train and validation sets using the function defined earlier
  2. fits the model on the train set
  3. generates predictions for the validation set as well as the actual test set
  4. calculates the evaluation scores and stores them in the dataframe defined for that purpose
  5. plots the ROC curve and reports its AUC
  6. plots feature importance
  7. plots the trees for tree-based algorithms
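A helper performing the steps listed above might look like the sketch below. Several details are assumptions rather than the article's choices:

- The function name `evaluate_model`, the score column names and the choice of metrics are hypothetical.
- The split happens inside the helper rather than through the article's earlier split function.
- Instead of drawing plots, the helper returns the data behind them: ROC curve points, a feature importance series, and a text rendering of the tree via `sklearn.tree.export_text`. These could be passed to matplotlib or `sklearn.tree.plot_tree` to draw the figures.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score, roc_curve)
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Empty frames: one row of scores per model, one column of test predictions per model
scores = pd.DataFrame(columns=["model", "accuracy", "precision", "recall", "f1", "roc_auc"])
predictions = pd.DataFrame()


def evaluate_model(name, model, X, y, X_test, feature_names):
    """Hypothetical helper; returns plot data instead of drawing plots."""
    # 1. Train/validation split (stand-in for the article's own split function)
    X_tr, X_val, y_tr, y_val = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42)
    # 2. Fit the model on the train split
    model.fit(X_tr, y_tr)
    # 3. Predictions for the validation set and the test set
    val_pred = model.predict(X_val)
    val_proba = model.predict_proba(X_val)[:, 1]
    predictions[name] = model.predict(X_test)
    # 4. Append this model's scores as a new row
    scores.loc[len(scores)] = [
        name,
        accuracy_score(y_val, val_pred),
        precision_score(y_val, val_pred),
        recall_score(y_val, val_pred),
        f1_score(y_val, val_pred),
        roc_auc_score(y_val, val_proba),
    ]
    # 5. Points of the ROC curve (plot tpr against fpr to draw it)
    fpr, tpr, _ = roc_curve(y_val, val_proba)
    # 6. Feature importance: impurity-based for trees, |coefficient| for linear models
    if hasattr(model, "feature_importances_"):
        raw = model.feature_importances_
    else:
        raw = np.abs(model.coef_[0])
    importance = pd.Series(raw, index=feature_names).sort_values(ascending=False)
    # 7. Text rendering of a single decision tree (None for other models)
    tree_text = export_text(model, feature_names=feature_names) if hasattr(model, "tree_") else None
    return fpr, tpr, importance, tree_text


# Usage on synthetic data (assumption: features are already numeric)
X_all, y_all = make_classification(n_samples=420, n_features=5, random_state=0)
X, y, X_test = X_all[:400], y_all[:400], X_all[400:]
names = [f"f{i}" for i in range(5)]
for name, model in [("logreg", LogisticRegression(max_iter=1000)),
                    ("tree", DecisionTreeClassifier(max_depth=3, random_state=0))]:
    fpr, tpr, importance, tree_text = evaluate_model(name, model, X, y, X_test, names)
print(scores)
```

Feature importance uses two different measures here. Tree models use the impurity-based `feature_importances_`, and linear models use the absolute coefficients. The coefficients are only comparable across features when the features are on similar scales.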
Matrix for logistic regression
AUC-ROC curve
Visual representation of feature importance




Project Manager with 11 years of industry experience. Data Science enthusiast. Entrepreneur.

Aditya Vyas
