docroch/titanic-eda

Titanic EDA

This repository is an exploratory data analysis (EDA) project on the Titanic dataset.

Project Overview

Welcome to the Titanic Dataset Analysis repository! This project focuses on comprehensive data cleaning and exploratory data analysis (EDA) of the famous Titanic dataset from Kaggle. Through rigorous analysis and visualization techniques, I uncover meaningful patterns and insights about passenger survival factors.
On April 15, 1912, the Titanic, the largest passenger liner ever built at the time, sank after colliding with an iceberg during her maiden voyage. The sinking killed 1,502 of the 2,224 passengers and crew.
The titanic.csv file contains data for 891 real Titanic passengers. Each row represents one person.
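
As a quick check on the figures quoted above, the overall death and survival rates can be worked out directly. This is only arithmetic on the two numbers in the text (whole ship, passengers and crew); it says nothing about the 891-row sample in titanic.csv, whose survival rate may differ.

```python
# Figures quoted above: 1,502 deaths among 2,224 passengers and crew.
deaths = 1502
aboard = 2224

survivors = aboard - deaths          # 722
death_rate = deaths / aboard         # about 0.675
survival_rate = survivors / aboard   # about 0.325

print(f"Survivors: {survivors}")
print(f"Death rate: {death_rate:.1%}")
print(f"Survival rate: {survival_rate:.1%}")
```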

Author

File Structure

Titanic-EDA/
├── README.md
├── eda.ipynb
└── titanic.csv

File Descriptions

  1. README.md: Project documentation and overview
  2. eda.ipynb: Jupyter notebook containing the analysis
  3. titanic.csv: Complete dataset. The columns describe attributes of each passenger on the ship.
    • PassengerId is the passenger's unique ID
    • Survived indicates whether the passenger survived (1) or died (0)
    • Pclass is the passenger's class (that is, first, second, or third)
    • Name is the passenger's name
    • Gender is the passenger's gender
    • Age is the passenger's age
    • Siblings/Spouses Aboard is the number of siblings/spouses aboard the Titanic
    • Parents/Children Aboard is the number of parents/children aboard the Titanic
    • Ticket is the ticket number
    • Fare is the fare for each ticket
    • Cabin is the cabin number
    • Embarked is where the passenger got on the ship (for instance: C refers to Cherbourg, S refers to Southampton, and Q refers to Queenstown)
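
The column list above can be turned into a small sanity check on a loaded DataFrame. The sketch below assumes the CSV header uses exactly the names listed here (other copies of the dataset use different names for some columns); `missing_columns`, `add_port_names`, and the `EmbarkedPort` column are hypothetical names introduced for illustration and are not taken from the notebook.

```python
import pandas as pd

# Column names as listed above; assumed to match the CSV header exactly.
EXPECTED_COLUMNS = [
    "PassengerId", "Survived", "Pclass", "Name", "Gender", "Age",
    "Siblings/Spouses Aboard", "Parents/Children Aboard",
    "Ticket", "Fare", "Cabin", "Embarked",
]

# Port codes taken from the Embarked description.
EMBARKED_PORTS = {"C": "Cherbourg", "S": "Southampton", "Q": "Queenstown"}


def missing_columns(df: pd.DataFrame) -> list:
    """Return the expected columns that are absent from df."""
    return [col for col in EXPECTED_COLUMNS if col not in df.columns]


def add_port_names(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with a hypothetical EmbarkedPort column of full port names."""
    out = df.copy()
    # Codes outside C/S/Q (or missing values) become NaN.
    out["EmbarkedPort"] = out["Embarked"].map(EMBARKED_PORTS)
    return out
```

With the real file, this could be called as `missing_columns(pd.read_csv("titanic.csv"))`; an empty list would mean the header matches the description above.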

Key Insights

  • Handled missing values
  • Explored correlations between variables
  • Created comprehensive visualizations
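
The first two steps above might look roughly like the following in pandas. This is a minimal sketch, not the notebook's actual code: the helper names are hypothetical, and filling Age with the median is an assumed imputation choice used only for illustration.

```python
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.Series:
    """Count missing values per column, keeping only columns that have any."""
    counts = df.isna().sum()
    return counts[counts > 0].sort_values(ascending=False)


def fill_age_with_median(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with missing Age values replaced by the median age.

    Median imputation is an assumption here, not necessarily what eda.ipynb does.
    """
    out = df.copy()
    out["Age"] = out["Age"].fillna(out["Age"].median())
    return out


def numeric_correlations(df: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlation matrix over the numeric columns only."""
    return df.select_dtypes(include="number").corr()
```

Returning copies rather than modifying the DataFrame in place keeps the raw data available for comparison before and after cleaning.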

Knowledge Gained

  • Advanced data preprocessing techniques
  • Statistical analysis methods
  • Data visualization best practices

Technical Stack

  • Python Libraries:
    • pandas: Data manipulation
    • numpy: Numerical operations
    • matplotlib/seaborn: Visualization
    • scikit-learn: Data preprocessing
  • Tools:
    • Jupyter Notebook
    • Git/GitHub

Getting Started

Prerequisites

  • Python 3.x
  • Jupyter Notebook
  • Required Python libraries (see Technical Stack)

Installation

  1. Clone the repository:

    git clone https://github.com/docroch/titanic-eda.git
  2. Install required packages:

    pip install -r requirements.txt
  3. Open the Jupyter Notebook:

    jupyter notebook eda.ipynb

Keywords

  • Data Science
  • Data Cleaning
  • Exploratory Data Analysis (EDA)
  • Titanic Dataset
  • Python
  • Statistical Analysis
  • Data Visualization

Support

If you find this project helpful, please give it a star! Feel free to fork it for your own data analysis journey.
