Titanic EDA

This repository contains an exploratory data analysis (EDA) project on the Titanic dataset.

Project Overview

Welcome to the Titanic Dataset Analysis repository! This project focuses on data cleaning and exploratory data analysis (EDA) of the well-known Titanic dataset from Kaggle. Through analysis and visualization, I look for patterns in the factors associated with passenger survival.
On April 15, 1912, the Titanic, the largest passenger liner in service at the time, sank after colliding with an iceberg during her maiden voyage. The sinking killed 1,502 of the 2,224 passengers and crew.
The titanic.csv file contains data for 891 real Titanic passengers. Each row represents one person.

Author

Github: Soheil Abbasnia
LinkedIn: Soheil Abbasnia

File Structure

Titanic-EDA/
├── README.md
├── eda.ipynb
└── titanic.csv

File Descriptions

README.md: Project documentation and overview
eda.ipynb: Jupyter notebook containing the analysis
titanic.csv: The complete dataset. Each column describes one attribute of a passenger:
- PassengerId is the passenger's unique ID
- Survived indicates whether the passenger survived (1) or died (0)
- Pclass is the passenger's class (that is, first, second, or third)
- Name is the passenger's name
- Gender is the passenger's gender
- Age is the passenger's age
- Siblings/Spouses Aboard is the number of siblings/spouses aboard the Titanic
- Parents/Children Aboard is the number of parents/children aboard the Titanic
- Ticket is the ticket number
- Fare is the fare for each ticket
- Cabin is the cabin number
- Embarked is the port where the passenger boarded the ship (C = Cherbourg, S = Southampton, Q = Queenstown)
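
The column list above can be turned into a quick schema check before any analysis. The following is a minimal sketch, not code taken from eda.ipynb: the helper names `check_schema` and `decode_embarked` are hypothetical, the column names are copied from this README (the headers in the actual file may differ), and the two toy rows are made up for illustration rather than taken from titanic.csv.

```python
import pandas as pd

# Column names as listed in this README; the real file's headers may differ.
EXPECTED_COLUMNS = [
    "PassengerId", "Survived", "Pclass", "Name", "Gender", "Age",
    "Siblings/Spouses Aboard", "Parents/Children Aboard",
    "Ticket", "Fare", "Cabin", "Embarked",
]

# Port codes described in the Embarked entry above.
EMBARKED_PORTS = {"C": "Cherbourg", "S": "Southampton", "Q": "Queenstown"}


def check_schema(df):
    """Return the expected columns that are missing from df (hypothetical helper)."""
    return [col for col in EXPECTED_COLUMNS if col not in df.columns]


def decode_embarked(df):
    """Map port codes to port names; unknown or missing codes become NaN."""
    return df["Embarked"].map(EMBARKED_PORTS)


# Made-up rows for illustration only; these are not real passengers.
toy = pd.DataFrame({
    "PassengerId": [1, 2],
    "Survived": [0, 1],
    "Pclass": [3, 1],
    "Name": ["Passenger A", "Passenger B"],
    "Gender": ["male", "female"],
    "Age": [30.0, 41.0],
    "Siblings/Spouses Aboard": [1, 0],
    "Parents/Children Aboard": [0, 2],
    "Ticket": ["T-001", "T-002"],
    "Fare": [8.05, 60.0],
    "Cabin": [None, "B12"],
    "Embarked": ["S", "C"],
})

missing_columns = check_schema(toy)
ports = decode_embarked(toy)
```

To run the same check on the real data, replace the toy frame with `pd.read_csv("titanic.csv")` and inspect `missing_columns` before continuing.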

Key Insights

Handled missing values
Explored correlations between variables
Created comprehensive visualizations
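
As a rough illustration of the first two steps, the sketch below counts missing values, fills gaps, and computes correlations with survival. It makes assumptions this README does not confirm: median imputation of Age is only one possible strategy and may not be what the notebook does, the helper names are hypothetical, and the toy rows are invented rather than drawn from titanic.csv.

```python
import pandas as pd


def summarize_missing(df):
    """Count missing values per column, keeping only columns that have any."""
    counts = df.isna().sum()
    return counts[counts > 0].sort_values(ascending=False)


def fill_age_with_median(df):
    """Return a copy with missing Age set to the median Age (assumed strategy)."""
    out = df.copy()
    out["Age"] = out["Age"].fillna(out["Age"].median())
    return out


# Made-up rows for illustration only; these are not real passengers.
toy = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1],
    "Pclass": [3, 1, 3, 2, 1],
    "Age": [22.0, 38.0, None, 35.0, 54.0],
    "Fare": [7.25, 71.28, 7.92, 13.0, 51.86],
})

missing = summarize_missing(toy)
cleaned = fill_age_with_median(toy)

# Restrict to numeric columns so text columns in the real file do not break corr().
correlations = cleaned.select_dtypes("number").corr()["Survived"].drop("Survived")
```

On the real dataset, `correlations` gives one Pearson coefficient per numeric column against Survived, which is a reasonable starting point for deciding which relationships to plot.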

Knowledge Gained

Advanced data preprocessing techniques
Statistical analysis methods
Data visualization best practices

Technical Stack

Python Libraries:
- pandas: Data manipulation
- numpy: Numerical operations
- matplotlib/seaborn: Visualization
- scikit-learn: Data preprocessing
Tools:
- Jupyter Notebook
- Git/GitHub

Getting Started

Prerequisites

Python 3.x
Jupyter Notebook
Required Python libraries

Installation

Clone the repository:

git clone https://github.com/docRoch/eda-titanic.git

Install required packages:
```
pip install pandas numpy matplotlib seaborn scikit-learn
```

Open the Jupyter Notebook:

jupyter notebook eda.ipynb

Keywords

Data Science
Data Cleaning
Exploratory Data Analysis (EDA)
Titanic Dataset
Python
Statistical Analysis
Data Visualization

Support

If you find this project helpful, please give it a star! Feel free to fork it for your own data analysis journey.
