This repository is an EDA project on the Titanic dataset.
Welcome to the Titanic Dataset Analysis repository! This project focuses on comprehensive data cleaning and exploratory data analysis (EDA) of the famous Titanic dataset from Kaggle. Through rigorous analysis and visualization techniques, I uncover meaningful patterns and insights about passenger survival factors.
On April 15, 1912, the Titanic, the largest passenger liner ever built at the time, sank after colliding with an iceberg during her maiden voyage. The sinking killed 1,502 of the 2,224 passengers and crew on board.
The titanic.csv file contains data for 891 real Titanic passengers. Each row represents one person.
- Github: Soheil Abbasnia
- LinkedIn: Soheil Abbasnia
Titanic-EDA/
├── README.md
├── eda.ipynb
└── titanic.csv
- README.md: Project documentation and overview
- eda.ipynb: Jupyter notebook containing the analysis
- titanic.csv: Complete dataset. Each column describes one attribute of a passenger on the ship.
- PassengerId is the passenger's unique ID
- Survived indicates whether the passenger survived (1) or died (0)
- Pclass is the passenger's class (that is, first, second, or third)
- Name is the passenger's name
- Gender is the passenger's gender
- Age is the passenger's age
- Siblings/Spouses Aboard is the number of siblings/spouses aboard the Titanic
- Parents/Children Aboard is the number of parents/children aboard the Titanic
- Ticket is the ticket number
- Fare is the price paid for the ticket
- Cabin is the cabin number
- Embarked is the port where the passenger boarded the ship (C = Cherbourg, S = Southampton, Q = Queenstown)
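The column list above can be checked against the file before any analysis starts. The following is a minimal sketch, assuming the CSV header uses exactly the column names written in this README; `EXPECTED_COLUMNS` and `load_passengers` are hypothetical names introduced here, and the two rows are made up for illustration rather than taken from titanic.csv.

```python
import io

import pandas as pd

# Column names as described in this README (assumed to match the CSV header exactly).
EXPECTED_COLUMNS = [
    "PassengerId", "Survived", "Pclass", "Name", "Gender", "Age",
    "Siblings/Spouses Aboard", "Parents/Children Aboard",
    "Ticket", "Fare", "Cabin", "Embarked",
]


def load_passengers(source):
    """Read the CSV and list any described column that is absent from it."""
    df = pd.read_csv(source)
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    return df, missing


# Two made-up rows, for illustration only (not real passenger records).
sample_csv = io.StringIO(
    "PassengerId,Survived,Pclass,Name,Gender,Age,"
    "Siblings/Spouses Aboard,Parents/Children Aboard,Ticket,Fare,Cabin,Embarked\n"
    "1,0,3,Passenger A,male,30,0,0,T1,8.05,,S\n"
    "2,1,1,Passenger B,female,,1,0,T2,71.28,C85,C\n"
)

df, missing = load_passengers(sample_csv)
print(missing)          # described columns that the file does not contain
print(df.isna().sum())  # count of missing values per column
```

To run the same check on the real data, pass the file path instead of the in-memory sample, for example `load_passengers("titanic.csv")`.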
- Handled missing values
- Explored correlations between variables
- Created comprehensive visualizations
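As a rough illustration of the first two steps, the sketch below fills missing values and then computes correlations between the numeric columns. It shows one possible strategy (median for Age, most frequent value for Embarked, a presence flag for Cabin) and is not necessarily what the notebook does; `fill_missing` and `HasCabin` are hypothetical names, and the rows are made up.

```python
import pandas as pd


def fill_missing(df):
    """Return a copy with common gaps filled (one possible strategy)."""
    out = df.copy()
    # Numeric gap: replace missing ages with the median age.
    out["Age"] = out["Age"].fillna(out["Age"].median())
    # Categorical gap: replace missing ports with the most frequent one.
    out["Embarked"] = out["Embarked"].fillna(out["Embarked"].mode()[0])
    # Assumption: Cabin is too sparse to use directly, so keep only a flag.
    out["HasCabin"] = out["Cabin"].notna().astype(int)
    return out.drop(columns=["Cabin"])


# Made-up rows, for illustration only (not real passenger records).
toy = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Pclass": [3, 1, 2, 3],
    "Age": [22.0, None, 30.0, 40.0],
    "Fare": [7.25, 71.28, 13.0, 8.05],
    "Cabin": [None, "C85", None, None],
    "Embarked": ["S", "C", None, "S"],
})

clean = fill_missing(toy)
# Pairwise Pearson correlations between the numeric columns only.
corr = clean.select_dtypes("number").corr()
print(corr.round(2))
```

The resulting matrix can be passed to `seaborn.heatmap` for the visualization step.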
- Advanced data preprocessing techniques
- Statistical analysis methods
- Data visualization best practices
- Python Libraries:
- pandas: Data manipulation
- numpy: Numerical operations
- matplotlib/seaborn: Visualization
- scikit-learn: Data preprocessing
- Tools:
- Jupyter Notebook
- Git/GitHub
- Python 3.x
- Jupyter Notebook
- Required Python libraries
- Clone the repository:
  git clone https://github.com/docRoch/eda-titanic.git
- Install required packages:
  pip install -r requirements.txt
- Open the Jupyter Notebook:
  jupyter notebook eda.ipynb
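If the notebook fails on its first import cell, a quick check like the one below can show which libraries are not installed. This is a minimal sketch using only the standard library; the `REQUIRED` list is an assumption based on the libraries named in this README, and `missing_packages` is a hypothetical helper.

```python
import importlib.util

# Import names assumed from the libraries listed in this README;
# note that scikit-learn is imported as "sklearn".
REQUIRED = ["pandas", "numpy", "matplotlib", "seaborn", "sklearn"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All required libraries are available.")
```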
- Data Science
- Data Cleaning
- Exploratory Data Analysis (EDA)
- Titanic Dataset
- Python
- Statistical Analysis
- Data Visualization
If you find this project helpful, please give it a star! Feel free to fork it for your own data analysis journey.