---
jupytext:
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.13
    jupytext_version: 1.14.1
kernelspec:
  display_name: Python 3 (ipykernel)
  language: python
  name: python3
---

# Introduction to Pandas

In this notebook, we explore how to use the Pandas library to manipulate and analyze data. Pandas is a widely used Python library for working with tabular data in data science projects. You will learn how to load data, explore it, and manipulate it using the `DataFrame` object.

## Objectives

- Learn how to load data into a `DataFrame`
- Understand the difference between index labels and integer row positions
- Manipulate a `DataFrame` (selection, adding/removing columns, filtering)
- Apply simple operations on data (averages, groupings)
- Save results in different formats

## 1. Import Pandas and Load Data

```{code-cell} ipython3
import pandas as pd

# Load a sample dataset
url = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/tips.csv"
data = pd.read_csv(url)

# Show the first few rows of the DataFrame
data.head()
```

We have loaded a dataset from a URL with `pd.read_csv()`. The same function also reads local delimited text files (e.g., `.csv`, `.txt`) when given a file path instead of a URL.
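
As a minimal sketch of reading delimited text that is not comma-separated, the example below passes the `sep` parameter to `pd.read_csv()`. The semicolon-separated text and the names `raw_text` and `sample` are made up for illustration; an in-memory buffer stands in for a local file so the example runs without any file on disk.

```python
import io

import pandas as pd

# Hypothetical semicolon-separated content standing in for a local file
raw_text = "total_bill;tip;day\n16.99;1.01;Sun\n10.34;1.66;Sun\n21.01;3.50;Sat\n"

# read_csv accepts a file path or a file-like object; sep sets the delimiter
sample = pd.read_csv(io.StringIO(raw_text), sep=";")

print(sample)
```

With a real file, the buffer would simply be replaced by the file path.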

## 2. Manipulating `DataFrame`

### 2.1 Structure of a `DataFrame`

A `DataFrame` is a table of data with rows and columns. Each column has a name, and each row is identified by an index label.

```{code-cell} ipython3
# Display information about the DataFrame
data.info()
```
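
Besides `info()`, the structure described above can also be inspected piece by piece. The following sketch uses a small hand-built `DataFrame` (its values are illustrative, not the full tips dataset) to show the `shape`, `columns`, and `dtypes` attributes.

```python
import pandas as pd

# Small illustrative table (not the full tips dataset)
sample = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01],
    "tip": [1.01, 1.66, 3.50],
    "day": ["Sun", "Sun", "Sat"],
})

n_rows, n_cols = sample.shape          # (number of rows, number of columns)
column_names = list(sample.columns)    # column labels
column_types = sample.dtypes           # one dtype per column

print(n_rows, n_cols)
print(column_names)
print(column_types)
```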

### 2.2 Selecting Columns and Rows

You can access specific columns by name and specific rows by index label.

```{code-cell} ipython3
# Select a column
data['total_bill'].head()

# Select multiple columns
data[['total_bill', 'tip']].head()

# Select a row (by index label)
data.loc[0]

# Select multiple rows and columns
# (.loc slices by label and includes the end label, so this returns rows 0 through 5)
data.loc[0:5, ['total_bill', 'tip']]
```
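
The difference between label-based and position-based selection is easiest to see side by side. This sketch, on a small made-up `DataFrame`, compares `.loc` (labels, end included) with `.iloc` (integer positions, end excluded).

```python
import pandas as pd

# Small illustrative table with the default integer index 0..3
sample = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68],
    "tip": [1.01, 1.66, 3.50, 3.31],
})

# .loc selects by label: the slice 0:2 includes labels 0, 1 and 2
by_label = sample.loc[0:2, ["total_bill", "tip"]]

# .iloc selects by position: the slice 0:2 includes positions 0 and 1 only
by_position = sample.iloc[0:2, [0, 1]]

print(len(by_label), len(by_position))
```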

### 2.3 Index Labels vs Integer Positions

The index of a `DataFrame` is the set of labels identifying each row. These labels do not necessarily match the integer positions of the rows, which always run from 0 to N-1 for a `DataFrame` with N rows.

```{code-cell} ipython3
# Display the current index
data.index

# Change the DataFrame index
data.set_index('day', inplace=True)

# Check the new index
data.head()

# Reset the index to restore the default integer index
# ('day' becomes a regular column again)
data.reset_index(inplace=True)
```
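
Once a column such as `day` becomes the index, labels are no longer unique and no longer equal to positions. The sketch below uses a small made-up `DataFrame` and calls `set_index` without `inplace=True`, which returns a new `DataFrame` and leaves the original untouched.

```python
import pandas as pd

# Small illustrative table
sample = pd.DataFrame({
    "day": ["Sun", "Sat", "Sun"],
    "tip": [1.01, 3.50, 1.66],
})

# Without inplace=True, set_index returns a new DataFrame
indexed = sample.set_index("day")

# Label lookup: returns every row whose label is "Sun"
sunday_rows = indexed.loc["Sun"]

# Position lookup: returns the row at integer position 1
second_row = indexed.iloc[1]

print(sunday_rows)
print(second_row)
```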

## 3. Data Operations

### 3.1 Filtering and Conditions

We can filter rows based on specific conditions.

```{code-cell} ipython3
# Filter rows where the total bill is greater than 20
high_total_bill = data[data['total_bill'] > 20]
high_total_bill.head()
```
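
Several conditions can be combined in one filter. As a sketch on made-up data, the example below uses `&` (and), `|` (or), and `isin()`; the variable names are illustrative. Note that Python's `and`/`or` keywords do not work on boolean Series, and that each comparison needs its own parentheses when written inline.

```python
import pandas as pd

# Small illustrative table
sample = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "day": ["Sun", "Sun", "Sat", "Fri", "Sun"],
})

# Each condition is a boolean Series with one value per row
is_large = sample["total_bill"] > 20
is_weekend = sample["day"].isin(["Sat", "Sun"])

# Rows meeting both conditions
large_weekend = sample[is_large & is_weekend]

# Rows meeting at least one condition
large_or_weekend = sample[is_large | is_weekend]

print(large_weekend)
print(large_or_weekend)
```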

### 3.2 Adding or Removing Columns

You can easily add new columns derived from existing columns or remove unnecessary columns.

```{code-cell} ipython3
# Add a calculated column
data['tip_percent'] = data['tip'] / data['total_bill'] * 100

# Remove a column
data.drop(columns=['sex'], inplace=True)

data.head()
```
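
The same two steps can be written without modifying the original `DataFrame`. This is a sketch of an alternative style, not a replacement for the cell above: `assign()` and `drop()` each return a new `DataFrame`, so they can be chained. The data is made up for illustration.

```python
import pandas as pd

# Small illustrative table
sample = pd.DataFrame({
    "total_bill": [20.0, 50.0],
    "tip": [3.0, 5.0],
    "sex": ["Female", "Male"],
})

# assign() adds the calculated column, drop() removes one;
# both return new objects, so sample itself is left unchanged
result = (
    sample
    .assign(tip_percent=sample["tip"] / sample["total_bill"] * 100)
    .drop(columns=["sex"])
)

print(result)
```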

## 4. Descriptive Analysis and Grouping

### 4.1 Descriptive Statistics

Pandas makes it easy to get descriptive statistics on your data.

```{code-cell} ipython3
# Get descriptive statistics
data.describe()
```
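
The result of `describe()` is itself a `DataFrame`, so individual statistics can be looked up with `.loc`. The sketch below uses made-up numbers and also shows the equivalent single-statistic method `median()`.

```python
import pandas as pd

# Small illustrative table
sample = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0],
    "tip": [1.0, 2.0, 6.0],
})

summary = sample.describe()

# Rows of the summary are statistics, columns are the numeric columns
mean_tip = summary.loc["mean", "tip"]

# Single statistics are also available directly on a column
median_tip = sample["tip"].median()

print(mean_tip, median_tip)
```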

### 4.2 Data Grouping

It is often useful to group rows by one or more columns and apply an aggregation function (mean, sum, count, etc.) to each group.

```{code-cell} ipython3
# Compute the average tip by day
data.groupby('day')['tip'].mean()

# Count occurrences by day
data['day'].value_counts()
```
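
Several aggregations can be computed in one pass with `agg()`. The following sketch, on made-up data, returns one row per day and one column per statistic.

```python
import pandas as pd

# Small illustrative table
sample = pd.DataFrame({
    "day": ["Sun", "Sun", "Sat", "Sat", "Sat"],
    "tip": [1.0, 3.0, 2.0, 4.0, 6.0],
})

# One row per day, one column per aggregation
summary = sample.groupby("day")["tip"].agg(["mean", "max", "count"])

print(summary)
```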

## 5. Saving Results

Once your data is manipulated, you can easily save it in different formats (CSV, Excel, etc.).

```{code-cell} ipython3
# Save to CSV
data.to_csv('modified_data.csv', index=False)

# Save to Excel (requires an Excel writer engine such as openpyxl to be installed)
data.to_excel('modified_data.xlsx', index=False)
```
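
To see what `index=False` changes, the sketch below writes a small made-up `DataFrame` to in-memory buffers (standing in for files) and reads it back. Without `index=False`, the index is written as an extra unnamed column, which `pd.read_csv()` loads as `Unnamed: 0`.

```python
import io

import pandas as pd

# Small illustrative table
sample = pd.DataFrame({
    "tip": [1.01, 1.66, 3.50],
    "day": ["Sun", "Sun", "Sat"],
})

# Write without the index, then read back
buffer = io.StringIO()
sample.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)

# Write with the index (the default), then read back
buffer_with_index = io.StringIO()
sample.to_csv(buffer_with_index)
buffer_with_index.seek(0)
restored_with_index = pd.read_csv(buffer_with_index)

print(list(restored.columns))
print(list(restored_with_index.columns))
```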

## Conclusion

This notebook has introduced you to the basics of Pandas. You have learned how to:
- Load data into a `DataFrame`
- Manipulate data (selection, adding columns)
- Perform simple analyses (grouping, calculations)
- Save results

Pandas is a powerful tool that will allow you to explore and analyze your data efficiently.
