## Merging Multiple DataFrames in Pandas

Data science is becoming increasingly popular, and Python has proven to be one of its leading programming languages. One of the essential libraries for data manipulation with Python is Pandas, which is designed for working with structured, tabular data.

In this article, we will focus on merging multiple DataFrames in Pandas.

## 1) Syntax for Merging Multiple DataFrames

Merging multiple DataFrames is a way to combine datasets that share some columns or indices. The `merge()` function joins DataFrames based on one or more keys.

### 1.1) Merging Two DataFrames

Here is the syntax for merging two DataFrames:

`merged_df = pd.merge(df1, df2, on='key column')`

Here, `df1` and `df2` are the DataFrames to be merged, and `'key column'` is the column the two DataFrames have in common. You can also merge on multiple keys:

`merged_df = pd.merge(df1, df2, on=['key column1', 'key column2'])`

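The merged DataFrame contains all columns from both DataFrames (each key column appears only once), and rows are matched on the shared key(s). By default, `merge()` performs an inner join, so only rows whose key values appear in both DataFrames are kept.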
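A minimal runnable sketch of the multi-key form follows. The `sales` and `targets` DataFrames and their `store`, `month`, `revenue`, and `target` columns are hypothetical, chosen only to show that a row is kept when both key columns match.

```python
import pandas as pd

# Hypothetical tables that share two key columns: 'store' and 'month'
sales = pd.DataFrame({
    "store": ["North", "North", "South"],
    "month": ["Jan", "Feb", "Jan"],
    "revenue": [100, 120, 90],
})
targets = pd.DataFrame({
    "store": ["North", "North", "South"],
    "month": ["Jan", "Feb", "Feb"],
    "target": [110, 115, 95],
})

# A row is matched only when store AND month agree (default inner join)
merged_df = pd.merge(sales, targets, on=["store", "month"])
print(merged_df)
```

The South rows drop out: `sales` has South/Jan and `targets` has South/Feb, so neither pair of keys matches fully.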

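The `merge()` function has several parameters for customizing the merge, such as `how` (the join type: `'inner'`, `'left'`, `'right'`, or `'outer'`), `left_on` and `right_on` (for key columns with different names), and `suffixes` (for renaming overlapping non-key columns).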
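To illustrate two of these parameters, the sketch below merges two small hypothetical DataFrames (`left` and `right`, with partially overlapping keys) using each value of `how`, and then uses `suffixes` to rename a column that both DataFrames share. The variable names are illustrative only.

```python
import pandas as pd

# Hypothetical tables: keys 2 and 3 appear in both, 1 only in left, 4 only in right
left = pd.DataFrame({"key": [1, 2, 3], "left_val": ["a", "b", "c"]})
right = pd.DataFrame({"key": [2, 3, 4], "right_val": ["x", "y", "z"]})

# Count how many rows each join type keeps
row_counts = {
    how: len(pd.merge(left, right, on="key", how=how))
    for how in ["inner", "left", "right", "outer"]
}
print(row_counts)  # {'inner': 2, 'left': 3, 'right': 3, 'outer': 4}

# suffixes controls how overlapping non-key columns are renamed
# (the default is ("_x", "_y"))
renamed = pd.merge(
    left.rename(columns={"left_val": "val"}),
    right.rename(columns={"right_val": "val"}),
    on="key",
    suffixes=("_left", "_right"),
)
print(renamed.columns.tolist())  # ['key', 'val_left', 'val_right']
```

Only keys 2 and 3 appear in both tables, which is why the inner join keeps two rows while the outer join keeps all four keys.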

### 1.2) Example of Merging Two DataFrames

Let’s consider an example where we have two DataFrames with customer data and order data. Both DataFrames have a shared column ‘Customer ID’:

```
import pandas as pd

customers = {'Customer ID': [1, 2, 3, 4, 5],
             'First name': ['John', 'Jane', 'Mike', 'Alice', 'Bob'],
             'Last name': ['Doe', 'Doe', 'Smith', 'Johnson', 'Lee']}
orders = {'Customer ID': [1, 1, 2, 3, 5],
          'Product': ['A', 'B', 'C', 'D', 'E'],
          'Quantity': [2, 1, 3, 2, 2]}
df_customers = pd.DataFrame(customers)
df_orders = pd.DataFrame(orders)
```

We can merge the two DataFrames based on the ‘Customer ID’ column:

`merged_df = pd.merge(df_customers, df_orders, on='Customer ID')`

The resulting DataFrame will have all columns from both DataFrames and only the rows whose ‘Customer ID’ appears in both DataFrames (customer 4, Alice, has no orders and is dropped):

```
   Customer ID  First name  Last name  Product  Quantity
0            1        John        Doe        A         2
1            1        John        Doe        B         1
2            2        Jane        Doe        C         3
3            3        Mike      Smith        D         2
4            5         Bob        Lee        E         2
```

Using the `merge()` function, we have combined the customer data and order data into one DataFrame based on the shared ‘Customer ID’ column.
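Because the default inner join silently drops unmatched rows, it can be useful to check what was left out. One way, sketched below with the same customer and order data, is an outer join with `indicator=True`, which adds a `_merge` column whose values are `'left_only'`, `'right_only'`, or `'both'`. The names `check` and `no_orders` are illustrative.

```python
import pandas as pd

customers = {'Customer ID': [1, 2, 3, 4, 5],
             'First name': ['John', 'Jane', 'Mike', 'Alice', 'Bob'],
             'Last name': ['Doe', 'Doe', 'Smith', 'Johnson', 'Lee']}
orders = {'Customer ID': [1, 1, 2, 3, 5],
          'Product': ['A', 'B', 'C', 'D', 'E'],
          'Quantity': [2, 1, 3, 2, 2]}
df_customers = pd.DataFrame(customers)
df_orders = pd.DataFrame(orders)

# An outer join keeps every customer and every order;
# indicator=True records where each row came from in a '_merge' column
check = pd.merge(df_customers, df_orders, on='Customer ID',
                 how='outer', indicator=True)

# Rows present only in the left DataFrame: customers without orders
no_orders = check[check['_merge'] == 'left_only']
print(no_orders[['Customer ID', 'First name']])
```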

## 2) Using the `reduce()` Function in Python

The `reduce()` function applies a two-argument function cumulatively to the items of a sequence, from left to right, reducing the sequence to a single value. It is not a built-in in Python 3; it is part of the standard library's `functools` module.

### 2.1) Importing the `reduce()` Function

The first step in using the `reduce()` function is to import it from the `functools` module:

`from functools import reduce`

This allows us to use the `reduce()` function in our code.

### 2.2) Example of Using the `reduce()` Function

Let’s consider an example where we want to calculate the product of all the elements in a list. We can use the `reduce()` function together with a `lambda` (an anonymous function), like this:

```
from functools import reduce

lst = [2, 4, 6, 8]
# Multiply the running result by each element in turn
product = reduce(lambda x, y: x * y, lst)
print(product)
```

#### Output:

`384`

In this example, `reduce()` applies the `lambda` function cumulatively: it multiplies the first two elements, multiplies that result by the next element, and so on, computing `((2 * 4) * 6) * 8`. The sequence is thereby reduced to a single value, the product of all the elements in the list.
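To make the cumulative behavior concrete, the sketch below rewrites the product example as an explicit loop and checks that both give the same answer. It also shows the optional third argument of `reduce()`, an initializer that seeds the accumulator; without one, calling `reduce()` on an empty sequence raises a `TypeError`.

```python
from functools import reduce

lst = [2, 4, 6, 8]

# What reduce(lambda x, y: x * y, lst) does, written as an explicit loop:
# the first element seeds the accumulator, then each later element is folded in
acc = lst[0]
for item in lst[1:]:
    acc = acc * item  # ((2 * 4) * 6) * 8

assert acc == reduce(lambda x, y: x * y, lst)

# The initializer seeds the accumulator instead, so an empty list is safe
print(reduce(lambda x, y: x * y, [], 1))  # 1
```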

Another example of using the `reduce()` function is finding the maximum element in a list:

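```
from functools import reduce

lst = [5, 2, 7, 1, 6]
# Keep whichever of the running maximum and the next element is larger
max_val = reduce(lambda x, y: x if x > y else y, lst)
print(max_val)
```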

#### Output:

`7`

In this example, `reduce()` applies the `lambda` function iteratively, comparing the running maximum with each element in turn. (For this particular task, the built-in `max(lst)` gives the same result.)
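Combining the two ideas gives a common pattern for merging more than two DataFrames: pass a list of DataFrames to `reduce()` with a function that merges two of them at a time. The sketch below assumes every DataFrame shares a `Customer ID` column; the `names`, `emails`, and `cities` tables are hypothetical, and `how='left'` is one choice among several, used here so that customers without a city are kept.

```python
from functools import reduce

import pandas as pd

# Hypothetical DataFrames that all share a 'Customer ID' column
names = pd.DataFrame({'Customer ID': [1, 2, 3],
                      'First name': ['John', 'Jane', 'Mike']})
emails = pd.DataFrame({'Customer ID': [1, 2, 3],
                       'Email': ['john@example.com', 'jane@example.com',
                                 'mike@example.com']})
cities = pd.DataFrame({'Customer ID': [1, 3],
                       'City': ['Boston', 'Denver']})

dfs = [names, emails, cities]

# reduce() merges the first two DataFrames, then merges that result
# with the third, and so on down the list
merged_df = reduce(
    lambda left, right: pd.merge(left, right, on='Customer ID', how='left'),
    dfs,
)
print(merged_df)
```

Customer 2 has no row in `cities`, so the result holds `NaN` in that customer's `City` cell, the kind of empty cell discussed in the next section.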

## 3) Filling Empty Cells in Merged DataFrames

When merging DataFrames, some cells in the result may be empty. This happens when a key in one DataFrame has no match in the other (for example, with a left or outer join), or when the input DataFrames already contain missing values.

In Pandas, empty cells are represented by `NaN`, which stands for “Not a Number.” In this section, we will discuss how to fill empty cells in merged DataFrames.
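A merge can also produce `NaN` values itself. In the sketch below (hypothetical `customers` and `orders` tables), a left join keeps a customer who has no orders, and Pandas fills that row's `Quantity` with `NaN`, converting the column from integers to floats in the process.

```python
import pandas as pd

# Hypothetical tables: customer 3 has no matching row in 'orders'
customers = pd.DataFrame({'Customer ID': [1, 2, 3],
                          'First name': ['John', 'Jane', 'Mike']})
orders = pd.DataFrame({'Customer ID': [1, 2],
                       'Quantity': [2, 3]})

# A left join keeps every customer; the unmatched row gets NaN in 'Quantity'
merged_df = pd.merge(customers, orders, on='Customer ID', how='left')
print(merged_df)
```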

### 3.1) Using a Single Value to Fill Empty Cells

Empty cells in a DataFrame can be filled with the `fillna()` method, which replaces `NaN` values with a specified value. Related methods such as `ffill()` and `bfill()` instead copy the nearest non-missing value forward or backward.
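The difference between filling with a fixed value and forward or backward filling is easiest to see on a small Series. This sketch uses `Series.ffill()` and `Series.bfill()`; in recent Pandas releases, these dedicated methods are preferred over the older `fillna(method=...)` form.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, 4.0])

print(s.fillna(0).tolist())  # [1.0, 0.0, 0.0, 4.0]: fixed value
print(s.ffill().tolist())    # [1.0, 1.0, 1.0, 4.0]: copy last valid value forward
print(s.bfill().tolist())    # [1.0, 4.0, 4.0, 4.0]: copy next valid value backward
```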

Let’s consider an example where we have merged two DataFrames with missing values:

```
import pandas as pd
import numpy as np

df1 = pd.DataFrame({'A': [1, 2, np.nan],
                    'B': [4, np.nan, np.nan],
                    'C': ['x', 'y', 'z']})
df2 = pd.DataFrame({'A': [np.nan, 4, 5],
                    'B': [np.nan, 6, 7],
                    'C': ['x', 'y', 'z']})
# Both DataFrames have columns A and B, so merge() adds the suffixes _x and _y
merged_df = pd.merge(df1, df2, on='C')
```

The resulting DataFrame will have `NaN` values in the cells where the corresponding values are missing:

```
   A_x  B_x  C  A_y  B_y
0  1.0  4.0  x  NaN  NaN
1  2.0  NaN  y  4.0  6.0
2  NaN  NaN  z  5.0  7.0
```

To fill the `NaN` values with a single value, we can use the `fillna()` method, like this:

`merged_df.fillna(0, inplace=True)`

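The `fillna()` method will replace all `NaN` values with 0. Because of `inplace=True`, `merged_df` is modified directly instead of a new DataFrame being returned. The resulting DataFrame will be: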

```
   A_x  B_x  C  A_y  B_y
0  1.0  4.0  x  0.0  0.0
1  2.0  0.0  y  4.0  6.0
2  0.0  0.0  z  5.0  7.0
```

### 3.2) Using Other Values to Fill Empty Cells

Instead of using one value for every column, we can choose fill values that suit each column's data. Let’s return to the example from Section 3.1; because `fillna(0, inplace=True)` modified `merged_df` there, we re-create the merged DataFrame first:

```
import pandas as pd
import numpy as np

df1 = pd.DataFrame({'A': [1, 2, np.nan],
                    'B': [4, np.nan, np.nan],
                    'C': ['x', 'y', 'z']})
df2 = pd.DataFrame({'A': [np.nan, 4, 5],
                    'B': [np.nan, 6, 7],
                    'C': ['x', 'y', 'z']})
merged_df = pd.merge(df1, df2, on='C')
```

To fill the `NaN` values with values that make sense, we can pass `fillna()` a dictionary that maps column names to their fill values, like this:

```
fill_values = {'A_x': df1['A'].mean(), 'B_x': 0, 'A_y': df2['A'].median(), 'B_y': df2['B'].median()}
merged_df.fillna(value=fill_values, inplace=True)
```

In this example, we have used the mean of column ‘A’ in `df1` (1.5), 0 for column ‘B’ in `df1`, and the medians of columns ‘A’ (4.5) and ‘B’ (6.5) in `df2` to fill the `NaN` values. The resulting DataFrame will be:

```
   A_x  B_x  C  A_y  B_y
0  1.0  4.0  x  4.5  6.5
1  2.0  0.0  y  4.0  6.0
2  1.5  0.0  z  5.0  7.0
```
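As an alternative sketch, the fill values can be computed from the merged DataFrame itself: `DataFrame.mean(numeric_only=True)` returns a Series of column means, and `fillna()` accepts such a Series as a per-column mapping. This is a different strategy from the dictionary above: every numeric column gets its own mean, so `B_x` is filled with 4.0 rather than 0. The name `column_means` is illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, np.nan],
                    'B': [4, np.nan, np.nan],
                    'C': ['x', 'y', 'z']})
df2 = pd.DataFrame({'A': [np.nan, 4, 5],
                    'B': [np.nan, 6, 7],
                    'C': ['x', 'y', 'z']})
merged_df = pd.merge(df1, df2, on='C')

# mean(numeric_only=True) skips the text column 'C' and returns a Series
# indexed by column name; fillna() uses it to fill each column separately
column_means = merged_df.mean(numeric_only=True)
filled = merged_df.fillna(column_means)
print(filled)
```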

## 4) Common Pandas Operations and Additional Resources

Pandas provides a variety of operations for data manipulation, including filtering, sorting, grouping, and aggregating. In this section, we will discuss some common operations in Pandas and provide additional resources for learning.

### 4.1) Common Operations in Pandas

- Filtering: Filtering creates a subset of a DataFrame based on certain conditions. The most common approach is boolean indexing: a condition such as `df['column_name'] > value` produces a boolean Series with one True or False value per row, and indexing the DataFrame with it keeps only the rows marked True.
- Sorting: Sorting is a way to rearrange the rows of a DataFrame based on the values in one or more columns.
- Grouping: Grouping is a way to group rows of a DataFrame based on the values in one or more columns, and then apply a function to each group.
- Aggregating: Aggregating applies a function such as mean, max, or min to a column (or to each group of rows) and reduces it to a single value.

#### Example: Filtering

```
import pandas as pd
df = pd.read_csv('data.csv')  # 'data.csv' and 'column_name' are placeholders
value = 10  # placeholder threshold
filtered_df = df[df['column_name'] > value]
```

#### Example: Sorting

```
import pandas as pd
df = pd.read_csv('data.csv')
sorted_df = df.sort_values(by='column_name')
```

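#### Example: Grouping

```
import pandas as pd
df = pd.read_csv('data.csv')
# numeric_only=True skips non-numeric columns, which can raise an error in pandas 2.x
grouped_df = df.groupby('column_name').mean(numeric_only=True)
```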

#### Example: Aggregating

```
import pandas as pd
df = pd.read_csv('data.csv')
aggregated_df = df['column_name'].agg(['mean', 'max', 'min'])
```
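The examples above read from a `data.csv` file that is not provided. The sketch below runs the same four operations on a small hypothetical in-memory DataFrame; the `region` and `sales` columns and the threshold of 150 are illustrative.

```python
import pandas as pd

# Hypothetical in-memory data standing in for 'data.csv'
df = pd.DataFrame({
    'region': ['East', 'West', 'East', 'West', 'East'],
    'sales': [250, 120, 310, 90, 180],
})

# Filtering: the boolean Series df['sales'] > 150 selects matching rows
filtered_df = df[df['sales'] > 150]

# Sorting: highest sales first
sorted_df = df.sort_values(by='sales', ascending=False)

# Grouping: mean sales per region
grouped_df = df.groupby('region')['sales'].mean()

# Aggregating: several summary statistics of one column
aggregated = df['sales'].agg(['mean', 'max', 'min'])

print(filtered_df, sorted_df, grouped_df, aggregated, sep='\n\n')
```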

### 4.2) Additional Resources for Pandas Operations

Pandas has extensive documentation on its official website, including user guides, an API reference, and tutorials. In addition, many other resources can help users learn Pandas, including:

- Pandas Cookbook: A comprehensive guide to manipulating and analyzing data with Pandas
- Python for Data Analysis: A book by Wes McKinney, the creator of Pandas, that covers Pandas and other data analysis tools in Python
- Kaggle: A website with many datasets and tutorials for data analysis in Python using Pandas

## Conclusion

In this article, we covered merging DataFrames with `merge()`, the `reduce()` function from `functools`, filling empty cells with `fillna()`, and common Pandas operations such as filtering, sorting, grouping, and aggregating.

Pandas offers a powerful interface for organizing, cleaning, manipulating, and analyzing data, making it a valuable tool for data enthusiasts and professionals alike. With these techniques, data analysts and researchers will be well-equipped to manage and manipulate complex datasets.

We hope this article was informative and helpful in your data science ventures.