Adventures in Machine Learning

Transforming Data with the apply() Function in Pandas DataFrame

Pandas is a powerful library designed for data analysis in Python. It is widely used by data scientists, analysts, and researchers globally because of its easy-to-use, fast, and flexible functions.

The apply() function is one of the most widely used pandas functions for transforming data in a DataFrame. This article explains how to use apply() to transform a DataFrame and then covers two other common pandas functions, drop() and replace().

Introduction to the apply() function

A pandas DataFrame is a tabular data structure, similar to a spreadsheet or a database table.

It contains rows and columns of data that can be manipulated with a wide range of functions. The apply() function lets you apply a function to a single column (a Series) or to an entire DataFrame.
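To make the difference between the two uses concrete, here is a minimal sketch with a small made-up DataFrame (the `scores` name and its values are illustrative only). Called on a single column, apply() passes each value to the function; called on a DataFrame, it passes each column as a Series, or each row when axis=1 is given.

```python
import pandas as pd

# Hypothetical data, used only for illustration
scores = pd.DataFrame({"math": [70, 85, 90], "physics": [60, 75, 95]})

# Series.apply: the function receives one value at a time
curved = scores["math"].apply(lambda value: value + 5)

# DataFrame.apply with the default axis=0: the function receives each column as a Series
column_max = scores.apply(lambda column: column.max())

# DataFrame.apply with axis=1: the function receives each row as a Series
row_total = scores.apply(lambda row: row["math"] + row["physics"], axis=1)

print(curved.tolist())      # [75, 90, 95]
print(column_max.tolist())  # [90, 95]
print(row_total.tolist())   # [130, 160, 185]
```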

The syntax for using the apply() function to transform a DataFrame

The apply() function takes the function to apply as its first argument, often a lambda function, along with optional arguments such as axis. Unlike drop() and replace(), apply() has no inplace argument: it always returns a new object, so to change the DataFrame you must assign the result back to the column or DataFrame you want to update.
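A minimal sketch of how the result of apply() is kept, assuming a small illustrative DataFrame named `prices` (hypothetical, not part of the sales data used in this article). It relies on two facts: apply() returns a new object, and extra keyword arguments are forwarded to the applied function, which is why passing inplace=True fails.

```python
import pandas as pd

# Hypothetical data, used only for illustration
prices = pd.DataFrame({"item": ["pen", "book"], "price": [2, 10]})

# apply() returns a new Series; the DataFrame itself is not modified
doubled = prices["price"].apply(lambda x: x * 2)
print(prices["price"].tolist())  # [2, 10]
print(doubled.tolist())          # [4, 20]

# Extra keyword arguments are forwarded to the lambda, so inplace=True is rejected
try:
    prices["price"].apply(lambda x: x * 2, inplace=True)
    inplace_rejected = False
except TypeError:
    inplace_rejected = True
print(inplace_rejected)          # True

# To keep the result, assign it back to the column
prices["price"] = doubled
print(prices["price"].tolist())  # [4, 20]
```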

A lambda function is an anonymous function that is convenient for short, one-off transformations.

Example 1: Using apply() to update one column

Suppose you have a DataFrame containing monthly sales data.

You can use the apply() function to double the sales in a particular column as shown below:

```
import pandas as pd

sales_data = pd.DataFrame({'Month': ['Jan', 'Feb', 'Mar', 'Apr'],
                           'Sales': [100, 200, 150, 300]})

# apply() returns a new Series, so assign it back to the 'Sales' column
sales_data['Sales'] = sales_data['Sales'].apply(lambda x: x * 2)

print(sales_data)
```

The output will be:

```
   Month  Sales
0    Jan    200
1    Feb    400
2    Mar    300
3    Apr    600
```

Example 2: Using apply() to update multiple columns

To transform several columns at once, select them, call apply() with axis=1 so that the function receives one row at a time, and assign the result back to the same columns. Passing result_type='expand' turns the tuple returned for each row into separate columns. In the example below, we will double the sales values for the ‘Sales_1’ and ‘Sales_2’ columns.

```
import pandas as pd

sales_data = pd.DataFrame({'Month': ['Jan', 'Feb', 'Mar', 'Apr'],
                           'Sales_1': [100, 200, 150, 300],
                           'Sales_2': [500, 400, 450, 300]})

# Takes the two sales values of one row and returns both doubled
multiplier = lambda x, y: (x * 2, y * 2)

# Apply row by row (axis=1); result_type='expand' splits each returned tuple
# into two columns, which are assigned back to 'Sales_1' and 'Sales_2' in order
sales_data[['Sales_1', 'Sales_2']] = sales_data[['Sales_1', 'Sales_2']].apply(
    lambda row: multiplier(*row), axis=1, result_type='expand')

print(sales_data)
```

The output will be:

```
   Month  Sales_1  Sales_2
0    Jan      200     1000
1    Feb      400      800
2    Mar      300      900
3    Apr      600      600
```
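For simple arithmetic like this, apply() is not strictly needed. As an alternative sketch using the same sales figures (the `sales_cols` name is introduced here for illustration), multiplying the selected columns directly gives the same values and is usually faster, because pandas operates on whole columns instead of calling a Python function once per row.

```python
import pandas as pd

sales_data = pd.DataFrame({'Month': ['Jan', 'Feb', 'Mar', 'Apr'],
                           'Sales_1': [100, 200, 150, 300],
                           'Sales_2': [500, 400, 450, 300]})

# Columns to transform (name introduced for this sketch)
sales_cols = ['Sales_1', 'Sales_2']

# Vectorized multiplication: no lambda and no row-by-row Python calls
sales_data[sales_cols] = sales_data[sales_cols] * 2

print(sales_data)
```

Reaching for apply() makes more sense when the transformation cannot be written as a direct column operation.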

Example 3: Using apply() on all columns

To apply a function to every column of a DataFrame, call apply() on the DataFrame itself rather than on a single column. In the example below, we multiply every column by 2:

```
import pandas as pd

sales_data = pd.DataFrame({'Month': ['Jan', 'Feb', 'Mar', 'Apr'],
                           'Sales_1': [100, 200, 150, 300],
                           'Sales_2': [500, 400, 450, 300]})

# The lambda receives each column in turn; assign the result back to keep it
sales_data = sales_data.apply(lambda x: x * 2)

print(sales_data)
```

The output will be:

```
    Month  Sales_1  Sales_2
0  JanJan      200     1000
1  FebFeb      400      800
2  MarMar      300      900
3  AprApr      600      600
```

As we can see, the numeric columns are doubled, but the Month column is not a meaningful result: multiplying a string by 2 repeats it, so ‘Jan’ becomes ‘JanJan’. Arithmetic like this only makes sense for numeric columns.
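To transform only the numeric columns and leave text columns such as ‘Month’ unchanged, one option is to pick the numeric columns with select_dtypes() and apply the function to those alone. This is a sketch under the assumption that every numeric column should be doubled; the `numeric_cols` name is introduced for illustration.

```python
import pandas as pd

sales_data = pd.DataFrame({'Month': ['Jan', 'Feb', 'Mar', 'Apr'],
                           'Sales_1': [100, 200, 150, 300],
                           'Sales_2': [500, 400, 450, 300]})

# Select only the numeric columns so that 'Month' is left untouched
numeric_cols = sales_data.select_dtypes(include='number').columns

# Double each numeric column and assign the result back
sales_data[numeric_cols] = sales_data[numeric_cols].apply(lambda col: col * 2)

print(sales_data)
```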

Other common functions in pandas

The drop() function

The drop() function removes rows or columns that you don’t need from a DataFrame. Unlike apply(), it accepts an inplace argument. Here is an example:

```
import pandas as pd

sales_data = pd.DataFrame({'Month': ['Jan', 'Feb', 'Mar', 'Apr'],
                           'Sales': [100, 200, 300, 400],
                           'Profit': [50, 100, 150, 200]})

# Use drop() inplace to remove the 'Profit' column
sales_data.drop(['Profit'], axis=1, inplace=True)

print(sales_data)
```

The output will be:

```
   Month  Sales
0    Jan    100
1    Feb    200
2    Mar    300
3    Apr    400
```
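The example above removes a column. As a complementary sketch with the same data, drop() can also remove rows by index label, and the columns= keyword can be used in place of a label list plus axis=1. The variable names `without_first_two` and `without_profit` are introduced here for illustration.

```python
import pandas as pd

sales_data = pd.DataFrame({'Month': ['Jan', 'Feb', 'Mar', 'Apr'],
                           'Sales': [100, 200, 300, 400],
                           'Profit': [50, 100, 150, 200]})

# Drop rows by index label; without inplace=True, drop() returns a new DataFrame
without_first_two = sales_data.drop(index=[0, 1])

# The columns= keyword is an alternative to passing labels with axis=1
without_profit = sales_data.drop(columns=['Profit'])

print(without_first_two)
print(without_profit)
```

Because neither call uses inplace=True, `sales_data` itself still holds all four rows and three columns.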

The replace() function

The replace() function replaces values in a DataFrame with other values. Here is an example:

```
import pandas as pd

sales_data = pd.DataFrame({'Month': ['Jan', 'Feb', 'Mar', 'Apr'],
                           'Sales': [100, 200, 300, 400]})

# Use replace() inplace to replace the value 400 by 450
sales_data.replace(400, 450, inplace=True)

print(sales_data)
```

The output will be:

```
   Month  Sales
0    Jan    100
1    Feb    200
2    Mar    300
3    Apr    450
```
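The call above replaces a value wherever it occurs in the DataFrame. As a further sketch with the same data, a dictionary can map several old values to new ones at once, and calling replace() on a single column limits the change to that column. The `renamed` variable and the full month names are introduced here for illustration.

```python
import pandas as pd

sales_data = pd.DataFrame({'Month': ['Jan', 'Feb', 'Mar', 'Apr'],
                           'Sales': [100, 200, 300, 400]})

# A dict maps several old values to new ones in a single call
renamed = sales_data.replace({'Jan': 'January', 'Feb': 'February'})

# Calling replace() on one column restricts the change to that column
sales_data['Sales'] = sales_data['Sales'].replace(400, 450)

print(renamed)
print(sales_data)
```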

Conclusion

In conclusion, pandas provides powerful functions that make data manipulation in Python easier. The apply() function is a flexible way to transform a single column, multiple columns, or every column in a DataFrame; because it returns a new object, assign the result back to keep it.

The drop() function removes rows or columns that you don’t need, while the replace() function changes values in a DataFrame. By understanding and using these functions, data scientists and analysts can refine their workflows and produce data-driven insights more efficiently.
