# Maximizing Insights: Calculating Rolling Maximums with Pandas DataFrame

Have you ever had to work with financial or sales data that required analysis over a particular time frame? If yes, then you may know how challenging it can be to determine the maximum value over a rolling window, especially when dealing with large datasets where manual calculations can be prone to errors.

Luckily, with pandas DataFrame, calculating rolling maximums has become much easier, faster, and more accurate. In this article, we will explore two methods for calculating rolling maximums in pandas DataFrame and provide an example demonstrating how to use these methods in action.

Method 1: Calculate Rolling Maximum

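The first method uses the cummax() function, which calculates the cumulative maximum: for each row, the largest value seen from the first row up to that row. Strictly speaking, this is an expanding window rather than a fixed-size one, so it suits cases where you want the running maximum to date. For a maximum over a fixed number of preceding rows, pandas provides rolling(n).max().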
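It may help to see how a cumulative maximum differs from a fixed-window maximum before working through the example. The sketch below is a minimal illustration; the values and the variable names `running_max` and `window_max` are made up for this comparison and are not part of the sales example.

```python
import pandas as pd

# Illustrative values only (not taken from a real dataset)
sales = pd.Series([10, 30, 8, 20, 12, 18])

# Cumulative maximum: largest value from the first row up to the current row
running_max = sales.cummax()

# Fixed-window maximum: largest value among the current row and the two before it.
# min_periods=1 lets the first rows use a shorter window instead of returning NaN.
window_max = sales.rolling(window=3, min_periods=1).max()

print(pd.DataFrame({"sales": sales, "cummax": running_max, "rolling_3": window_max}))
```

Once the early peak of 30 falls out of the three-row window, the rolling column drops back to 20, while the cumulative column stays at 30 for the rest of the series.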

For example, let’s create a pandas DataFrame that simulates sales information over ten days.

```
import pandas as pd

# Simulated sales for ten consecutive days
df = pd.DataFrame({'day': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'sales': [10, 15, 8, 20, 12, 18, 14, 26, 22, 19]})

print(df)
```

```
   day  sales
0    1     10
1    2     15
2    3      8
3    4     20
4    5     12
5    6     18
6    7     14
7    8     26
8    9     22
9   10     19
```

Now let’s add a column that tracks the highest sales value seen before each day.

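```
# Running maximum of sales, shifted down one row so each day sees only earlier days
df['roll_max'] = df['sales'].cummax().shift(1)

# Seed the first two rows with their own sales values (.loc slices include the end label)
df.loc[0:1, 'roll_max'] = df.loc[0:1, 'sales']
```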

## Output:

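```
   day  sales  roll_max
0    1     10      10.0
1    2     15      15.0
2    3      8      15.0
3    4     20      15.0
4    5     12      20.0
5    6     18      20.0
6    7     14      20.0
7    8     26      20.0
8    9     22      26.0
9   10     19      26.0
```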

In this example, cummax() computes the cumulative maximum of the sales column from the first row down, and shift(1) moves those values down by one row, so each row holds the highest sales value from the days before it.

Finally, we overwrite the first two rows with their own sales values. Only the first row is actually missing after the shift, because it has no earlier day to draw on; the second row would otherwise hold 10, the first day's sales.

Method 2: Calculate Rolling Maximum by Group

The second method involves using the groupby() function and calculating the rolling maximum within each group.

This method is useful when you need to compute rolling maximums over subsets of the data.

For example, let’s create a pandas DataFrame that simulates sales information for different stores.

## DataFrame Creation

```
df = pd.DataFrame({'store': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
                   'day': [1, 2, 3, 1, 2, 3, 1, 2, 3],
                   'sales': [10, 15, 8, 20, 12, 18, 14, 26, 22]})

print(df)
```

```
  store  day  sales
0     A    1     10
1     A    2     15
2     A    3      8
3     B    1     20
4     B    2     12
5     B    3     18
6     C    1     14
7     C    2     26
8     C    3     22
```

Now let’s add a column that calculates the rolling maximum sales value for each store over a two-day window, that is, the current day and the day before it.

```
df['roll_max'] = df.groupby('store')['sales'].rolling(2).max().reset_index(0, drop=True)
```

## Output:

```
  store  day  sales  roll_max
0     A    1     10       NaN
1     A    2     15      15.0
2     A    3      8      15.0
3     B    1     20       NaN
4     B    2     12      20.0
5     B    3     18      18.0
6     C    1     14       NaN
7     C    2     26      26.0
8     C    3     22      26.0
```

In this example, groupby() groups the sales data by store, and rolling(2).max() computes, within each group, the maximum over the current and previous day. The first day of each store is NaN because its window holds only one value. The grouped result is indexed by both the store and the original row label, so reset_index(0, drop=True) drops the store level to realign the values with the original DataFrame.
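To see what reset_index() is doing here, the sketch below inspects the intermediate result before the index is cleaned up. It also shows a transform()-based variant that keeps the original index throughout. The variable names and the transform() variant are illustrative choices, not part of the example above, and the data is a shortened two-store version of the same table.

```python
import pandas as pd

df = pd.DataFrame({
    "store": ["A", "A", "A", "B", "B", "B"],
    "sales": [10, 15, 8, 20, 12, 18],
})

# groupby + rolling returns a Series indexed by (store, original row label)
grouped = df.groupby("store")["sales"].rolling(2).max()
print(grouped.index.nlevels)  # two index levels before reset_index

# Dropping level 0 (the store key) restores the original row labels
via_reset = grouped.reset_index(level=0, drop=True)

# transform keeps the original index, so no index cleanup is needed
via_transform = df.groupby("store")["sales"].transform(lambda s: s.rolling(2).max())

print(pd.DataFrame({"via_reset": via_reset, "via_transform": via_transform}))
```

Both columns should hold the same values. Which form to prefer is largely a matter of taste, although transform() avoids relying on the row order of the grouped result.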

## Conclusion

In conclusion, calculating rolling maximums in a pandas DataFrame is straightforward and efficient. In this article, we explored two methods: using cummax() to track a cumulative (running) maximum, and using groupby() with rolling() to compute a windowed maximum within subsets of the data.

By using the functionality built into pandas, you can easily analyze your data and obtain valuable insights from it.

In the previous section, we discussed two methods for calculating rolling maximums using pandas DataFrame.

In this section, we will delve deeper into the second method and provide an example that demonstrates the application of rolling maximums by group.

Example 2: Calculate Rolling Maximum by Group

## DataFrame Creation with Multiple Stores

Let’s assume you are analyzing sales data for two stores over a period of five days. The dataset contains information on each store’s daily sales volume.

```
import pandas as pd

sales_data = {'store': ['Store A', 'Store B', 'Store A', 'Store B', 'Store A', 'Store B', 'Store A', 'Store B', 'Store A', 'Store B'],
              'day': ['Day 1', 'Day 1', 'Day 2', 'Day 2', 'Day 3', 'Day 3', 'Day 4', 'Day 4', 'Day 5', 'Day 5'],
              'sales': [900, 1200, 1100, 1300, 1500, 1600, 1400, 1200, 1000, 2000]}

df = pd.DataFrame(sales_data)

print(df)
```

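```
     store    day  sales
0  Store A  Day 1    900
1  Store B  Day 1   1200
2  Store A  Day 2   1100
3  Store B  Day 2   1300
4  Store A  Day 3   1500
5  Store B  Day 3   1600
6  Store A  Day 4   1400
7  Store B  Day 4   1200
8  Store A  Day 5   1000
9  Store B  Day 5   2000
```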

The DataFrame contains the store name, the day of the sale, and the sales volume for that store on that day.

## Adding Rolling Maximum Column Grouped by Store

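Now, let’s calculate the rolling maximum sales value for each store over the previous two days.

```
# Per store: exclude the current day with shift(1), then take the max over a two-row window
df['rolling_max'] = (
    df.groupby('store')['sales']
      .transform(lambda x: x.shift(1).rolling(2).max())
      .bfill()  # fill the leading NaNs with the next available value
)
```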

## Output:

```
     store    day  sales  rolling_max
0  Store A  Day 1    900       1100.0
1  Store B  Day 1   1200       1100.0
2  Store A  Day 2   1100       1100.0
3  Store B  Day 2   1300       1100.0
4  Store A  Day 3   1500       1100.0
5  Store B  Day 3   1600       1300.0
6  Store A  Day 4   1400       1500.0
7  Store B  Day 4   1200       1600.0
8  Store A  Day 5   1000       1500.0
9  Store B  Day 5   2000       1600.0
```

In this example, we grouped the data by store and, within each store, used shift(1) to move the sales values down one row so that the current day is excluded. The rolling(2) window then takes the maximum over the previous two days. The first two days of each store have no complete window, so they come out as NaN.

Finally, we backfilled those missing values with the next available value. Because the backfill runs over the whole column rather than per store, the leading rows of both stores take the first valid value that follows them, which here belongs to Store A.
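The backfill step fills each store's leading gaps from whatever valid value comes next in the column, which may belong to a different store and to a later day. If that is not what you want, one possible alternative is to let the window shrink at the start of each store by passing min_periods=1. The sketch below is a variation under that assumption rather than the method used above; the column name `prev2_max` is hypothetical, and the day labels are simplified to integers.

```python
import pandas as pd

df = pd.DataFrame({
    "store": ["Store A", "Store B"] * 5,
    "day": [d for d in range(1, 6) for _ in range(2)],
    "sales": [900, 1200, 1100, 1300, 1500, 1600, 1400, 1200, 1000, 2000],
})

# Max over up to two previous days per store; shift(1) excludes the current day.
# min_periods=1 uses a single previous day when only one is available,
# so only the first day of each store stays NaN.
df["prev2_max"] = df.groupby("store")["sales"].transform(
    lambda s: s.shift(1).rolling(2, min_periods=1).max()
)

print(df)
```

With this variant, the first day of each store remains NaN because there is no earlier day at all, and every other row uses only that store's own history.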