Adventures in Machine Learning

Extracting Quarter Information from Pandas DataFrames Made Easy

Extracting Quarter Information from a Date Column in Pandas

Do you have a date column in your Pandas DataFrame and want to extract the quarter information? No problem! In this article, we will explore two methods to extract the quarter from a date column in a Pandas DataFrame.

Method 1: Get Quarter from Date (Year & Quarter Format)

The first method involves extracting the year and quarter information and combining them into a new column in a year and quarter format. We’ll be using the pd.PeriodIndex function with the freq='Q' parameter to do this.

Let’s say we have a DataFrame containing sales data with a “date” column:

import pandas as pd
df = pd.DataFrame({'date': ['2019-01-01', '2019-04-01', '2019-07-01', '2019-10-01']})

To get the quarter information in a year and quarter format, we can create a new column called “quarter” using the following code:

df['quarter'] = pd.PeriodIndex(df['date'], freq='Q')

The result will be a new column containing the year and quarter information in the format “2019Q1”, “2019Q2”, etc.

Method 2: Get Quarter from Date (Quarter Number Format)

The second method involves extracting the quarter number directly from the date column using the dt.quarter function.

Let’s say we have the same DataFrame as before:

import pandas as pd
df = pd.DataFrame({'date': ['2019-01-01', '2019-04-01', '2019-07-01', '2019-10-01']})

To get the quarter information as a number, we can create a new column called “quarter” using the following code:

df['quarter'] = df['date'].dt.quarter

The result will be a new column containing the quarter number for each date.

Example 1: Get Quarter from Date (Year & Quarter Format)

Let’s go back to Method 1 and see how we can create a new column for the year and quarter format.

We’ll use the same DataFrame as before:

import pandas as pd
df = pd.DataFrame({'date': ['2019-01-01', '2019-04-01', '2019-07-01', '2019-10-01']})

To create a new column for the year and quarter format, we can use the pd.PeriodIndex function with the freq='Q' parameter:

df['quarter'] = pd.PeriodIndex(df['date'], freq='Q')

The resulting DataFrame will look like this:

         date quarter
0  2019-01-01  2019Q1
1  2019-04-01  2019Q2
2  2019-07-01  2019Q3
3  2019-10-01  2019Q4

The “quarter” column now contains the year and quarter information in the format “YYYYQX”.

Method 2: Get Quarter from Date (Quarter Number Format) – In Depth

As a reminder, the second method involves extracting the quarter number directly from the date column using the dt.quarter function.

Let’s dive deeper into how this works. To use this method, you’ll need a DataFrame containing a date column.

Let’s create a simple example:

import pandas as pd
df = pd.DataFrame({
    'date': ['2021-01-01', '2021-04-01', '2021-07-01', '2021-10-01']
})

Now let’s create a new column “quarter” that contains the quarter information as a number:

df['quarter'] = df['date'].dt.quarter

This will create a new column containing the quarter number for each date:

         date  quarter
0  2021-01-01        1
1  2021-04-01        2
2  2021-07-01        3
3  2021-10-01        4

That’s it – it’s that simple! You now have a new column containing the quarter information in a quarter number format.

Example 2: Get Quarter from Date (Quarter Number Format)

Let’s use a more practical example to demonstrate how to extract the quarter from a date column.

Suppose you have a DataFrame containing sales data for a company, and you want to see how sales vary by quarter. You have a “date” column containing the date of the sale, and a “sales” column containing the amount of the sale:

import pandas as pd
df = pd.DataFrame({
    'date': ['2021-01-01', '2021-01-15', '2021-02-01', '2021-03-01', '2021-04-01', '2021-07-01'],
    'sales': [100, 200, 150, 300, 250, 400]
})

To extract the quarter information as a number, we can use the dt.quarter function to create a new column “quarter”:

df['quarter'] = df['date'].dt.quarter

This will create a new column “quarter” containing the quarter number for each sale:

         date  sales  quarter
0  2021-01-01    100        1
1  2021-01-15    200        1
2  2021-02-01    150        1
3  2021-03-01    300        1
4  2021-04-01    250        2
5  2021-07-01    400        3

Now we can group the data by quarter and sum the sales to see how sales vary by quarter:

sales_by_quarter = df.groupby('quarter')['sales'].sum()

print(sales_by_quarter)

This will output:

quarter
1     750
2     250
3     400
Name: sales, dtype: int64

We can see that sales were highest in Q1 (750), followed by Q3 (400), and then Q2 (250).

Conclusion

Extracting quarter information from a date column in a Pandas DataFrame is easy with the two methods we’ve discussed. Whether you prefer the year and quarter format or the quarter number format, there’s a method for you.

By using the pd.PeriodIndex function or the dt.quarter function, you can create a new column containing the quarter information for each date in your DataFrame. Happy coding!

Additional Resources

With these resources, you’ll be well on your way to mastering Pandas and performing any operation you need on your data.

Popular Posts