# Mastering Frequency Counting in Pandas for Data Analysis

## Counting Frequency of Unique Values in Pandas Series

Pandas is one of the most popular data manipulation libraries in Python. It provides easy-to-use tools for data analysis, including functions for counting the frequency of unique values in a pandas series.

In this article, we will explore how to use these functions to count the frequency of unique values, NaN values, relative frequency, frequency in equal-sized bins, and frequency of values in pandas dataframes.

## Using `value_counts()` Function to Count Frequency

In pandas, the `value_counts()` function is used to count the frequency of unique values in a series. For instance, consider a pandas series with the following data:

```python
import pandas as pd

data = pd.Series([3, 4, 5, 2, 4, 2, 6, 7, 3, 5, 6])
```

To count the frequency of unique values in the series, we can use the `value_counts()` function as follows:

```python
freq = data.value_counts()
print(freq)
```

### Output:

```
4    2
3    2
5    2
6    2
2    2
7    1
dtype: int64
```

The output shows the frequency of each unique value, sorted in descending order of count.

In this case, the values 4, 3, 5, 6, and 2 each occur twice, and 7 occurs once.
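The default descending sort can be changed with the `ascending` argument of `value_counts()`, which is handy when you want the rarest values first:

```python
import pandas as pd

data = pd.Series([3, 4, 5, 2, 4, 2, 6, 7, 3, 5, 6])

# ascending=True lists the least frequent values first
asc = data.value_counts(ascending=True)
print(asc)
```

Here the value 7, which occurs only once, appears at the top of the result.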

## Handling NaN Values using the `dropna` Argument

NaN (Not a Number) values are used in pandas to represent missing data. The `dropna` argument of the `value_counts()` function controls whether NaN values are included in the count.

By default, `dropna=True`, which removes all NaN values from the series before counting its unique values; passing `dropna=False` keeps NaN in the result instead. For instance, consider the following series with NaN values:

```python
data = pd.Series([3, 4, 5, 2, 4, 2, 6, 7, 3, 5, 6, None, None, None])
```

To count the frequency of non-NaN values, we can use the following code:

```python
freq = data.value_counts(dropna=True)
print(freq)
```

### Output:

```
4.0    2
3.0    2
5.0    2
6.0    2
2.0    2
7.0    1
dtype: int64
```

## Counting Relative Frequency using `normalize` Argument

The `normalize` argument of the `value_counts()` function can be used to calculate the relative frequency of unique values in a pandas series. The `normalize` argument accepts a boolean value, where `True` means that the counts will be normalized to represent the relative frequency, and `False` means that the counts will represent the absolute frequency.

For instance, consider the following series:

```python
data = pd.Series([3, 4, 5, 2, 4, 2, 6, 7, 3, 5, 6])
```

To calculate the relative frequency of each unique value, we can use the following code:

```python
freq = data.value_counts(normalize=True)
print(freq)
```

### Output:

```
4    0.181818
3    0.181818
5    0.181818
6    0.181818
2    0.181818
7    0.090909
dtype: float64
```

The output shows the relative frequency of unique values in the series. In this case, each unique value occurs with a frequency of 0.181818, except for the value 7, which occurs with a frequency of 0.090909.
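Since the normalized counts sum to 1, multiplying by 100 turns them into percentages, which is often more readable in reports:

```python
import pandas as pd

data = pd.Series([3, 4, 5, 2, 4, 2, 6, 7, 3, 5, 6])

# Scale relative frequencies to percentages
pct = data.value_counts(normalize=True) * 100
print(pct.round(2))
```

Each value that occurs twice accounts for roughly 18.18% of the series, and the value 7 for roughly 9.09%.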

## Counting Frequency in Equal-Sized Bins using `bins` Argument

The `value_counts()` function can also be used to count the frequency of values that fall into bins. One common approach is to define the bin edges explicitly with `pd.cut()` and then count the resulting intervals.

For instance, consider a pandas series with the following data:

```python
data = pd.Series([3, 4, 5, 2, 4, 2, 6, 7, 3, 5, 6, 8, 10, 12, 18, 25, 30])
```

To count the frequency of values in three equal-width bins (0 to 10, 10 to 20, and 20 to 30), we can use the following code:

```python
bins = [0, 10, 20, 30]
freq = pd.cut(data, bins=bins).value_counts()
print(freq)
```

### Output:

```
(0, 10]     13
(10, 20]     2
(20, 30]     2
dtype: int64
```

The output shows the frequency of values in the three bins. The first bin (0, 10] contains 13 values, while the second bin (10, 20] and the third bin (20, 30] contain 2 values each.
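Alternatively, `value_counts()` accepts a `bins` argument directly: instead of explicit edges, you give it a number of bins and pandas splits the range of the data into that many equal-width intervals:

```python
import pandas as pd

data = pd.Series([3, 4, 5, 2, 4, 2, 6, 7, 3, 5, 6, 8, 10, 12, 18, 25, 30])

# bins=3 divides the range of the data (2 to 30) into three equal-width intervals
freq = data.value_counts(bins=3)
print(freq)
```

The interval edges are computed from the minimum and maximum of the data, so they generally will not be round numbers like the hand-picked edges above.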

## Counting Frequency of Values in Pandas DataFrames

Pandas dataframes are tabular data structures that contain multiple rows and columns. To count the frequency of values in a pandas dataframe, we select the column whose values we want to count.

For instance, consider the following dataframe:

```python
data = pd.DataFrame({'name': ['John', 'Mary', 'Steve', 'John', 'Bob'],
                     'age': [32, 25, 19, 32, 40]})
```

To count the frequency of names in the dataframe, we can use the following code:

```python
freq = data['name'].value_counts()
print(freq)
```

### Output:

```
John     2
Bob      1
Mary     1
Steve    1
Name: name, dtype: int64
```

The output shows the frequency of names in the `name` column of the dataframe. In this case, John occurs twice, and the other names occur once.
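DataFrames also have their own `value_counts()` method (available since pandas 1.1), which counts unique combinations of values across columns rather than a single column:

```python
import pandas as pd

data = pd.DataFrame({'name': ['John', 'Mary', 'Steve', 'John', 'Bob'],
                     'age': [32, 25, 19, 32, 40]})

# DataFrame.value_counts counts unique (name, age) row combinations
freq = data.value_counts()
print(freq)
```

Here the combination ('John', 32) occurs twice, and every other row is unique.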

Apart from the functions explained in this article, pandas offers many other common functions that can be useful for data analysis. You can find more information on these functions by referring to the pandas documentation or exploring pandas tutorials online.

Some of the commonly used functions include `groupby()`, `merge()`, `pivot_table()`, and `resample()`. These functions perform grouping and aggregation operations on data, merging data from multiple sources, reshaping and pivoting data, and resampling time series data, respectively.
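For example, `groupby()` combined with `size()` produces per-group row counts much like `value_counts()`, and extends naturally to other aggregations:

```python
import pandas as pd

data = pd.DataFrame({'name': ['John', 'Mary', 'Steve', 'John', 'Bob'],
                     'age': [32, 25, 19, 32, 40]})

# Count rows per name, similar to data['name'].value_counts()
counts = data.groupby('name').size()
print(counts)

# The same grouping supports other aggregations, e.g. the mean age per name
print(data.groupby('name')['age'].mean())
```

Unlike `value_counts()`, the result of `groupby(...).size()` is sorted by group label rather than by count.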

## Conclusion

In this article, we explored how the `value_counts()` function can count the frequency of unique values, handle NaN values, compute relative frequencies, count frequencies in bins, and count values in dataframe columns. We also saw how arguments such as `dropna`, `normalize`, and `bins` tailor the counts to specific needs.

Pandas offers many other functions for data analysis, and exploring them further will deepen your data manipulation skills in Python. By mastering the techniques covered here, you can quickly gain insight into the distribution of values in your pandas series and dataframes.