Creating Frequency Tables in Python
Python is one of the most popular programming languages among data analysts and data scientists. It offers a wide array of libraries and functions for data manipulation, exploration, and analysis.
One of these libraries is Pandas, which provides powerful tools for data analysis. A crucial step in data exploration is creating a frequency table.
A frequency table is a table that shows the number of times each unique value in a dataset occurs. In this article, we will look at how to create frequency tables in Python using the Pandas value_counts() and crosstab() functions.
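To make the definition concrete before turning to Pandas, here is a minimal sketch using only Python's standard library: collections.Counter tallies how often each value occurs. The list of grades is made-up illustration data.

```python
from collections import Counter

# Made-up grades for illustration; 70 occurs three times
grades = [70, 80, 90, 70, 80, 90, 70]

# Counter maps each unique value to the number of times it occurs
freq = Counter(grades)

# most_common() lists (value, count) pairs from most to least frequent;
# tied values keep the order in which they were first seen
for value, count in freq.most_common():
    print(value, count)
```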
Using the value_counts() Function
The value_counts() function in Pandas is the most direct way to create a frequency table. It is called on a Pandas Series (for example, a single DataFrame column) and returns a new Series whose index holds the unique values and whose values are their counts, sorted in descending order of count by default.
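Because every grade in the example that follows appears exactly twice, it cannot show the sort order. This sketch uses a made-up Series with unequal counts so that the default descending order, and the ascending=True option, are visible:

```python
import pandas as pd

# Made-up grades with unequal counts so the ordering is visible
grades = pd.Series([70, 80, 80, 90, 90, 90], name="grade")

# Default: most frequent value first
freq = grades.value_counts()
print(freq)

# ascending=True reverses the order: least frequent value first
freq_asc = grades.value_counts(ascending=True)
print(freq_asc)
```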
Let’s consider an example where we have a dataset that contains the grades of students in a school:
import pandas as pd
# create a dataframe containing student grades
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Dave', 'Eve', 'Frank'],
    'grade': [70, 80, 90, 70, 80, 90]
})
# create frequency table
freq_table = df['grade'].value_counts()
print(freq_table)
Output:
90    2
80    2
70    2
Name: grade, dtype: int64
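As you can see from the output, value_counts() has created a frequency table of the grade column, showing how many times each unique grade occurs (here, each grade appears twice). Because all three counts are tied, the order of the rows may differ between Pandas versions, and Pandas 2.0 and later name the resulting Series count instead of grade.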
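Two common follow-ups, sketched here on the same data: ordering the table by value rather than by count, and reporting proportions instead of raw counts. Both rely on standard options (sort_index() and normalize=True); treat this as one illustrative approach rather than the only way to do it.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Dave', 'Eve', 'Frank'],
    'grade': [70, 80, 90, 70, 80, 90]
})

# sort_index() orders the table by grade value instead of by count,
# which gives a predictable layout when counts are tied
counts = df['grade'].value_counts().sort_index()
print(counts)

# normalize=True returns relative frequencies that sum to 1
proportions = df['grade'].value_counts(normalize=True).sort_index()
print(proportions)
```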
Using the crosstab() Function
The crosstab() function in Pandas is another useful tool for creating frequency tables. It takes the values to group by along the rows (its index argument) and along the columns (its columns argument), each given as a Series, an array, or a list of them, and returns a Pandas DataFrame that counts how often each combination of values occurs.
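Because crosstab() also accepts a list of Series for its row (or column) argument, it can cross more than two variables at once. The sketch below assumes a hypothetical dataset with gender, age, and grade_level columns and builds a table with a two-level row index:

```python
import pandas as pd

# Hypothetical data with three categorical columns
df = pd.DataFrame({
    'gender': ['M', 'M', 'F', 'F', 'M', 'F'],
    'age': [18, 19, 20, 18, 19, 20],
    'grade_level': ['B', 'A', 'A', 'B', 'A', 'A'],
})

# A list of Series as the row argument produces a MultiIndex:
# one row per (gender, age) combination present in the data
table = pd.crosstab([df['gender'], df['age']], df['grade_level'])
print(table)
```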
Let’s consider an example where we have a dataset that contains the age and grade of students, and we want to create a frequency table of the two variables:
import pandas as pd
# create a dataframe containing student ages and grades
df = pd.DataFrame({
    'age': [18, 19, 20, 18, 19, 20],
    'grade': [70, 80, 90, 70, 80, 90]
})
# create frequency table
freq_table = pd.crosstab(df['age'], df['grade'])
print(freq_table)
Output:
grade  70  80  90
age
18      2   0   0
19      0   2   0
20      0   0   2
The output is a two-way frequency table: each cell counts the students with a given combination of age (rows) and grade (columns).
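To see what crosstab() is computing, here is a sketch that reproduces the same table with groupby(): count the rows for each (age, grade) pair, then pivot grade into columns. This alternative technique is shown only for comparison; it is not a claim about how crosstab() is implemented internally.

```python
import pandas as pd

df = pd.DataFrame({
    'age': [18, 19, 20, 18, 19, 20],
    'grade': [70, 80, 90, 70, 80, 90]
})

via_crosstab = pd.crosstab(df['age'], df['grade'])

# size() counts rows per (age, grade) pair; unstack() turns grade into
# columns, and fill_value=0 fills pairs that never occur
via_groupby = df.groupby(['age', 'grade']).size().unstack(fill_value=0)

print(via_groupby)
```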
Advanced Frequency Tables (Two-way Tables)
In data analysis, we often need two-way frequency tables to examine the relationship between two variables. A two-way frequency table shows the joint distribution of two categorical variables: each cell counts how often a particular pair of values occurs together.
In Python, we can create a two-way frequency table using the crosstab() function.
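Since a two-way table describes a joint distribution, it is often useful to express it as proportions of the whole. Here is a minimal sketch using crosstab()'s normalize='all' option on the age and grade data from the previous section:

```python
import pandas as pd

df = pd.DataFrame({
    'age': [18, 19, 20, 18, 19, 20],
    'grade': [70, 80, 90, 70, 80, 90]
})

# normalize='all' divides every cell by the grand total, turning counts
# into joint relative frequencies that sum to 1 over the whole table
joint = pd.crosstab(df['age'], df['grade'], normalize='all')
print(joint)
```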
Creating a Two-way Frequency Table of Grade and Grade Level
Let us consider an example of creating a two-way frequency table for two columns using the crosstab() function. Suppose we have a dataset containing the grades of students in a school and the corresponding grade level.
import pandas as pd
# create a dataframe containing student grades and grade level
df = pd.DataFrame({
    'grade': [70, 80, 90, 70, 80, 90],
    'grade_level': ['B', 'A', 'A', 'B', 'A', 'A']
})
# create a two-way frequency table
freq_table = pd.crosstab(df['grade'], df['grade_level'])
print(freq_table)
Output:
grade_level  A  B
grade
70           0  2
80           2  0
90           2  0
The output shows a two-way frequency table with the count of the number of students in each grade and grade level.
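When reading a two-way table, row and column totals are often helpful. This sketch uses crosstab()'s margins=True option on the same data; the 'Total' label is our own choice set through margins_name (the default label is 'All'):

```python
import pandas as pd

df = pd.DataFrame({
    'grade': [70, 80, 90, 70, 80, 90],
    'grade_level': ['B', 'A', 'A', 'B', 'A', 'A']
})

# margins=True appends a totals row and a totals column;
# margins_name relabels them from the default 'All' to 'Total'
table = pd.crosstab(df['grade'], df['grade_level'],
                    margins=True, margins_name='Total')
print(table)
```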
Creating a Two-way Frequency Table of Gender and Grade Level
Now let us consider a second example using the crosstab() function, this time with two different categorical columns. Suppose we have a dataset containing the gender of students in a school and their corresponding grade level.
import pandas as pd
# create a dataframe containing student gender and grade level
df = pd.DataFrame({
    'gender': ['M', 'M', 'F', 'F', 'M', 'F'],
    'grade_level': ['B', 'A', 'A', 'B', 'A', 'A']
})
# create a two-way frequency table
freq_table = pd.crosstab(df['gender'], df['grade_level'])
print(freq_table)
Output:
grade_level  A  B
gender
F            2  1
M            2  1
The output shows a two-way frequency table with the count of the number of students in each gender and grade level.
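Raw counts can hide a relationship when groups differ in size. One way to compare groups, sketched below, is normalize='index', which converts each row into proportions. In this small illustrative sample both genders show the same split (two thirds A, one third B), so the table gives no sign of an association, although six students are far too few to draw conclusions.

```python
import pandas as pd

df = pd.DataFrame({
    'gender': ['M', 'M', 'F', 'F', 'M', 'F'],
    'grade_level': ['B', 'A', 'A', 'B', 'A', 'A']
})

# normalize='index' divides each row by its row total, giving the share
# of each grade level within each gender (every row sums to 1)
row_props = pd.crosstab(df['gender'], df['grade_level'], normalize='index')
print(row_props)
```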
Conclusion
Creating frequency tables is an essential step in data exploration for understanding the distribution of values in a dataset. Pandas offers several tools for creating frequency tables, such as the value_counts() and crosstab() functions: value_counts() counts the unique values in a single column, while crosstab() builds two-way frequency tables for examining the relationship between two categorical variables.
By following the examples in this article, you should now have a good understanding of how to create frequency tables in Python and be able to apply these techniques to your own data analysis tasks.