Introduction to the Python Pandas Module
Python is one of the most popular programming languages, and data analysis is one of its most common applications.
One of the most influential Python libraries for data analysis is Pandas, a high-level data manipulation library that provides data structures and functions for manipulating and analyzing data efficiently.
Creating a DataFrame in Pandas
A DataFrame is a two-dimensional, labelled table in Pandas that consists of rows and columns. It is the fundamental structure for analyzing and manipulating data in Pandas.
To create a DataFrame, import the Pandas library and call the pd.DataFrame() constructor. Called with no arguments, it returns an empty DataFrame. Here is an example:
1. Creating an Empty DataFrame
import pandas as pd
df = pd.DataFrame()
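As a minimal sketch of what an empty DataFrame looks like and how columns could be added to it afterwards, the snippet below checks the empty and shape attributes and then assigns two columns. The variable names (empty_df, was_empty, initial_shape) are illustrative assumptions, not part of the example above.

```python
import pandas as pd

# Hypothetical variable names, chosen only for this illustration
empty_df = pd.DataFrame()
was_empty = empty_df.empty        # True while there are no rows or columns
initial_shape = empty_df.shape    # (rows, columns) of the empty DataFrame

# Assigning a list to a new column label creates that column
empty_df["Student"] = ["John", "Sarah"]
empty_df["Math"] = [90, 75]

print(was_empty, initial_shape)
print(empty_df)
```

Building a DataFrame column by column works for small cases, but passing all the data to the constructor at once is usually simpler.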
An empty DataFrame can be filled later, but it is more common to pass the data directly to the constructor, for example as a dictionary that maps column names to lists of values. Example:
2. Creating a DataFrame with Data
import pandas as pd
students_data = {"Student": ["John", "Sarah", "Matt", "Lena", "Kim"],
                 "Math": [90, 75, 80, 85, 98],
                 "Physics": [85, 80, 90, 82, 95],
                 "Chemistry": [95, 87, 80, 90, 91]}
df = pd.DataFrame(students_data)
In the above example, we have created a DataFrame named df from the dictionary students_data, which holds the marks of five students. The dictionary keys become the four columns: Student, Math, Physics, and Chemistry.
Each row represents one student. Because no index was specified, the rows are labelled 0 through 4 by default.
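To confirm the structure described above, one could inspect the DataFrame's shape, column labels, and row labels. This is only a sketch; the names n_rows, n_cols, column_names, and row_labels are illustrative assumptions.

```python
import pandas as pd

students_data = {"Student": ["John", "Sarah", "Matt", "Lena", "Kim"],
                 "Math": [90, 75, 80, 85, 98],
                 "Physics": [85, 80, 90, 82, 95],
                 "Chemistry": [95, 87, 80, 90, 91]}
df = pd.DataFrame(students_data)

n_rows, n_cols = df.shape          # number of rows and columns
column_names = list(df.columns)    # column labels, in dictionary order
row_labels = list(df.index)        # default integer labels

print(n_rows, n_cols)
print(column_names)
print(row_labels)
print(df.head(2))                  # first two rows of the table
```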
Manipulating a DataFrame
Manipulating a DataFrame is a crucial part of data analysis. In Pandas, one can select and remove data using tools such as the loc[] and iloc[] indexers and the drop() method.
1. Using the loc[] Indexer
The loc[] indexer selects data by label. Here’s an example:
import pandas as pd
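students_data = {"Student": ["John", "Sarah", "Matt", "Lena", "Kim"],
                 "Math": [90, 75, 80, 85, 98],
                 "Physics": [85, 80, 90, 82, 95],
                 "Chemistry": [95, 87, 80, 90, 91]}
df = pd.DataFrame(students_data)
# Use the student names as row labels so rows can be selected by name
df.set_index('Student', inplace=True)
# Select the Math column for the rows labelled 'Matt' and 'John'
math_marks = df.loc[['Matt', 'John'], 'Math']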
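In the above example, loc[] extracts the math marks of two students, Matt and John. Selection by name works because set_index() made the Student column the row index, and the result is a Series in the order the labels were requested (Matt first, then John). The iloc[] indexer is similar to loc[], but it selects by integer position rather than by label.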
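To make the difference between label-based and position-based selection concrete, the sketch below performs the same selection both ways on one DataFrame, then compares how the two indexers treat slices. The variable names (by_label, by_position, label_slice, position_slice) are illustrative assumptions.

```python
import pandas as pd

students_data = {"Student": ["John", "Sarah", "Matt", "Lena", "Kim"],
                 "Math": [90, 75, 80, 85, 98],
                 "Physics": [85, 80, 90, 82, 95]}
df = pd.DataFrame(students_data).set_index("Student")

# Label-based: rows "Matt" and "John", column "Math"
by_label = df.loc[["Matt", "John"], "Math"]

# Position-based: rows 2 and 0, column 0 (Math is the first column
# once Student has become the index)
by_position = df.iloc[[2, 0], 0]

# Slices differ: loc[] includes the end label, iloc[] excludes the end position
label_slice = df.loc["Sarah":"Lena"]    # Sarah, Matt, Lena
position_slice = df.iloc[1:3]           # Sarah, Matt

print(by_label)
print(by_position)
print(len(label_slice), len(position_slice))
```

The slice behaviour is worth remembering: a label slice with loc[] keeps both endpoints, while a position slice with iloc[] follows the usual Python convention of excluding the end.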
2. Using the iloc[] Indexer
Here’s an example:
import pandas as pd
data = {"Country": ["USA", "India", "China", "Russia"],
        "Population(2019)": [328, 1371, 1403, 144.5],
        "GDP": [21.44, 2.7, 14.14, 1.64]}
df = pd.DataFrame(data)
# Keep the rows at positions 1 and 2 (the end position 3 is excluded)
df = df.iloc[1:3]
In the above example, iloc[1:3] extracts the rows at integer positions 1 and 2 (India and China). As with Python slices, the end position 3 is excluded. The drop() method removes rows or columns from a DataFrame.
3. Using the drop() Method
Here’s an example:
import pandas as pd
students_data = {"Student": ["John", "Sarah", "Matt", "Lena", "Kim"],
                 "Math": [90, 75, 80, 85, 98],
                 "Physics": [85, 80, 90, 82, 95],
                 "Chemistry": [95, 87, 80, 90, 91]}
df = pd.DataFrame(students_data)
# Remove the rows with index labels 2 and 4, modifying df in place
df.drop([2, 4], inplace=True)
In the above example, drop() removes the rows with index labels 2 and 4 (Matt and Kim) from the DataFrame. Because inplace=True is passed, df itself is modified and no new DataFrame is returned.
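Since drop() can remove columns as well as rows, the following sketch shows both, using the index= and columns= keyword arguments and leaving the original DataFrame unchanged. The names trimmed and no_chemistry are illustrative assumptions.

```python
import pandas as pd

students_data = {"Student": ["John", "Sarah", "Matt", "Lena", "Kim"],
                 "Math": [90, 75, 80, 85, 98],
                 "Physics": [85, 80, 90, 82, 95],
                 "Chemistry": [95, 87, 80, 90, 91]}
df = pd.DataFrame(students_data)

# Without inplace=True, drop() returns a new DataFrame and leaves df as it was
trimmed = df.drop(index=[2, 4])                # remove rows labelled 2 and 4
no_chemistry = df.drop(columns=["Chemistry"])  # remove one column

print(trimmed)
print(no_chemistry)
print(df.shape)                                # the original is unchanged
```

Returning a new DataFrame rather than modifying in place makes it easier to keep the original data available for later steps.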
Saving a DataFrame as a CSV File
Once you have analyzed the data, you will often want to share the results with others. One way to do so is to share the DataFrame itself.
To be usable on other platforms and in other tools, however, it needs to be saved in a standardized format.
Comma-Separated Values (CSV) is one of the most widely used formats for this purpose. To save a DataFrame as a CSV file, use the to_csv() method provided by Pandas.
Here’s an example:
import pandas as pd
data = {"Country": ["USA", "India", "China", "Russia"],
        "Population(2019)": [328, 1371, 1403, 144.5],
        "GDP": [21.44, 2.7, 14.14, 1.64]}
df = pd.DataFrame(data)
# Write the DataFrame to file_name.csv (the row index is included by default)
df.to_csv('file_name.csv')
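In the above example, the DataFrame has been saved as a CSV file named file_name.csv in the current working directory. By default, to_csv() also writes the row index as the first column; pass index=False to omit it.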
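The sketch below compares the CSV output with and without the row index, then reads the text back with read_csv(). To avoid creating files, it relies on to_csv() returning the CSV text when no path is given and uses io.StringIO as an in-memory file. The variable names (with_index, without_index, restored) are illustrative assumptions.

```python
import io

import pandas as pd

data = {"Country": ["USA", "India", "China", "Russia"],
        "Population(2019)": [328, 1371, 1403, 144.5],
        "GDP": [21.44, 2.7, 14.14, 1.64]}
df = pd.DataFrame(data)

# With no path argument, to_csv() returns the CSV text instead of writing a file
with_index = df.to_csv()
without_index = df.to_csv(index=False)

# Compare the header lines: the default output has an extra, unnamed index field
print(with_index.splitlines()[0])
print(without_index.splitlines()[0])

# Read the CSV text back into a new DataFrame
restored = pd.read_csv(io.StringIO(without_index))
print(restored)
```

When the index is just the default 0, 1, 2, ... numbering, writing with index=False usually gives a cleaner file for whoever receives it.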
Conclusion
Pandas is an essential tool for analyzing and manipulating data in Python. It provides the DataFrame data structure along with tools such as loc[], iloc[], and drop() for selecting and removing data.
Once the analysis is complete, the results can be shared in a standardized format, and saving a DataFrame as a CSV file with to_csv() is a simple and effective way to do so.
These tools make Pandas an invaluable resource for data analysts and programmers alike, and they are worth keeping in mind when working with data in Python.