Adding Columns to a Pandas DataFrame
If you are a data analyst, scientist, or simply someone who works with data regularly, you are probably familiar with the Pandas library in Python. Pandas is a powerful tool used for data manipulation, analysis, and cleaning.
Its central structure is the DataFrame, a table of labeled rows and columns that lets you organize and manipulate data efficiently. In this article, we will explore how to add columns to a Pandas DataFrame.
1) Adding Columns with One Value
One common task when working with Pandas is adding a new column to an existing DataFrame. Adding a column with a single value is straightforward: we use the assignment operator `=` with a new column name, and Pandas repeats the value in every row.
Let’s start with a simple example:
```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Jane', 'Mark', 'Kelly'],
    'Age': [22, 34, 45, 28]
})

# A single value is repeated for every row
df['Gender'] = 'Male'
print(df)
```
In this example, we create a DataFrame from a dictionary that defines two columns, `Name` and `Age`. We then add a new column, `Gender`, by assigning it the string value ‘Male’.
After executing the code and printing the DataFrame, we see the following output:
```
    Name  Age Gender
0   John   22   Male
1   Jane   34   Male
2   Mark   45   Male
3  Kelly   28   Male
```
We can see that a new column `Gender` with the value ‘Male’ has been added to the DataFrame.
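One way to confirm that the scalar was repeated in every row, rather than eyeballing the printout, is to compare the whole column against the value. The sketch below rebuilds the same DataFrame; the `.all()` check is just one convenient way to verify this, not the only one.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Jane', 'Mark', 'Kelly'],
    'Age': [22, 34, 45, 28]
})

# Assigning a scalar broadcasts it to every existing row
df['Gender'] = 'Male'

# Element-wise comparison, then check that every row matched
print((df['Gender'] == 'Male').all())  # True
print(len(df['Gender']))               # 4
```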
2) Adding Columns with Multiple Values
Adding a new column with multiple values is similar to adding a column with one value, but instead of a single value we assign a list or an array containing exactly one value per row, in row order.
Let’s use the same DataFrame as before:
```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Jane', 'Mark', 'Kelly'],
    'Age': [22, 34, 45, 28]
})

# One value per row, matched to the rows in order
df['Gender'] = ['Male', 'Female', 'Male', 'Female']
print(df)
```
In this example, we add a new column `Gender` using a list of strings that gives the gender of each person, in the same order as the rows. When we run the code and print the DataFrame, we see the following output:
```
    Name  Age  Gender
0   John   22    Male
1   Jane   34  Female
2   Mark   45    Male
3  Kelly   28  Female
```
We can see that a new column `Gender` with the corresponding gender values has been added to the DataFrame.
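Because the values are matched to rows by position, the list or array must have exactly one entry per row. The sketch below assumes NumPy is installed to show that an array behaves like a list, and uses a hypothetical `Team` column to show what happens when the length is wrong. Pandas raises a `ValueError`; the exact wording of the message varies between versions, so the code prints it rather than quoting it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Jane', 'Mark', 'Kelly'],
    'Age': [22, 34, 45, 28]
})

# A NumPy array works just like a list: one value per row, matched by position
df['Gender'] = np.array(['Male', 'Female', 'Male', 'Female'])

# A list with the wrong number of values is rejected
try:
    df['Team'] = ['A', 'B', 'C']  # hypothetical column: 3 values for 4 rows
except ValueError as err:
    print('ValueError:', err)

print('Team' in df.columns)  # False: the column was not created
```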
Example DataFrame and Its Structure
To better understand how to add columns to a Pandas DataFrame, let’s create an example DataFrame and examine its structure. We will start by creating a DataFrame with information about the employees of a company:
```python
import pandas as pd

data = {
    'EmployeeID': ['001', '002', '003', '004', '005'],
    'Name': ['John', 'Jane', 'Mark', 'Kelly', 'Bob'],
    'Salary': [50000, 60000, 75000, 40000, 90000],
    'Department': ['Sales', 'Marketing', 'HR', 'IT', 'Finance']
}
df = pd.DataFrame(data)
print(df)
```
When we run the code and print the DataFrame, we see the following output:
```
  EmployeeID   Name  Salary Department
0        001   John   50000      Sales
1        002   Jane   60000  Marketing
2        003   Mark   75000         HR
3        004  Kelly   40000         IT
4        005    Bob   90000    Finance
```
We can see that the DataFrame consists of four columns: `EmployeeID`, `Name`, `Salary`, and `Department`. Each column has a unique name that we can use to reference it, and each one holds a single attribute of an employee’s profile.
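Besides printing the whole table, a few standard attributes describe a DataFrame’s structure directly. The sketch below uses `shape`, `columns`, and `dtypes` on the same employee data; the choice of the `Salary` column for the final lookup is only illustrative.

```python
import pandas as pd

data = {
    'EmployeeID': ['001', '002', '003', '004', '005'],
    'Name': ['John', 'Jane', 'Mark', 'Kelly', 'Bob'],
    'Salary': [50000, 60000, 75000, 40000, 90000],
    'Department': ['Sales', 'Marketing', 'HR', 'IT', 'Finance']
}
df = pd.DataFrame(data)

print(df.shape)          # (5, 4): 5 rows, 4 columns
print(list(df.columns))  # ['EmployeeID', 'Name', 'Salary', 'Department']
print(df.dtypes)         # data type of each column

# A single column is selected by its name and returned as a Series
print(df['Salary'].max())  # 90000
```

Because `EmployeeID` is stored as strings, the leading zeros are kept; storing the IDs as integers would drop them.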
Displaying the DataFrame
To display the DataFrame, we use the `print()` function and pass the DataFrame as an argument. This will output the entire DataFrame to the console.
However, sometimes we may want to display only a portion of the DataFrame, such as the first few rows, to get an idea of what the data looks like. The `head()` method does this; by default it returns the first five rows:
```python
print(df.head())
```
Since our DataFrame has exactly five rows, this displays all of them:
```
  EmployeeID   Name  Salary Department
0        001   John   50000      Sales
1        002   Jane   60000  Marketing
2        003   Mark   75000         HR
3        004  Kelly   40000         IT
4        005    Bob   90000    Finance
```
If we want to display more or fewer rows, we just need to pass the desired number as an argument to `head()`. For example, to display the first three rows, we can use:
```python
print(df.head(3))
```
This will output:
```
  EmployeeID  Name  Salary Department
0        001  John   50000      Sales
1        002  Jane   60000  Marketing
2        003  Mark   75000         HR
```
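Note that `head()` returns a new DataFrame rather than printing anything itself, so its result can be stored or measured. The sketch below shows that asking for more rows than exist simply returns all of them, and uses `tail()`, the counterpart that returns the last rows.

```python
import pandas as pd

df = pd.DataFrame({
    'EmployeeID': ['001', '002', '003', '004', '005'],
    'Name': ['John', 'Jane', 'Mark', 'Kelly', 'Bob'],
    'Salary': [50000, 60000, 75000, 40000, 90000],
    'Department': ['Sales', 'Marketing', 'HR', 'IT', 'Finance']
})

first_three = df.head(3)  # a new DataFrame holding rows 0 to 2
print(len(first_three))   # 3

# Asking for more rows than exist returns all of them, without an error
print(len(df.head(10)))   # 5

# tail() returns the last rows instead of the first
print(df.tail(2))
```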
3) Adding Multiple Columns with One Value to a Pandas DataFrame
Adding several columns with a single value works exactly like adding one: we assign a scalar to each new column name, one statement per column. Let’s consider the following example:
```python
import pandas as pd

data = {
    'Name': ['Sarah', 'John', 'Jake', 'Tasha'],
    'Age': [22, 34, 45, 28]
}
df = pd.DataFrame(data)

# Each scalar is repeated in every row of its new column
df['Courses'] = 'Mathematics'
df['Grade'] = 80
df['Level'] = 'Intermediate'
print(df)
```
Here we have created a Pandas DataFrame using a dictionary and created three new columns – `Courses`, `Grade`, and `Level` – with one value each. These new columns are added to the already existing `Name` and `Age` columns of our DataFrame.
After executing the code and printing our DataFrame, we see the following update:
```
    Name  Age      Courses  Grade         Level
0  Sarah   22  Mathematics     80  Intermediate
1   John   34  Mathematics     80  Intermediate
2   Jake   45  Mathematics     80  Intermediate
3  Tasha   28  Mathematics     80  Intermediate
```
We can see that the three new columns have been added, each holding the same value in every row. Adding columns with one value is useful when every row should share the same value.
For example, in an attendance table, every student’s status on a particular day might be “present”; assigning that value once avoids typing it out for each student. To give a new column a different value, change the value on the right-hand side of its assignment, for example `df['Grade'] = 90` instead of `df['Grade'] = 80`.
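If you prefer a single statement, `DataFrame.assign()` accepts one keyword argument per new column and broadcasts scalars the same way. The sketch below is an alternative to the three separate assignments above, not a replacement the article requires; note that `assign()` returns a new DataFrame (here stored as `df_new`) and leaves the original unchanged.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Sarah', 'John', 'Jake', 'Tasha'],
    'Age': [22, 34, 45, 28]
})

# One keyword argument per new column; scalars are repeated in every row.
# assign() returns a new DataFrame and does not modify df itself.
df_new = df.assign(Courses='Mathematics', Grade=80, Level='Intermediate')

print(list(df_new.columns))  # ['Name', 'Age', 'Courses', 'Grade', 'Level']
print(list(df.columns))      # ['Name', 'Age']
```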
Displaying the Updated DataFrame
After adding new columns, it is a good idea to display the updated DataFrame to confirm that the changes were applied as expected. We can use `print(df)` to display the entire DataFrame or `print(df.head())` to display only its first five rows.
```python
import pandas as pd

data = {
    'Name': ['Sarah', 'John', 'Jake', 'Tasha'],
    'Age': [22, 34, 45, 28]
}
df = pd.DataFrame(data)
df['Courses'] = 'Mathematics'
df['Grade'] = 80
df['Level'] = 'Intermediate'
print(df.head())
```
Because this DataFrame has only four rows, `head()` displays all of them:
```
    Name  Age      Courses  Grade         Level
0  Sarah   22  Mathematics     80  Intermediate
1   John   34  Mathematics     80  Intermediate
2   Jake   45  Mathematics     80  Intermediate
3  Tasha   28  Mathematics     80  Intermediate
```
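For DataFrames with many rows, reading the printout is not a reliable check. A minimal sketch of checking the same changes in code: confirm that each new column exists and that it holds exactly one distinct value. The `new_cols` list is just a helper name introduced for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Sarah', 'John', 'Jake', 'Tasha'],
    'Age': [22, 34, 45, 28]
})
df['Courses'] = 'Mathematics'
df['Grade'] = 80
df['Level'] = 'Intermediate'

new_cols = ['Courses', 'Grade', 'Level']  # illustrative helper list

# Each new column should exist...
print(all(col in df.columns for col in new_cols))  # True

# ...and hold exactly one distinct value across all rows
print(df[new_cols].nunique())
```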
4) Adding Multiple Columns with Multiple Values to a Pandas DataFrame
Adding multiple columns with multiple values to a Pandas DataFrame is also a straightforward task. Let’s consider the following example:
```python
import pandas as pd

data = {
    'Name': ['Sarah', 'John', 'Jake', 'Tasha'],
    'Age': [22, 34, 45, 28]
}
df = pd.DataFrame(data)

# Each list supplies one value per row, in row order
df['Courses'] = ['Mathematics', 'Science', 'Literature', 'Social Sciences']
df['Grades'] = [80, 75, 85, 90]
df['Level'] = ['Intermediate', 'Advanced', 'Expert', 'Intermediate']
print(df)
```
Here we create a new DataFrame with the columns `Name` and `Age`. We then add three new columns – `Courses`, `Grades`, and `Level` – with multiple values for each column.
When we execute the code and print our DataFrame, we see the following output:
```
    Name  Age          Courses  Grades         Level
0  Sarah   22      Mathematics      80  Intermediate
1   John   34          Science      75      Advanced
2   Jake   45       Literature      85        Expert
3  Tasha   28  Social Sciences      90  Intermediate
```
We can see that the new columns have been added, with each value placed in the row at the same position as in its list. Because lists are matched to rows by position, the values must be listed in the same order as the rows, and each list must contain exactly one value per row, whatever the data type of the values (strings, integers, or floats).
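Lists and arrays are matched to rows by position, which is why their order matters. A pandas Series behaves differently: it is aligned by index label, and rows whose label is missing from the Series get `NaN`. The sketch below illustrates this with a hypothetical `Bonus` column whose Series lists label 1 before label 0 and omits labels 2 and 3.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Sarah', 'John', 'Jake', 'Tasha'],
    'Age': [22, 34, 45, 28]
})

# A list is matched by position: first value -> row 0, and so on
df['Grades'] = [80, 75, 85, 90]

# A Series is matched by index label, not by position.
# Hypothetical values: label 1 comes first, labels 2 and 3 are absent.
bonus = pd.Series([5, 10], index=[1, 0])
df['Bonus'] = bonus

print(df[['Name', 'Grades', 'Bonus']])
# Sarah (label 0) gets 10, John (label 1) gets 5,
# and Jake and Tasha get NaN because their labels are missing.
```

Because of the missing values, `Bonus` is stored as a floating-point column rather than integers.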
Displaying the Updated DataFrame
After adding multiple new columns with multiple values, we should again display the updated DataFrame to verify that the changes were applied correctly, using the `print()` function just as we did when adding multiple columns with one value:
```python
import pandas as pd

data = {
    'Name': ['Sarah', 'John', 'Jake', 'Tasha'],
    'Age': [22, 34, 45, 28]
}
df = pd.DataFrame(data)
df['Courses'] = ['Mathematics', 'Science', 'Literature', 'Social Sciences']
df['Grades'] = [80, 75, 85, 90]
df['Level'] = ['Intermediate', 'Advanced', 'Expert', 'Intermediate']
print(df.head())
```
Because this DataFrame has only four rows, `head()` displays all of them:
```
    Name  Age          Courses  Grades         Level
0  Sarah   22      Mathematics      80  Intermediate
1   John   34          Science      75      Advanced
2   Jake   45       Literature      85        Expert
3  Tasha   28  Social Sciences      90  Intermediate
```
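To spot-check that each value landed in the right row, you can select one person and only the new columns with `.loc`. A minimal sketch, using Jake’s row as an arbitrary example:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Sarah', 'John', 'Jake', 'Tasha'],
    'Age': [22, 34, 45, 28]
})
df['Courses'] = ['Mathematics', 'Science', 'Literature', 'Social Sciences']
df['Grades'] = [80, 75, 85, 90]
df['Level'] = ['Intermediate', 'Advanced', 'Expert', 'Intermediate']

# Boolean mask picks the row(s); the list picks the new columns
jake = df.loc[df['Name'] == 'Jake', ['Courses', 'Grades', 'Level']]
print(jake)
```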
Conclusion
In this article, we looked at how to add one or more columns to a Pandas DataFrame, using either a single value for every row or a separate value for each row. We used the `=` operator to create new columns from a single value, which Pandas repeats in every row, and from lists or arrays that supply one value per row in row order.
We also examined how to display an updated DataFrame with `print()` and `head()`. With these skills, you can add the columns you need to a Pandas DataFrame and shape your data to suit your analysis.