## Summing Columns and Rows in Pandas DataFrame

If you’re working with large amounts of data, you might need to perform various calculations to analyze it effectively. There are times when you’ll need to group or summarize this data to generate insights and understand the trends.

One of the most common operations is to sum the columns and rows in a Pandas DataFrame. The process can be incredibly simple and straightforward when you understand the syntax and working with Pandas DataFrame.

## 1) Syntax for Summing Each Column

A DataFrame is a 2D table that stores data in a tabular form. When you want to sum each column in a Pandas DataFrame, you can use the built-in sum() function to calculate the sum across the column axis.

In Pandas DataFrame, the syntax for summing each column is `df.sum(axis=0)`

, where ‘df’ refers to the DataFrame object, and ‘axis=0’ specifies that the sum operation should be performed along the column axis.

## 2) Syntax for Summing Each Row

Likewise, if you want to sum each row in a Pandas DataFrame, the syntax is `df.sum(axis=1)`

. Here, ‘axis=1’ specifies that the sum operation should be performed along the row axis.

## 3) Steps to Sum Each Column and Row in Pandas DataFrame

Now that you know the syntax for performing column and row sums, let’s explore the steps required to sum each column and row in a Pandas DataFrame in more detail.

### Data Preparation

To illustrate this process, let’s take an example of a commission data table, which contains the commission generated by sales reps in a company. The data table might contain columns such as Rep ID, Sales Figures, Commission Percentage, and so on.

### Creating a DataFrame

Once you have your data ready to go, the next step is to create a DataFrame object. You can import the Pandas library using the command `import pandas as pd`

, and then pass your data as an argument to the `pd.DataFrame()`

function to construct a new DataFrame object. For example:

```
import pandas as pd
data = {'Rep ID': [1, 2, 3, 4],
'Sales Figures': [10000, 15000, 20000, 18000],
'Commission Percentage': [0.05, 0.10, 0.07, 0.05]}
df = pd.DataFrame(data)
```

If you were to print the DataFrame `df`

, you would see the following output:

```
Rep ID Sales Figures Commission Percentage
0 1 10000 0.05
1 2 15000 0.10
2 3 20000 0.07
3 4 18000 0.05
```

### Summing Each Column and Row in Pandas DataFrame

When you have set up your DataFrame, it’s time to start calculating the sums. If we want to calculate the sum of each column, we can invoke the `df.sum(axis=0)`

function like this:

`column_sums = df.sum(axis=0)`

This command will return a new DataFrame object, which contains the sum of each column.

If we use the `print()`

function to show the `column_sums`

variable, it will provide the following output:

```
Rep ID 10.00
Sales Figures 73000.00
Commission Percentage 0.27
dtype: float64
```

You can see that we have the sum for each column, as well as the datatype. In this example, the Rep ID column was summed and returned as a float, which isn’t very meaningful.

However, the other columns are intuitive and can be reasoned about. Similarly, if you want to calculate the sum of each row, you can use the `df.sum(axis=1)`

function like this:

`row_sums = df.sum(axis=1)`

Once again, the `row_sums`

variable will provide a new DataFrame object containing the sum of each row.

When we print the `row_sums`

variable, we will get the following output:

```
0 1005.0
1 1650.0
2 1407.0
3 1110.0
dtype: float64
```

Here, we have a sum for each row, represented as a float value. You might notice that the index is a sequence of integers starting from zero, which is the default behavior for Pandas DataFrame.

You can change this index, but it’s beyond the scope of this article.

## 4) Count for each Column and Row in Pandas DataFrame

When working with large datasets, it can be helpful to identify the number of valid (not null) values in each column and row. In Pandas DataFrame, the `count()`

function can be used to do this.

### Syntax for Getting Count of Values for Each Column

To get the count of values for each column in a Pandas DataFrame, we first invoke the `df.count()`

function on the DataFrame itself with the `axis=0`

parameter, as shown below:

```
df_count = df.count(axis=0)
print(df_count)
```

This code block will display the number of non-null values for each column in the `df`

DataFrame:

```
Month 4
Product Type 4
Units Sold 4
Total Sale 4
Row Sum 4
dtype: int64
```

In this output, we can see that all the columns have four non-null values because that’s the number of rows in our DataFrame.

### Syntax for Getting Count of Values for Each Row

To get the count of values for each row, we first invoke the `df.count()`

function on the DataFrame with the `axis=1`

parameter, as shown in the following code:

```
df_count = df.count(axis=1)
print(df_count)
```

Running this code will display the number of non-null values for each row in the `df`

DataFrame:

```
0 4
1 4
2 4
3 4
dtype: int64
```

In this output, we can see that all the rows have four non-null values because that’s the number of columns in our DataFrame. Counting the null cells in Pandas DataFrame becomes essential when you need to identify missing information or the quality of your data set.

You can also use the `isnull()`

function in Pandas to identify null values in a DataFrame.

## 5) Summarizing Count for each Column and Row in Pandas DataFrame

One of the frequent needs in data analysis is to understand the completeness of a dataset. In other words, we need to know how many valid values we have in each column and row of a Pandas DataFrame.

Fortunately, Pandas provides the `count()`

function that allows us to count the non-null values.

### Creating a DataFrame

Let’s take an example of a test score table containing records of students’ marks. The columns of the table are student id, name, subject, and test score.

```
import pandas as pd
data = {'Student ID': [1, 2, 3, 4, 5],
'Name': ['John', 'Sara', 'Tom', 'Micheal', 'Sarah'],
'Subject': ['Math', 'Science', 'English', 'Math', 'Math'],
'Test Score': [85, 78, 89, 92, 88]}
df = pd.DataFrame(data)
print(df)
```

When you run this code, you’ll see the following DataFrame:

```
Student ID Name Subject Test Score
0 1 John Math 85
1 2 Sara Science 78
2 3 Tom English 89
3 4 Micheal Math 92
4 5 Sarah Math 88
```

### Getting Count of Values for Each Column

To obtain the count of values for each column of the table, we use the `count()`

function for the DataFrame object with `axis=0`

parameter.

```
column_count = df.count(axis=0)
print(column_count)
```

When you run this code, it will show the following output for column count:

```
Student ID 5
Name 5
Subject 5
Test Score 5
dtype: int64
```

Here, the count for each column is equal to the total number of records (5) in the DataFrame. This output shows that we don’t have any missing values in the DataFrame.

### Getting Count of Values for Each Row

To get the count of values for each row in the DataFrame, we use `count()`

function with `axis=1`

parameter.

```
row_count = df.count(axis=1)
print(row_count)
```

When you run this code, it will show the following output for row count:

```
0 4
1 4
2 4
3 4
4 4
dtype: int64
```

Here, the count for each row is equal to the total number of columns (4) in the DataFrame. This output shows that we don’t have any missing values in the DataFrame.

## 6) Other Operations on Pandas DataFrame

Apart from counting and summing rows and columns in a Pandas DataFrame, there are numerous other operations you can perform to better understand your dataset.

### Average Each Column

You can use the `mean()`

function in Pandas to calculate the average of each column in a Pandas DataFrame. Consider the previous example, where we had a table of test scores.

Let’s calculate the average test score for each subject.

```
average_score = df.groupby('Subject').mean()['Test Score']
print(average_score)
```

When you run this code, it will show the following output for average score:

```
Subject
English 89.0
Math 88.3
Science 78.0
Name: Test Score, dtype: float64
```

Here, the average test score for English is 89, for Math is 88.3, and for Science, it’s 78.

### Pandas Documentation

Pandas offer a rich set of functions and methods to process data efficiently, making it a popular tool for data scientists and developers. You can refer to the official Pandas documentation for a detailed list of functions and methods available.

The Pandas documentation provides examples, definitions, and usage of every function, allowing you to utilize the full potential of this powerful tool.

In conclusion, Pandas is an essential tool for data analysis and manipulation.

In this article, we discussed how to summarize counts for each column and row in Pandas DataFrame using the `count()`

function with `axis=0`

and `axis=1`

parameters, respectively. We also explored the `mean()`

function, which you can use to calculate the average of each column in a Pandas DataFrame.

Although we only scratched the surface of the operations you can do with Pandas, it is often sufficient to provide basic insights and analyses. In conclusion, this article has covered the essential operations that you need to perform on Pandas DataFrame while analyzing datasets.

We have learned how to sum each column and row of a DataFrame using the `sum()`

function with `axis=0`

and `axis=1`

parameters, respectively. We have also seen how to summarize counts for each column and row in Pandas DataFrame using the `count()`

function with `axis=0`

and `axis=1`

parameters, respectively.

Additionally, we explored how to calculate the average of each column in a Pandas DataFrame using the `mean()`

function. These operations provide a better understanding of data, which is crucial for making informed decisions in various fields like finance, marketing, healthcare, and more.

By utilizing these Pandas operations, you can improve data analysis efficiency and