Pandas Groupby Count Using Size() and Count() Method

Feb 10, 2022
7 Minutes Read

By Shivali Bhadaniya


Pandas is one of the most popular libraries in Python. Pandas provides data structures, a large collection of built-in methods, and operations for data analysis. It is designed mainly for working with relational or labeled data easily and intuitively, and its many built-in methods let you perform operations on large datasets quickly. In this article, we will study how to count the number of rows in each group of a pandas groupby using built-in pandas functions, with examples and output. So, let's get started!

What is Groupby in Pandas?

When working on data science projects, you will often handle large amounts of data and repeat the same operations on different subsets of a dataset. This is where groupby comes into the picture. Groupby lets you split data into groups and aggregate each group efficiently, without writing explicit loops. A groupby operation mainly involves three steps:

Splitting the dataset into groups based on some criteria, such as the values in a column
Applying a function to each group independently
Combining the results from each group into a single data structure
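The three steps above can be illustrated with a minimal sketch. It performs the split, apply, and combine steps by hand and then compares the result with a single groupby() call. The `team` and `score` columns and their values are hypothetical and chosen only for illustration. A sum is used as the applied function.

```python
import pandas as pd

# Hypothetical example data, used only for illustration
df = pd.DataFrame({
    "team": ["A", "B", "A", "C", "B", "A"],
    "score": [10, 20, 30, 40, 50, 60],
})

# 1) Split: collect the rows belonging to each distinct key
groups = {key: df[df["team"] == key] for key in df["team"].unique()}

# 2) Apply: run the same function (here, a sum) on each group independently
partial_results = {key: group["score"].sum() for key, group in groups.items()}

# 3) Combine: gather the per-group results into one Series, sorted by key
manual = pd.Series(partial_results).sort_index()

# groupby() performs all three steps in a single call
automatic = df.groupby("team")["score"].sum()

print(manual.tolist() == automatic.tolist())
```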

Since pandas groupby works with the individual groups of a dataset, what if you want to count the number of rows in each of these groups? Counting them manually is infeasible for large datasets, so let us study some efficient methods for this task.

How to Count Rows in Each Group of Pandas Groupby?

Below are two methods you can use to count the number of rows in each group of a pandas groupby:

1) Using pandas groupby size() method

The simplest way to count rows in a pandas groupby is the built-in size() method. It returns a pandas Series that holds the number of rows in each group. size() works like calling len() on each group. It therefore counts every row and is not affected by NaN values in the other columns of the dataset.

For better understanding, let us go through an example below:

Consider a DataFrame containing a set of students' names and the subjects they study.

import pandas as pd

data = {
  "Students": ["Ray", "John", "Mole", "Smith", "Jay", "Milli", "Tom", "Rick"],
  "Subjects": ["Maths", "Economics", "Science", "Maths", "Statistics", "Statistics", "Statistics", "Computers"]
}

#load data into a DataFrame object:
df = pd.DataFrame(data)

print(df)

Output:

   Students    Subjects
0      Ray       Maths
1     John   Economics
2     Mole     Science
3    Smith       Maths
4      Jay  Statistics
5    Milli  Statistics
6      Tom  Statistics
7     Rick   Computers

Now, let us group the above DataFrame by the column “Subjects” and find the number of rows in each group using the groupby size() method.

For example:

import pandas as pd

data = {
  "Students": ["Ray", "John", "Mole", "Smith", "Jay", "Milli", "Tom", "Rick"],
  "Subjects": ["Maths", "Economics", "Science", "Maths", "Statistics", "Statistics", "Statistics", "Computers"]
}

#load data into a DataFrame object:
df = pd.DataFrame(data)
print(df.groupby('Subjects').size())

Output:

Subjects
Computers     1
Economics     1
Maths         2
Science       1
Statistics    3
dtype: int64

As a result, the output of the above example shows the number of rows in each group of the DataFrame, that is, the number of students for each subject.
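Because size() returns a Series, a common next step is to verify it or turn it into a regular DataFrame. The sketch below reuses the students data. It confirms that size() gives the same numbers as calling len() on each group. It then uses Series.reset_index(name=...) to produce a two-column table. The column name "count" is an arbitrary choice, not something pandas requires.

```python
import pandas as pd

# Same students data as in the example above
data = {
    "Students": ["Ray", "John", "Mole", "Smith", "Jay", "Milli", "Tom", "Rick"],
    "Subjects": ["Maths", "Economics", "Science", "Maths", "Statistics", "Statistics", "Statistics", "Computers"],
}
df = pd.DataFrame(data)

# Number of rows per subject, as a Series indexed by subject
sizes = df.groupby("Subjects").size()

# size() gives the same numbers as len() applied to each group
lengths = {name: len(group) for name, group in df.groupby("Subjects")}
print(sizes.to_dict() == lengths)

# Turn the Series into a two-column DataFrame; "count" is an arbitrary name
counts_df = sizes.reset_index(name="count")
print(counts_df)
```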

2) Using pandas groupby count() method

Instead of the size() method, you can also use the pandas groupby count() method, which counts the non-null values of each column in each group. Note that these counts equal the group sizes as long as the DataFrame contains no NaN values. Check out the example below for a better understanding of the pandas groupby count() method:

For example:

import pandas as pd

data = {
  "Students": ["Ray", "John", "Mole", "Smith", "Jay", "Milli", "Tom", "Rick"],
  "Subjects": ["Maths", "Economics", "Science", "Maths", "Statistics", "Statistics", "Statistics", "Computers"]
}

#load data into a DataFrame object:
df = pd.DataFrame(data)
print(df.groupby('Subjects').count())

Output:

               Students
Subjects            
Computers          1
Economics          1
Maths              2
Science            1
Statistics         3
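count() works column by column, so the result above is a DataFrame with one column per non-grouping column. The following is a minimal sketch for the case where you only care about a single column. Selecting that column before calling count() returns a Series. With no missing values, that Series matches size() for every group.

```python
import pandas as pd

data = {
    "Students": ["Ray", "John", "Mole", "Smith", "Jay", "Milli", "Tom", "Rick"],
    "Subjects": ["Maths", "Economics", "Science", "Maths", "Statistics", "Statistics", "Statistics", "Computers"],
}
df = pd.DataFrame(data)

# count() on the whole grouped DataFrame: one column per non-grouping column
per_column = df.groupby("Subjects").count()

# Selecting a single column first returns a Series instead of a DataFrame
students_per_subject = df.groupby("Subjects")["Students"].count()

# With no missing values, count() and size() agree for every group
sizes = df.groupby("Subjects").size()
print(students_per_subject.tolist() == sizes.tolist())
```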

Apart from this, you can also use the value_counts() method if you only need the counts for a single column.

For example:

import pandas as pd

data = {
  "Students": ["Ray", "John", "Mole", "Smith", "Jay", "Milli", "Tom", "Rick"],
  "Subjects": ["Maths", "Economics", "Science", "Maths", "Statistics", "Statistics", "Statistics", "Computers"]
}

#load data into a DataFrame object:
df = pd.DataFrame(data)
print(df['Subjects'].value_counts())

Output:

Statistics    3
Maths         2
Economics     1
Science       1
Computers     1
Name: Subjects, dtype: int64
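value_counts() and groupby size() return the same numbers but in a different order. value_counts() sorts by count in descending order, while groupby() sorts by the group key. The sketch below lines the two results up with sort_index(). The printed name of the value_counts() result depends on your pandas version: pandas 2.0 and later label it `count` rather than with the column name shown above.

```python
import pandas as pd

data = {
    "Students": ["Ray", "John", "Mole", "Smith", "Jay", "Milli", "Tom", "Rick"],
    "Subjects": ["Maths", "Economics", "Science", "Maths", "Statistics", "Statistics", "Statistics", "Computers"],
}
df = pd.DataFrame(data)

vc = df["Subjects"].value_counts()      # sorted by count, descending
sizes = df.groupby("Subjects").size()   # sorted by group key

# After sorting value_counts() by label, the counts are identical
print(vc.sort_index().tolist() == sizes.tolist())
```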

Difference between Size() and Count() Methods

Looking at the above examples, you might conclude that the size() and count() methods are interchangeable when working with pandas groupby. However, the two methods are distinct. The count() method returns the number of non-null values in each group, because it ignores any NaN values it encounters. That number may or may not equal the number of rows. The size() method, on the other hand, returns the actual number of rows in each group, regardless of NaN values. Let’s understand this using an example:

For example:

import numpy as np
import pandas as pd

# create a dataframe with a missing value in the "Subjects" column
data = {
  "Students": ["Ray", "John", "Mole", "John", "John", "John", "Ray", "Rick"],
  "Subjects": ["Maths", "Economics", "Science", "Maths", np.nan, "Statistics", "Statistics", "Computers"]
}
df = pd.DataFrame(data)
# count all rows in each group, including rows with NaN values
print(df.groupby('Students').size())

Output:

Students
John    4
Mole    1
Ray     2
Rick    1
dtype: int64
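There is one caveat on NaN handling. size() counts rows regardless of NaN values in the other columns, but by default groupby() drops rows whose grouping key itself is NaN. The minimal sketch below uses the same data but groups by “Subjects” instead. It shows the effect of the dropna parameter, which is available in pandas 1.1 and later.

```python
import numpy as np
import pandas as pd

data = {
    "Students": ["Ray", "John", "Mole", "John", "John", "John", "Ray", "Rick"],
    "Subjects": ["Maths", "Economics", "Science", "Maths", np.nan, "Statistics", "Statistics", "Computers"],
}
df = pd.DataFrame(data)

# Default (dropna=True): the row whose "Subjects" value is NaN is left out
default_sizes = df.groupby("Subjects").size()
print(default_sizes.sum())  # 7 of the 8 rows

# dropna=False keeps NaN as a group of its own, so every row is counted
with_nan = df.groupby("Subjects", dropna=False).size()
print(with_nan.sum())  # 8
```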

Now apply the count() method to the same DataFrame, grouped by the “Students” column:

For example:

import numpy as np
import pandas as pd

# create a dataframe with a missing value in the "Subjects" column
data = {
  "Students": ["Ray", "John", "Mole", "John", "John", "John", "Ray", "Rick"],
  "Subjects": ["Maths", "Economics", "Science", "Maths", np.nan, "Statistics", "Statistics", "Computers"]
}
df = pd.DataFrame(data)
# count only the non-null values in each group
print(df.groupby('Students').count())

Output:

             Subjects
Students          
John             3
Mole             1
Ray              2
Rick             1

The above example shows the difference. John has four rows but only three non-null subjects, so size() returns 4 while count() returns 3. To count the total number of rows in each group, use the size() method on groupby. To count only the non-null values, use the pandas groupby count() method.
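To see both numbers side by side, you can combine the two results into one table. This is a sketch on the same data. The `missing` column is a hypothetical addition that subtracts count() from size() to show how many NaN values each group contains.

```python
import numpy as np
import pandas as pd

data = {
    "Students": ["Ray", "John", "Mole", "John", "John", "John", "Ray", "Rick"],
    "Subjects": ["Maths", "Economics", "Science", "Maths", np.nan, "Statistics", "Statistics", "Computers"],
}
df = pd.DataFrame(data)

grouped = df.groupby("Students")["Subjects"]

summary = pd.DataFrame({
    "size": grouped.size(),    # all rows in each group
    "count": grouped.count(),  # non-null "Subjects" values in each group
})
# Hypothetical helper column: number of NaN values per group
summary["missing"] = summary["size"] - summary["count"]
print(summary)
```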

Conclusion

Python Pandas is an open-source library that provides high-performance data manipulation and data analysis tools. However, to use pandas efficiently, you must be familiar with its large collection of built-in methods for performing operations on large datasets. In this article, we studied how to count the number of rows in each group of a pandas groupby using built-in functions. These functions keep your code simple and efficient when working with large amounts of data. If you want to practice more about pandas, try these exercises for beginners.
