Pandas is a Python Data Analysis Library that deals primarily with tabular data. It forms a major Data Analysis Toolbox that is widely used in domains such as Data Mining, Data Warehousing, Machine Learning and General Data Science. It is an Open Source Library released under a liberal BSD license. It has two main data structures:
- Series: Contains data related to a single variable (can be visualized as a vector) along with indexing information.
- DataFrame: Contains tabular data.
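As a minimal sketch of these two structures (the labels and values below are made up for illustration and are not taken from the exercises), a Series is one-dimensional, a DataFrame is two-dimensional, and selecting a single DataFrame column gives back a Series:

```python
import pandas as pd

# A Series holds the values of a single variable together with an index
ages = pd.Series([22, 25, 24], index=['a', 'b', 'c'], name='age')

# A DataFrame holds tabular data; each of its columns is itself a Series
people = pd.DataFrame({'name': ['Vinay', 'Kushal', 'Aman'], 'age': [22, 25, 24]})

print(ages.shape)            # one dimension: (number of values,)
print(people.shape)          # two dimensions: (rows, columns)
print(type(people['age']))   # a single column is a Series
```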
Here are 20 Basic Pandas Exercises for beginners that should be the bread and butter of every budding Data Analyst/Data Scientist.
Pandas Installation in Python
In the command line (cmd), type the following command:
pip install pandas
20 Pandas Exercises for Beginners
Importing Pandas and printing the version number
import pandas as pd
print(pd.__version__)
Corresponding Output
1.1.3
EXERCISE 1 - List-to-Series Conversion
Given a list, output the corresponding pandas series
Sample Solution
given_list = [2, 4, 5, 6, 9]
series = pd.Series(given_list)
print(series)
Corresponding Output
0    2
1    4
2    5
3    6
4    9
dtype: int64
EXERCISE 2 - List-to-Series Conversion with Custom Indexing
Given a list, output the corresponding pandas series indexed with odd numbers only (1, 3, 5, 7, 9)
Sample Solution
given_list = [2, 4, 5, 6, 9]
series = pd.Series(given_list, index=[1, 3, 5, 7, 9])
print(series)
Corresponding Output
1    2
3    4
5    5
7    6
9    9
dtype: int64
EXERCISE 3 - Date Series Generation
Generate the series of dates from 1st May, 2021 to 12th May, 2021 (both inclusive)
Sample Solution
# the date strings are written month-first (MM-DD-YYYY)
date_series = pd.date_range(start='05-01-2021', end='05-12-2021')
print(date_series)
Corresponding Output
DatetimeIndex(['2021-05-01', '2021-05-02', '2021-05-03', '2021-05-04',
               '2021-05-05', '2021-05-06', '2021-05-07', '2021-05-08',
               '2021-05-09', '2021-05-10', '2021-05-11', '2021-05-12'],
              dtype='datetime64[ns]', freq='D')
EXERCISE 4 - Applying a Function to Every Element of a Series
Apply the function f(x) = x/2 to every element of a given pandas series
Sample Solution
series = pd.Series([2, 4, 6, 8, 10])
print(series)  # pandas series initially
print()
modified_series = series.apply(lambda x: x / 2)
print(modified_series)  # pandas series after function application
Corresponding Output
0     2
1     4
2     6
3     8
4    10
dtype: int64

0    1.0
1    2.0
2    3.0
3    4.0
4    5.0
dtype: float64
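For simple arithmetic like this, a Series also supports element-wise operators directly, so the lambda is optional. The sketch below is an alternative to the sample solution, not a replacement for it; the name `halved` is only illustrative:

```python
import pandas as pd

series = pd.Series([2, 4, 6, 8, 10])

# Arithmetic operators act element-wise on a Series, so no lambda is needed
halved = series / 2

# Check that this matches the apply-based solution
print(halved.equals(series.apply(lambda x: x / 2)))
```

apply remains useful when the function cannot be written as a plain arithmetic expression.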
EXERCISE 5 - Dictionary-to-Dataframe Conversion
Given a dictionary, convert it into the corresponding dataframe and display it
Sample Solution
dictionary = {
    'name': ['Vinay', 'Kushal', 'Aman'],
    'age': [22, 25, 24],
    'occ': ['engineer', 'doctor', 'accountant'],
}
dataframe = pd.DataFrame(dictionary)
print(dataframe)
Corresponding Output
     name  age         occ
0   Vinay   22    engineer
1  Kushal   25      doctor
2    Aman   24  accountant
EXERCISE 6 - 2D List-to-Dataframe Conversion
Given a 2D List, convert it into the corresponding dataframe and display it
Sample Solution
lists = [[2, 'Vishal', 22], [1, 'Kushal', 25], [1, 'Aman', 24]]
dataframe = pd.DataFrame(lists, columns=['id', 'name', 'age'])
print(dataframe)
Corresponding Output
   id    name  age
0   2  Vishal   22
1   1  Kushal   25
2   1    Aman   24
EXERCISE 7 - Reading CSV to Dataframe
Given a CSV file, read it into a dataframe and display it
Sample Solution
dataframe = pd.read_csv('data.csv')
print(dataframe)
Corresponding Output
   id    name  age         occ
0   1   Vinay   22    engineer
1   2  Kushal   25      doctor
2   3    Aman   24  accountant
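The file data.csv itself is not shown in this article. As a sketch for trying the exercise without that file, the snippet below assumes its contents match the output above and reads them from an in-memory string; `csv_text` is a hypothetical reconstruction, not the original file:

```python
import io

import pandas as pd

# Hypothetical contents of data.csv, reconstructed from the output shown above
csv_text = """id,name,age,occ
1,Vinay,22,engineer
2,Kushal,25,doctor
3,Aman,24,accountant
"""

# read_csv accepts a file path or a file-like object such as StringIO
dataframe = pd.read_csv(io.StringIO(csv_text))
print(dataframe)
```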
EXERCISE 8 - Setting Custom Index in Dataframe
Given a dataframe, replace its default index with one of its columns
Sample Solution
print(dataframe)  # original dataframe before custom indexing
print()
dataframe_customindex = dataframe.set_index('id')  # custom indexed dataframe with column 'id'
print(dataframe_customindex)
Corresponding Output
   id    name  age         occ
0   1   Vinay   22    engineer
1   2  Kushal   25      doctor
2   3    Aman   24  accountant

      name  age         occ
id
1    Vinay   22    engineer
2   Kushal   25      doctor
3     Aman   24  accountant
EXERCISE 9 - Sorting a Dataframe by Index
Given a dataframe (say, with custom indexing), sort it by its index
Sample Solution
print(dataframe)  # original unsorted dataframe with custom indexing (id)
print()
dataframe_sorted = dataframe.sort_index()
print(dataframe_sorted)
Corresponding Output
      name  age         occ
id
2    Vinay   22    engineer
3   Kushal   25      doctor
1     Aman   24  accountant

      name  age         occ
id
1     Aman   24  accountant
2    Vinay   22    engineer
3   Kushal   25      doctor
EXERCISE 10 - Sorting a Dataframe by Multiple Columns
Given a dataframe, sort it by multiple columns
Sample Solution
print(dataframe)  # original dataframe
print()
dataframe_sorted = dataframe.sort_values(by=['id', 'age'])  # dataframe after sorting by 'id', then 'age'
print(dataframe_sorted)
Corresponding Output
   id    name  age         occ
0   2   Vinay   22    engineer
1   1  Kushal   25      doctor
2   1    Aman   24  accountant

   id    name  age         occ
2   1    Aman   24  accountant
1   1  Kushal   25      doctor
0   2   Vinay   22    engineer
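The sample solution sorts every column in ascending order. As a sketch of a variation, the ascending parameter of sort_values also accepts one flag per column, so ties in 'id' can be broken by 'age' in descending order. The dataframe is rebuilt here from the output above so the snippet runs on its own:

```python
import pandas as pd

# Same data as the exercise, rebuilt from the printed output
dataframe = pd.DataFrame({
    'id': [2, 1, 1],
    'name': ['Vinay', 'Kushal', 'Aman'],
    'age': [22, 25, 24],
    'occ': ['engineer', 'doctor', 'accountant'],
})

# Sort by 'id' ascending, then break ties by 'age' descending
dataframe_sorted = dataframe.sort_values(by=['id', 'age'], ascending=[True, False])
print(dataframe_sorted)
```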
EXERCISE 11 - DataFrame with Custom Index to DataFrame with Default Index
Given a dataframe with custom indexing, convert it to default indexing and display it
Sample Solution
print(dataframe_customindex)  # printing the original dataframe with custom indexing
print()
dataframe = dataframe_customindex.reset_index()
print(dataframe)  # printing the dataframe with default indexes
Corresponding Output
      name  age         occ
id
1    Vinay   22    engineer
2   Kushal   25      doctor
3     Aman   24  accountant

   id    name  age         occ
0   1   Vinay   22    engineer
1   2  Kushal   25      doctor
2   3    Aman   24  accountant
EXERCISE 12 - Indexing and Selecting Columns in a DataFrame
Given a dataframe, select a particular column and display it
Sample Solution
print(dataframe)  # original dataframe
print()
o = dataframe['name']  # extracting the column 'name'
print(o)
Alternative Solution 1
print(dataframe)  # original dataframe
print()
o = dataframe.iloc[:, 1]  # extracting the column 'name' by position
print(o)
Alternative Solution 2
print(dataframe)  # original dataframe
print()
o = dataframe.loc[:, 'name']  # extracting the column 'name' by label
print(o)
Corresponding Output
   id    name  age         occ
0   2   Vinay   22    engineer
1   1  Kushal   25      doctor
2   1    Aman   24  accountant

0     Vinay
1    Kushal
2      Aman
Name: name, dtype: object
EXERCISE 13 - Indexing and Selecting Rows in a DataFrame
Given a dataframe, select the first 2 rows and output them
Sample Solution
print(dataframe)  # original dataframe
print()
o = dataframe.iloc[[0, 1], :]  # extracting the first 2 rows of the dataframe by position
print(o)
Alternative Solution
print(dataframe)  # original dataframe
print()
o = dataframe.loc[[0, 1], :]  # extracting the first 2 rows of the dataframe by index label
print(o)
Corresponding Output
   id    name  age         occ
0   2   Vinay   22    engineer
1   1  Kushal   25      doctor
2   1    Aman   24  accountant

   id    name  age         occ
0   2   Vinay   22    engineer
1   1  Kushal   25      doctor
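Two further ways to take the first rows are head and slicing. The sketch below rebuilds the dataframe from the output above and also shows one difference worth knowing: an iloc slice excludes its end position, while a loc slice includes its end label. The three results agree here only because the dataframe has the default 0, 1, 2 index:

```python
import pandas as pd

# Same data as the exercise, rebuilt from the printed output
dataframe = pd.DataFrame({
    'id': [2, 1, 1],
    'name': ['Vinay', 'Kushal', 'Aman'],
    'age': [22, 25, 24],
    'occ': ['engineer', 'doctor', 'accountant'],
})

first_two = dataframe.head(2)      # first 2 rows
by_position = dataframe.iloc[:2]   # positions 0 and 1 (end position excluded)
by_label = dataframe.loc[:1]       # labels 0 through 1 (end label included)

print(first_two.equals(by_position) and first_two.equals(by_label))
```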
EXERCISE 14 - Conditional Selection of Rows in a DataFrame
Given a dataframe, select rows based on a condition
Sample Solution
print(dataframe)  # original dataframe
print()
# selecting people with age greater than or equal to 24
dataframe_condition = dataframe.loc[dataframe.age >= 24]
print(dataframe_condition)
Corresponding Output
   id    name  age         occ
0   2   Vinay   22    engineer
1   1  Kushal   25      doctor
2   1    Aman   24  accountant

   id    name  age         occ
1   1  Kushal   25      doctor
2   1    Aman   24  accountant
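Conditions can also be combined. As a sketch that goes one step beyond the exercise, the snippet below joins two conditions with `&` (element-wise AND); each condition needs its own parentheses because `&` binds more tightly than the comparison operators. The second condition (occupation is 'doctor') is chosen only for illustration:

```python
import pandas as pd

# Same data as the exercise, rebuilt from the printed output
dataframe = pd.DataFrame({
    'id': [2, 1, 1],
    'name': ['Vinay', 'Kushal', 'Aman'],
    'age': [22, 25, 24],
    'occ': ['engineer', 'doctor', 'accountant'],
})

# Boolean mask: age of at least 24 AND occupation equal to 'doctor'
mask = (dataframe['age'] >= 24) & (dataframe['occ'] == 'doctor')
dataframe_condition = dataframe.loc[mask]
print(dataframe_condition)
```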
EXERCISE 15 - Applying Aggregate Functions
Given a dataframe showing the name, occupation and salary of people, find the average salary per occupation
Sample Solution
print(dataframe)  # original dataframe
print()
occ_average_salary = dataframe.groupby('occ')['salary'].mean()  # required result (a series indexed by occupation)
print(occ_average_salary)
Corresponding Output
     name       occ  salary
0   Vinay  engineer   60000
1  Kushal    doctor   70000
2    Aman  engineer   50000
3   Rahul    doctor   60000
4  Ramesh    doctor   65000

occ
doctor      65000.0
engineer    55000.0
Name: salary, dtype: float64
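The article does not show how this dataframe was created. The sketch below rebuilds it from the printed output and, as a small extension of the exercise, uses agg to compute several statistics per occupation in one call; `salary_stats` is an illustrative name:

```python
import pandas as pd

# Data rebuilt from the printed output above
dataframe = pd.DataFrame({
    'name': ['Vinay', 'Kushal', 'Aman', 'Rahul', 'Ramesh'],
    'occ': ['engineer', 'doctor', 'engineer', 'doctor', 'doctor'],
    'salary': [60000, 70000, 50000, 60000, 65000],
})

# agg computes several statistics for each group in a single call
salary_stats = dataframe.groupby('occ')['salary'].agg(['mean', 'min', 'max'])
print(salary_stats)
```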
EXERCISE 16 - Filling NaN Values in a DataFrame
Given a dataframe with NaN Values, fill the NaN values with 0
Sample Solution
print(dataframe)  # original dataframe
print()
dataframe_nullfill = dataframe.fillna(0)
print(dataframe_nullfill)  # dataframe after filling NaN values with 0
Corresponding Output
     name       occ   salary
0   Vinay  engineer      NaN
1  Kushal    doctor  70000.0
2    Aman  engineer      NaN
3   Rahul    doctor  60000.0
4  Ramesh    doctor  65000.0

     name       occ   salary
0   Vinay  engineer      0.0
1  Kushal    doctor  70000.0
2    Aman  engineer      0.0
3   Rahul    doctor  60000.0
4  Ramesh    doctor  65000.0
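fillna(0) fills every column. As a sketch of a narrower variant, passing a dictionary limits the fill to the named columns. The dataframe is rebuilt from the output above, with np.nan standing in for the missing salaries:

```python
import numpy as np
import pandas as pd

# Data rebuilt from the printed output above
dataframe = pd.DataFrame({
    'name': ['Vinay', 'Kushal', 'Aman', 'Rahul', 'Ramesh'],
    'occ': ['engineer', 'doctor', 'engineer', 'doctor', 'doctor'],
    'salary': [np.nan, 70000, np.nan, 60000, 65000],
})

# A dict maps column names to fill values, leaving other columns untouched
dataframe_nullfill = dataframe.fillna({'salary': 0})

print(dataframe['salary'].isna().sum())           # NaN count in the original
print(dataframe_nullfill['salary'].isna().sum())  # NaN count after filling
```

Note that fillna returns a new dataframe here; the original keeps its NaN values.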
EXERCISE 17 - Applying Functions (UDFs) on DataFrame
Given is a dataframe showing Company Names (cname) and corresponding Profits (profit). Convert the values of Profit column such that values in it greater than 0 are set to True and the rest are set to False.
Sample Solution
print(company_data)  # original dataframe
print()
company_data['profit'] = company_data['profit'].apply(lambda x: x > 0)
print(company_data)  # required dataframe
Corresponding Output
                cname  profit
0         Shyam & Co.  -10000
1      Ramlal & Bros.   10000
2  Sharma Enterprises   -5000
3    Verma Furnitures   15000
4        Rahul Stores   20000

                cname  profit
0         Shyam & Co.   False
1      Ramlal & Bros.    True
2  Sharma Enterprises   False
3    Verma Furnitures    True
4        Rahul Stores    True
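For a simple comparison like this one, the same result can be obtained without apply, because comparison operators work element-wise on a column. The sketch below rebuilds company_data from the output above:

```python
import pandas as pd

# Data rebuilt from the printed output above
company_data = pd.DataFrame({
    'cname': ['Shyam & Co.', 'Ramlal & Bros.', 'Sharma Enterprises',
              'Verma Furnitures', 'Rahul Stores'],
    'profit': [-10000, 10000, -5000, 15000, 20000],
})

# The comparison is applied to every value and returns a boolean column
company_data['profit'] = company_data['profit'] > 0
print(company_data)
```

apply with a user-defined function is still the tool to reach for when the logic does not fit a single expression.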
EXERCISE 18 - Joining 2 DataFrames by a Common Column (key)
Given are 2 dataframes: one contains Employee ID (eid), Employee Name (ename) and Stipend (stipend), and the other contains Employee ID (eid) and the position of the employee (position). Output the dataframe containing Employee ID (eid), Employee Name (ename), Stipend (stipend) and Position (position).
Sample Solution
print(emp_data)  # 1st DataFrame containing employee id (eid), employee name (ename) and stipend
print()
print(company_data)  # 2nd DataFrame containing employee id (eid) and position of the employee (position)
print()
dataframe = pd.merge(emp_data, company_data, how='inner', on='eid')  # required dataframe
print(dataframe)
Corresponding Output
   eid   ename  stipend
0    1     Sid    10000
1    2  Ramesh    10000
2    3     Ron     5000
3    4   Harry    15000

   eid         position
0    1         employee
1    2         employee
2    3           intern
3    4  senior_employee

   eid   ename  stipend         position
0    1     Sid    10000         employee
1    2  Ramesh    10000         employee
2    3     Ron     5000           intern
3    4   Harry    15000  senior_employee
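The two input dataframes are only shown as printed output. As a sketch for running the exercise end to end, they can be rebuilt from that output as follows:

```python
import pandas as pd

# Both inputs rebuilt from the printed output above
emp_data = pd.DataFrame({
    'eid': [1, 2, 3, 4],
    'ename': ['Sid', 'Ramesh', 'Ron', 'Harry'],
    'stipend': [10000, 10000, 5000, 15000],
})
company_data = pd.DataFrame({
    'eid': [1, 2, 3, 4],
    'position': ['employee', 'employee', 'intern', 'senior_employee'],
})

# An inner join keeps only the eid values present in both dataframes
dataframe = pd.merge(emp_data, company_data, how='inner', on='eid')
print(dataframe)
```

Since every eid appears in both inputs here, the inner join keeps all four rows.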
EXERCISE 19 - Getting the Non-Null Count and Data Type for Every Column
Given a dataframe, output the non-null count and data-type for every column
Sample Solution
print(dataframe)  # the dataframe
print()
dataframe.info()  # info() prints its report itself and returns None, so it is not wrapped in print()
Corresponding Output
   eid   ename  stipend         position
0    1     Sid    10000         employee
1    2  Ramesh    10000         employee
2    3     Ron     5000           intern
3    4   Harry    15000  senior_employee

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4 entries, 0 to 3
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   eid       4 non-null      int64
 1   ename     4 non-null      object
 2   stipend   4 non-null      int64
 3   position  4 non-null      object
dtypes: int64(2), object(2)
memory usage: 160.0+ bytes
EXERCISE 20 - Getting the Statistical Summary of all the Numerical Features of a DataFrame
Given a dataframe, generate the statistical summary of all the numerical features present in it
Sample Solution
print(dataframe)  # the dataframe
print()
print(dataframe.describe())
Corresponding Output
   eid   ename  stipend         position
0    1     Sid    10000         employee
1    2  Ramesh    10000         employee
2    3     Ron     5000           intern
3    4   Harry    15000  senior_employee

            eid       stipend
count  4.000000      4.000000
mean   2.500000  10000.000000
std    1.290994   4082.482905
min    1.000000   5000.000000
25%    1.750000   8750.000000
50%    2.500000  10000.000000
75%    3.250000  11250.000000
max    4.000000  15000.000000
Conclusion
The above are the building blocks of Pandas that every beginner (Data Analyst or Scientist) should master. If you get stuck on any of these pandas exercises or need further clarification on a data science or Python concept, FavTutor experts are available 24/7 to help.