Python provides a variety of powerful data structures that can be used for data analysis and manipulation. While dictionaries are useful for storing key-value pairs, dataframes are better suited to handling large and complex datasets. In this article, we will explore how to convert a dictionary into a pandas dataframe in Python.
What is a Dataframe in python?
A dataframe is a two-dimensional data structure used in Python. It works much like a spreadsheet: a table of labeled rows and columns. Because dataframes are a flexible and user-friendly way of storing and interacting with data, they are widely used in modern data analytics. Data scientists work with datasets in the form of dataframes every day.
The pandas library is the most popular library in the Python ecosystem for dataframe operations. It offers a clear and powerful API that helps developers analyze data quickly and efficiently.
How to convert a Dictionary to a Dataframe?
There are several ways in Python to convert a dictionary into a pandas dataframe.
First, import the pandas library. Then use the class method pd.DataFrame.from_dict() to convert a Python dictionary into a dataframe. It is an alternative to the pd.DataFrame() constructor that takes an orient argument to specify the orientation of the resulting dataframe. With orient='index', it can also handle dictionaries whose values are lists of different lengths, filling the gaps with missing values; with the default orient='columns', lists of unequal length raise a ValueError.
Syntax: pd.DataFrame.from_dict(data, orient='columns', dtype=None, columns=None)
Parameters:
- data = The dictionary you want to convert to a dataframe.
- orient = {'columns', 'index', 'tight'}, default 'columns'. The orientation of the data: whether the dictionary keys should become columns ('columns') or rows ('index'). 'tight' expects a dictionary with the keys 'index', 'columns', 'data', 'index_names' and 'column_names'.
- dtype = The data type to force after construction; otherwise it is inferred.
- columns = List of column labels to use when the keys become rows, that is, with orient='index'. Raises a ValueError if used with orient='columns' or orient='tight'.
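As a minimal sketch of how orient and columns interact, the snippet below converts the same dictionary both ways. The dictionary contents and the column labels 'score' and 'grade' are made up for illustration.

```python
import pandas as pd

# Hypothetical data: each key holds one record
scores = {'row_1': [3, 'a'], 'row_2': [5, 'b']}

# Default orient='columns': the keys become column labels
by_columns = pd.DataFrame.from_dict(scores)

# orient='index': the keys become row labels, and columns= names the positions
by_index = pd.DataFrame.from_dict(scores, orient='index', columns=['score', 'grade'])

print(by_columns)
print(by_index)

# Passing columns= together with orient='columns' is rejected
try:
    pd.DataFrame.from_dict(scores, orient='columns', columns=['score', 'grade'])
except ValueError as err:
    print('ValueError:', err)
```

Choosing orient='index' is usually the more natural fit when each dictionary key identifies one record rather than one field.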
Example 1: Using Default parameters
import pandas as pd

data = {'Name': ['John', 'Sunny', 'Koel', 'Veer'],
        'Age': [30, 28, 19, 27],
        'Courses': ['Python', 'Java', 'C++', 'C']}
df = pd.DataFrame.from_dict(data)
print(df)
Output:
    Name  Age Courses
0   John   30  Python
1  Sunny   28    Java
2   Koel   19     C++
3   Veer   27       C
Example 2: Using 'tight' as the value of the orient parameter.
data = {'index': [('A', 'B'), ('A', 'C')],
        'columns': [('x', 1), ('y', 2)],
        'data': [[1, 3], [2, 4]],
        'index_names': ['n1', 'n2'],
        'column_names': ['z1', 'z2']}
pd.DataFrame.from_dict(data, orient='tight')
Output:
z1     x  y
z2     1  2
n1 n2
A  B   1  3
   C   2  4
The input data is in the form of a dictionary, with the following keys and values:
- index: a list of tuples representing the index of the DataFrame
- columns: a list of tuples representing the columns of the DataFrame
- data: a list of lists representing the data values in the DataFrame
- index_names: a list of names for the levels of the DataFrame index
- column_names: a list of names for the levels of the DataFrame columns
The orient argument is set to 'tight', which specifies the format of the input dictionary. This means that the keys and values of the dictionary correspond directly to the columns, index, and data of the DataFrame.
The resulting DataFrame will have the index values [('A', 'B'), ('A', 'C')] and the column values [('x', 1), ('y', 2)]. The data values will be [[1, 3], [2, 4]]. The index and column names will be ['n1', 'n2'] and ['z1', 'z2'], respectively.
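To see how those names and tuples end up on the DataFrame, here is a small sketch that rebuilds the same frame and inspects it. It assumes pandas 1.4 or later, where the 'tight' orientation is available; the variable names are illustrative.

```python
import pandas as pd

tight = {
    'index': [('A', 'B'), ('A', 'C')],
    'columns': [('x', 1), ('y', 2)],
    'data': [[1, 3], [2, 4]],
    'index_names': ['n1', 'n2'],
    'column_names': ['z1', 'z2'],
}
df = pd.DataFrame.from_dict(tight, orient='tight')

# Both axes are MultiIndex objects that carry the names from the dictionary
print(list(df.index.names))
print(list(df.columns.names))

# A single cell is addressed by a (row tuple, column tuple) pair
cell = df.loc[('A', 'C'), ('y', 2)]
print(cell)

# to_dict(orient='tight') writes the same layout back out,
# so the frame can be rebuilt from its own dictionary form
restored = pd.DataFrame.from_dict(df.to_dict(orient='tight'), orient='tight')
print(restored.equals(df))
```

The 'tight' layout is mainly useful for this kind of round trip, since it keeps the index and column level names that the other orientations drop.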
Example 3: Converting a dictionary whose values are lists into one row per value. This approach also works when the lists have different lengths.
data = {
    '2022-01-01': [5, 7, 5],
    '2022-01-02': [7, 5, 6],
    '2022-01-03': [7, 4, 4],
    '2022-01-04': [7, 2, 10]
}
pd.DataFrame(
    [(k, val) for k, vals in data.items() for val in vals],
    columns=['Date', 'Interviewed']
)
Output:
          Date  Interviewed
0   2022-01-01            5
1   2022-01-01            7
2   2022-01-01            5
3   2022-01-02            7
4   2022-01-02            5
5   2022-01-02            6
6   2022-01-03            7
7   2022-01-03            4
8   2022-01-03            4
9   2022-01-04            7
10  2022-01-04            2
11  2022-01-04           10
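The lists in the example above happen to be the same length. The sketch below uses made-up counts with deliberately uneven lists to show two options: the same long-format comprehension, and a wide format where each list is wrapped in a pd.Series so that shorter columns are padded with NaN. Passing such a dictionary straight to pd.DataFrame() would raise a ValueError instead.

```python
import pandas as pd

# Hypothetical counts; the lists deliberately differ in length
data = {
    '2022-01-01': [5, 7, 5],
    '2022-01-02': [7, 5],
    '2022-01-03': [7],
}

# Long format: one row per (key, value) pair, so uneven lists are no problem
long_df = pd.DataFrame(
    [(k, val) for k, vals in data.items() for val in vals],
    columns=['Date', 'Interviewed'],
)

# Wide format: each Series is aligned on its positional index,
# and positions missing from shorter lists become NaN
wide_df = pd.DataFrame({k: pd.Series(vals) for k, vals in data.items()})

print(long_df)
print(wide_df)
```

The long format keeps every value as a real observation, while the wide format introduces NaN placeholders, so the choice depends on how the data will be analyzed afterwards.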
Note: When a DataFrame is built from a list of dictionaries, a key that appears in some dictionaries but not in others produces NaN in the rows where it is missing.
Another approach is to use the pd.DataFrame.from_records() function instead of from_dict(). It creates a dataframe from a list of dictionaries. It can also build a DataFrame from a structured or record ndarray, a sequence of tuples or dicts, or another DataFrame.
Example 4:
Subjects = [{'Courses': 'Java', 'Duration': '60days', 'Fee': '$95', 'Discount': '$15'},
            {'Courses': 'python', 'Fee': '$95', 'Duration': '60days'},
            {'Courses': 'Data Science', 'Fee': '$95', 'Duration_Discount': '10days'}]
df = pd.DataFrame.from_records(Subjects, index=['1', '2', '3'])
print(df)
Output:
        Courses Duration  Fee Discount Duration_Discount
1          Java   60days  $95      $15               NaN
2        python   60days  $95      NaN               NaN
3  Data Science      NaN  $95      NaN            10days
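The NaN cells in that output can be located programmatically. The sketch below uses a shortened, hypothetical list of records and also shows that the plain pd.DataFrame() constructor accepts the same kind of list.

```python
import pandas as pd

# Hypothetical records; the second one has no 'Discount' key
records = [
    {'Courses': 'Java', 'Fee': '$95', 'Discount': '$15'},
    {'Courses': 'python', 'Fee': '$95'},
]

from_records_df = pd.DataFrame.from_records(records)
constructor_df = pd.DataFrame(records)

# Columns are the union of all keys, in order of first appearance
print(from_records_df.columns.tolist())

# isna() marks the cells that were filled in because a key was missing
missing_discount = from_records_df['Discount'].isna().tolist()
print(missing_discount)

print(constructor_df)
```

Checking isna() right after the conversion is a cheap way to catch misspelled or inconsistent keys in the input dictionaries.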
Example 5: Using pandas constructor (pd.DataFrame()) to directly convert a given dictionary into a dataframe.
import pandas as pd

data = {'Name': ['John', 'Sunny', 'Koel', 'Veer'],
        'Age': [30, 28, 19, 27],
        'Courses': ['Python', 'Java', 'C++', 'C']}
df = pd.DataFrame(data)
print(df)
Output:
    Name  Age Courses
0   John   30  Python
1  Sunny   28    Java
2   Koel   19     C++
3   Veer   27       C
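The constructor also accepts columns and index arguments alongside the dictionary. The following sketch reuses the data from the example; the row labels 'a' to 'd' are arbitrary placeholders.

```python
import pandas as pd

data = {'Name': ['John', 'Sunny', 'Koel', 'Veer'],
        'Age': [30, 28, 19, 27],
        'Courses': ['Python', 'Java', 'C++', 'C']}

# columns= selects and orders a subset of the dictionary keys
subset = pd.DataFrame(data, columns=['Courses', 'Name'])

# index= replaces the default 0..n-1 row labels
labelled = pd.DataFrame(data, index=['a', 'b', 'c', 'd'])

print(subset)
print(labelled)
```

Note that columns behaves differently here than in from_dict(): with the constructor it filters the keys of the dictionary, whereas from_dict() only accepts it together with orient='index'.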
Also, check out how to convert a Dictionary to JSON in Python, with code.
Conclusion
Dataframes are the primary data type used by pandas, the well-known Python data analysis toolkit. Similar structures are also used by R and other programming languages. Here we learned several ways to turn different kinds of dictionaries into a dataframe, with examples.
Happy learning and I'll see you soon with my next blog:)