In Python, a list is a versatile collection of objects that allows for duplicates. But sometimes it is necessary to make the list unique to streamline our data or perform certain operations. Here, we are going to study the multiple ways to remove duplicates from the list in Python. So, let's get started!
Why Remove Duplicates from the List?
A Python list is a built-in data structure for storing a collection of items. It is written as comma-separated values inside square brackets, and the elements inside a list do not have to be of the same data type.
Now there are several reasons to remove duplicates from a list. Duplicates in a list can take up unnecessary space and decrease performance. Additionally, it can lead to confusion and errors if you're using the list for certain operations. So, removing duplicates will make the data more accurate for better analysis.
For example, if you're trying to find the unique elements in a list, duplicates can give you incorrect results. In general, it's a good idea to remove duplicates from a list to make it more organized and easier to work with.
If you are still confused between an array and a list, read Array vs List in Python.
7 Ways to Remove Duplicates from a List in Python
There are many ways to remove duplicates from a list in Python. Let’s check them out one by one:
1) Using set()
A set is a built-in data structure similar to a list, with one key difference: it is unordered and cannot contain duplicate values.
The simplest way to remove duplicates from a list in Python is therefore to convert the list into a set. Passing the list as an argument to the set() constructor produces a set holding every distinct element, silently discarding repeats. The resulting set can then be converted back into a list with the list() constructor.
Example:
# removing duplicates from the list using set()
# initializing list
sam_list = [11, 15, 13, 16, 13, 15, 16, 11]
print("The list is: " + str(sam_list))
# to remove duplicates from the list
sam_list = list(set(sam_list))
# printing list after removal
# ordering may be distorted
print("The list after removing duplicates: " + str(sam_list))
Output:
The list is: [11, 15, 13, 16, 13, 15, 16, 11]
The list after removing duplicates: [16, 11, 13, 15]
This approach takes advantage of Python sets, which are implemented as hash tables, allowing very quick membership checks. It is fast even for large lists, but the drawback is that the order of the original list is lost, and the exact order of the result can vary between Python versions and implementations.
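If a sorted result is acceptable, you can get a deterministic order back by combining set() with sorted(); a minimal sketch:

```python
# deduplicate with set(), then impose a deterministic order with sorted()
sam_list = [11, 15, 13, 16, 13, 15, 16, 11]
unique_sorted = sorted(set(sam_list))
print(unique_sorted)  # [11, 13, 15, 16]
```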
2) Using a For Loop
In this method, we iterate over the whole list with a for loop and build a new list holding only the unique values. For each element, the "not in" operator checks whether it is already present in the new list: if it is not, we append it; if it is, we skip it.
Example:
# removing duplicates from the list using a for loop
# initializing list
sam_list = [11, 13, 15, 16, 13, 15, 16, 11]
print("The list is: " + str(sam_list))
# to remove duplicates from the list
result = []
for i in sam_list:
    if i not in result:
        result.append(i)
# printing list after removal
print("The list after removing duplicates: " + str(result))
Output:
The list is: [11, 13, 15, 16, 13, 15, 16, 11] The list after removing duplicates: [11, 13, 15, 16]
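One advantage of the loop approach is that it relies only on the "not in" check, so it also works for unhashable elements such as nested lists, where set() would raise a TypeError. A small sketch:

```python
# set() cannot hold lists (they are unhashable), but the loop method still works
pairs = [[1, 2], [3, 4], [1, 2]]
result = []
for item in pairs:
    if item not in result:
        result.append(item)
print(result)  # [[1, 2], [3, 4]]
```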
3) Using collections.OrderedDict.fromkeys()
This is one of the fastest order-preserving approaches. The fromkeys() method creates a dictionary whose keys are the elements of the list. Since keys in a dictionary cannot repeat, fromkeys() drops the duplicate values on its own, and converting the keys back with list() yields the unique elements in their original order.
Example:
# removing duplicates from the list using collections.OrderedDict.fromkeys()
from collections import OrderedDict
# initializing list
sam_list = [11, 15, 13, 16, 13, 15, 16, 11]
print("The list is: " + str(sam_list))
# to remove duplicates from the list
result = list(OrderedDict.fromkeys(sam_list))
# printing list after removal
print("The list after removing duplicates: " + str(result))
Output:
The list is: [11, 15, 13, 16, 13, 15, 16, 11]
The list after removing duplicates: [11, 15, 13, 16]
We used OrderedDict from the collections module to preserve the order.
4) Using a list comprehension
List comprehension is a way of building a list by writing a for loop inside square brackets. The method below is similar to the naive approach discussed above, but the loop lives inside the brackets of the comprehension instead of standing on its own.
We add an if condition inside the comprehension to filter out values that are already in the result. Note that calling append inside a comprehension purely for its side effect works, but is generally considered unidiomatic Python.
Example:
# removing duplicates from the list using a list comprehension
# initializing list
sam_list = [11, 13, 15, 16, 13, 15, 16, 11]
print("The list is: " + str(sam_list))
# to remove duplicates from the list
result = []
[result.append(x) for x in sam_list if x not in result]
# printing list after removal
print("The list after removing duplicates: " + str(result))
Output:
The list is: [11, 13, 15, 16, 13, 15, 16, 11] The list after removing duplicates: [11, 13, 15, 16]
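Because "x not in result" scans the growing list on every iteration, this approach does O(n^2) work overall. A common alternative, sketched below with an illustrative helper set named seen, keeps each membership check at O(1):

```python
# track already-seen values in a set for O(1) membership checks
sam_list = [11, 13, 15, 16, 13, 15, 16, 11]
seen = set()
result = []
for x in sam_list:
    if x not in seen:
        seen.add(x)
        result.append(x)
print(result)  # [11, 13, 15, 16]
```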
5) Using list comprehension & enumerate()
When a list comprehension is combined with the enumerate() function, we can remove duplicates from a Python list while keeping the original order: elements that have already occurred earlier in the list are simply skipped.
In the code below, enumerate() yields each element i together with its index n. The slice sam_list[:n] contains everything before the current position, so checking whether i appears in that slice tells us whether the element is a duplicate. If it is, we ignore it; otherwise we keep it.
Example:
# removing duplicates from the list using list comprehension + enumerate()
# initializing list
sam_list = [11, 15, 13, 16, 13, 15, 16, 11]
print("The list is: " + str(sam_list))
# to remove duplicates from the list
result = [i for n, i in enumerate(sam_list) if i not in sam_list[:n]]
# printing list after removal
print("The list after removing duplicates: " + str(result))
Output:
The list is: [11, 15, 13, 16, 13, 15, 16, 11]
The list after removing duplicates: [11, 15, 13, 16]
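Because this comprehension only uses the in operator on a list slice, it also works for elements that cannot go into a set or be used as dictionary keys, such as dictionaries themselves; a short sketch:

```python
# the slice-based comprehension handles unhashable elements like dicts
records = [{"id": 1}, {"id": 2}, {"id": 1}]
unique = [r for n, r in enumerate(records) if r not in records[:n]]
print(unique)  # [{'id': 1}, {'id': 2}]
```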
6) Using the ‘pandas’ module
Using the pd.Series() method, a Pandas Series object is constructed from the original list. The drop_duplicates() method is then called on the Series to eliminate any duplicate values. Lastly, the tolist() method transforms the resulting Series back into a list.
Example:
import pandas as pd

original_list = [1, 1, 2, 3, 4, 4]
new_list = pd.Series(original_list).drop_duplicates().tolist()
print(f"the original list is {original_list} and the new list without duplicates is {new_list}")
Output:
the original list is [1, 1, 2, 3, 4, 4] and the new list without duplicates is [1, 2, 3, 4]
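By default, drop_duplicates() keeps the first occurrence of each value; pandas also accepts a keep parameter (e.g. keep='last') to retain the final occurrence instead. A sketch, assuming pandas is installed:

```python
import pandas as pd

original_list = [1, 2, 1, 3]
# keep='last' retains the last occurrence of each value instead of the first
new_list = pd.Series(original_list).drop_duplicates(keep="last").tolist()
print(new_list)  # [2, 1, 3]
```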
7) Using dict.fromkeys()
Since Python 3.7, regular dictionaries preserve insertion order, so the built-in dict.fromkeys() behaves just like the OrderedDict approach above, with no import required. The keys of the resulting dictionary are the unique elements in their original order, and list() converts them back into a list.
Example:
# removing duplicates from the list using dict.fromkeys()
# initializing list
sam_list = [11, 15, 13, 16, 13, 15, 16, 11]
print("The list is: " + str(sam_list))
# to remove duplicates from the list
result = list(dict.fromkeys(sam_list))
# printing list after removal
print("The list after removing duplicates: " + str(result))
Output:
The list is: [11, 15, 13, 16, 13, 15, 16, 11]
The list after removing duplicates: [11, 15, 13, 16]
How to remove duplicate words from a list?
To remove duplicate words from a list in Python, you can use the set() function. Consider the example below:
my_list = ["mobile", "laptop", "earphones", "mobile", "laptop"]
new_list = list(set(my_list))
print(f"the old list was {my_list}, the new list without duplicates is {new_list}")
Output:
the old list was ['mobile', 'laptop', 'earphones', 'mobile', 'laptop'], the new list without duplicates is ['mobile', 'laptop', 'earphones']
Keep in mind that this method does not maintain the order of the original list, and because Python's string hashing is randomized, the order of the result can even change between runs. If you need to keep the order, you can filter out duplicates with a loop and an empty list:
my_list = ['mobile', 'laptop', 'earphones', 'mobile', 'laptop']
new_list = []
for word in my_list:
    if word not in new_list:
        new_list.append(word)
print(f"the old list was {my_list}, the new list without duplicates is {new_list}")
Output:
the old list was ['mobile', 'laptop', 'earphones', 'mobile', 'laptop'], the new list without duplicates is ['mobile', 'laptop', 'earphones']
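If words that differ only in letter case ("Mobile" vs "mobile") should also count as duplicates, you can compare a case-folded key while keeping the original spelling of the first occurrence; a sketch using an illustrative helper set:

```python
words = ["Mobile", "laptop", "mobile", "Laptop", "earphones"]
seen = set()
unique_words = []
for word in words:
    key = word.casefold()          # case-insensitive comparison key
    if key not in seen:
        seen.add(key)
        unique_words.append(word)  # keep the original spelling
print(unique_words)  # ['Mobile', 'laptop', 'earphones']
```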
Key Considerations
While the methods described above provide effective ways to remove duplicates from a list, there are a few additional considerations to keep in mind:
- Data Type Compatibility: Some methods may have limitations when working with specific data types. Ensure that the chosen method is compatible with the elements in your list.
- Order Preservation: If maintaining the original order of elements is crucial, consider using methods such as list comprehension with enumeration or the collections module's OrderedDict class.
- Performance: For large lists or time-sensitive operations, it's important to choose a method that offers optimal performance. Consider conducting performance tests to determine the most efficient approach.
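For the performance point above, the standard library's timeit module makes quick comparisons easy; a minimal sketch (the function names and sample data are illustrative, and the absolute numbers depend on your machine):

```python
import timeit

# sample data: 1000 items, 500 unique values
sam_list = list(range(500)) * 2

def with_set():
    return list(set(sam_list))            # fastest, order not preserved

def with_dict():
    return list(dict.fromkeys(sam_list))  # fast, order preserved (Python 3.7+)

def with_loop():
    result = []
    for x in sam_list:
        if x not in result:               # O(n) scan per element
            result.append(x)
    return result

for fn in (with_set, with_dict, with_loop):
    seconds = timeit.timeit(fn, number=10)
    print(f"{fn.__name__}: {seconds:.4f}s")
```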
Conclusion
Removing duplicates from a list in Python is a common task, and this article has provided you with various methods to accomplish it. We learned different methods to remove duplicate elements from the list in Python from the naive approach to utilizing set, list comprehension, and specialized modules.