Working with large amounts of data comes with some inherent issues, and one of the most common stems from duplicate values. We often need to know how many distinct items appear in our records, or how often each one comes up. Python makes this fairly easy when the data is stored in a list. We can begin with a simple example that counts the unique values in a list.
finduniqueList = [5, 5, 4, 3, 2, 2, 8]
newList = []  # collects each value the first time it appears
count = 0
for point in finduniqueList:
    # Only count a value we have not already stored
    if point not in newList:
        count += 1
        newList.append(point)
print("Unique items = ", count)
This method is by no means the most efficient way to find the unique elements in a Python list, since every membership test scans newList from the start. However, it does serve as a solid example of how we could go about the procedure without using additional functions. We begin by creating a list of numbers in finduniqueList. We then create an empty list as the newList variable. The count variable tracks how many unique items we have seen as we iterate over finduniqueList. On each pass through the loop, the current finduniqueList value is tested against the contents of newList to determine whether it has been seen before.
If the value held in the loop’s point variable is already in newList, it isn’t added again. Otherwise we increment count and append the value. At the loop’s end, newList consists of the unique items from finduniqueList in the order they first appeared, and count holds the number of unique elements. The final line of code prints the value of count.
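As a rough sketch, the same loop can be wrapped in a reusable function. The function name unique_in_order is hypothetical and chosen only for illustration. It returns the list of first occurrences, so the count is simply the length of the result.

```python
def unique_in_order(values):
    """Return the distinct items of values, keeping first-seen order.

    unique_in_order is a hypothetical helper name used for illustration.
    """
    seen = []
    for point in values:
        # Skip anything we have already collected
        if point not in seen:
            seen.append(point)
    return seen


finduniqueList = [5, 5, 4, 3, 2, 2, 8]
uniqueItems = unique_in_order(finduniqueList)
print("Unique items = ", len(uniqueItems))
```

Returning the list rather than only the count keeps both pieces of information available to the caller.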
The loop-based approach works, but it’s a little convoluted. We can simplify this process by using tools from the Python standard library. In particular, the Counter class from the collections module is often quite useful for finding the distinct values among many duplicates. Consider the following Python program.
from collections import Counter
finduniqueList = [5, 5, 4, 3, 2, 2, 8]
items = Counter(finduniqueList).keys()
print("No of unique items in the list are:", len(items))
We begin by importing Counter from collections. The collections module provides additional container data types for Python. In this case we’re making use of Counter, a dictionary subclass built for counting. As the name suggests, Counter stores how many times each distinct element appears in an iterable Python object. We then proceed to create the list, finduniqueList, that we’ll be using with Counter.
We call Counter on the list and assign the keys of the result to a new variable, items. Counter essentially creates a Python dictionary whose keys are the distinct items in finduniqueList and whose values are the number of times each one occurs. Calling keys() gives us just the distinct items, and this is what we assign to items. Finally, we print out the length of items using the len function. This provides the number of unique values in finduniqueList.
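Since Counter keeps the frequencies as well as the keys, a short sketch can show what the dictionary holds. The variable names counts, distinct_total and appear_once are illustrative and not part of the original program.

```python
from collections import Counter

finduniqueList = [5, 5, 4, 3, 2, 2, 8]
counts = Counter(finduniqueList)

# Each key is a distinct value; each value is how often it occurs.
for value, frequency in counts.items():
    print(value, "appears", frequency, "time(s)")

# Number of distinct values (same result as len(counts.keys())).
distinct_total = len(counts)

# Values that occur exactly once, i.e. have no duplicates at all.
appear_once = [value for value, frequency in counts.items() if frequency == 1]
print(distinct_total, appear_once)
```

Note the difference between the two results: the list has five distinct values, but only three of them (4, 3 and 8) appear exactly once.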
There’s also a wealth of Python libraries beyond what we find in the standard distribution. NumPy, a library for numerical computing built around fast array types, is one of the most important. We can use NumPy to count unique values and, at the same time, see how often each value occurs.
import numpy
numpyList = numpy.array([5, 5, 4, 3, 2, 2, 8])
(uniqueValues, valueCount) = numpy.unique(numpyList, return_counts=True)
density = numpy.asarray((uniqueValues, valueCount)).T
print(len(uniqueValues))
We begin by importing NumPy. Next, we create a NumPy array, which behaves much like a standard Python list for our purposes. We can then call NumPy’s unique function on our numpyList variable. Note that we pass return_counts as True when calling numpy.unique. This tells the function to return a tuple containing an array of the sorted unique values along with an array of how often each of those values occurs. We store these in uniqueValues and valueCount. Next, we build a two-column NumPy ndarray from uniqueValues and valueCount, transposing it so that each row pairs a value with its count.
Note that the ndarray lets us work with multiple columns. So, for example, we could apply the same approach to a column of values read from a CSV file, though in that case we’d more commonly use a DataFrame from pandas, the Python data analysis library. This multi-column view is one of the main reasons for assigning the density variable. It provides us with a more under-the-hood view of what’s actually going on when we compute uniqueValues. Finally, we can simply use the len function to see the length of uniqueValues. This is the number of unique values in the array we created as numpyList.
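To make the role of density more concrete, the following sketch prints its rows. It assumes NumPy is installed and reuses the variable names from the program above. The loop and the tolist() call are additions for illustration.

```python
import numpy

numpyList = numpy.array([5, 5, 4, 3, 2, 2, 8])
uniqueValues, valueCount = numpy.unique(numpyList, return_counts=True)

# Stack the two arrays and transpose: one row per (value, count) pair.
density = numpy.asarray((uniqueValues, valueCount)).T

# numpy.unique returns the values sorted, so rows run from smallest to largest.
for value, frequency in density.tolist():
    print(value, "occurs", frequency, "time(s)")

print(len(uniqueValues))
```

Each row of density pairs one unique value with its frequency, which is the information len(uniqueValues) summarizes as a single number.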