Python is well known for its elegant simplicity. It’s a language designed around the idea of people creating additional libraries to add any functionality they need. This has resulted in a wide range of third-party Python libraries related to data science, math, and analysis. The language’s flexibility is impressive. But where Python’s ecosystem really shines is in its ability to bring ease of use to these advanced functions.
Powerful data libraries like NumPy offer similar ease of use to Python’s default options. The new functionality and data types seldom feel like dissimilar pieces of code bolted onto Python’s interpreter. Instead, the new systems follow the same general methodology used by standard data types. For example, Python provides a wide range of functions to work with containers like lists. And the same general functionality is found for containers in the NumPy library. You can even work with a multi-dimensional array in NumPy and then flatten it. And as is often the case with Python’s data types, this can be accomplished in different ways to fit different coding styles.
Understanding the Basics of a NumPy Array
We need to take a closer look at what a NumPy array is before moving on to learning how to flatten one. At first glance, NumPy arrays look remarkably similar to a standard Python list. But these arrays use a lot of clever design choices to vastly improve the speed and efficiency you see when using them in data-heavy code. For example, a NumPy array stores elements of a single data type in a fixed-size block of memory. You can change individual values in place, so the array is mutable in that sense. But its size is fixed once it’s created. When you add or remove elements with a function like np.append or np.delete, NumPy doesn’t resize the existing array. It instead creates a new array containing the relevant data from the initial 2D structure and leaves the original untouched.
This fixed, uniform layout lets NumPy process the data far more efficiently than it could with a general-purpose list, while the array still feels flexible from the perspective of the programmer using it. These types of under-the-hood optimizations appear throughout the larger NumPy codebase. As another example, most array operations run in pre-compiled C code inside the library rather than element by element in the standard Python interpreter.
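The behavior around changing values versus changing size can be checked directly. The following is a minimal sketch rather than anything from the original walkthrough. The names arr and bigger are made up for illustration, and it assumes np.append and np.shares_memory behave as documented in current NumPy releases.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

# Changing a value happens in place: the same array object is updated.
arr[0, 0] = 99
print(arr[0, 0])  # 99

# Adding a row does not resize arr. np.append builds and returns a new array.
bigger = np.append(arr, [[7, 8, 9]], axis=0)
print(arr.shape)     # (2, 3), the original keeps its size
print(bigger.shape)  # (3, 3)

# The new array has its own memory, separate from the original.
print(np.shares_memory(arr, bigger))  # False
```

The practical consequence is that growing an array inside a loop copies the data on every pass, which is one reason NumPy code usually builds arrays at their final size.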
All of this highlights the fact that there are very important reasons to use a NumPy array for computationally intensive tasks. The NumPy library is extremely good at executing something like a flatten operation on multi-dimensional arrays without incurring the processing load you might expect. And, as you’ll soon see, this also helps to explain why we use specific NumPy functionality to accomplish these tasks. Using NumPy functionality to work with NumPy data types helps us to take advantage of the clever tricks integrated into the library’s design. NumPy’s built-in functionality will almost always be far faster than if we were trying to do something like flatten an array with standard Python tools.
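To make the comparison with standard Python tools concrete, here is a small sketch that flattens the same nested data both ways. The variable names are hypothetical, and the example only shows that the two approaches give the same result. It makes no timing claim. If you want to measure the difference yourself, the standard library’s timeit module on a much larger array is one way to do it.

```python
import numpy as np

nested = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Standard Python: a nested comprehension visits each element in the interpreter.
flat_python = [value for row in nested for value in row]

# NumPy: the same row-by-row flattening, handled by the library itself.
flat_numpy = np.array(nested).ravel()

print(flat_python)          # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(flat_numpy.tolist())  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```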
Preliminary Steps To Flatten the Array
With all that in mind, how can we actually go about taking a multi-dimensional NumPy array and flattening it into a 1D array? Take a look at the following code.
import numpy as np
ourArray = np.array([[1,2,3],[4,5,6],[7,8,9]])
print(ourArray)
print(type(ourArray))
We start by importing NumPy as np before moving on to create an array with np.array. We populate this new array, called ourArray, with three rows of numbers that together run from 1 to 9. Next, we print out both the contents of ourArray and its data type. This is to show that the data is correctly formatted, usable, and a proper ndarray. Now that we have a usable array we can go on to flatten it.
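Beyond printing the contents and the type, it can help to confirm the array’s dimensions before flattening. This short sketch uses the standard ndim, shape, and size attributes of an ndarray. It’s an optional extra check and not part of the original example.

```python
import numpy as np

ourArray = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(ourArray.ndim)   # 2, the number of dimensions
print(ourArray.shape)  # (3, 3), three rows of three values
print(ourArray.size)   # 9, the total number of elements
```

A flattened version of this array should report one dimension and the same total size of 9.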
Different Ways To Flatten an Array
We can begin by looking at the following code.
import numpy as np
ourArray = np.array([[1,2,3],[4,5,6],[7,8,9]])
ourList = [[1,2,3],[4,5,6],[7,8,9]]
x = ourArray.flatten()
y = np.ravel(ourArray)
z = np.ravel(ourList)
print(ourArray)
print(type(ourArray))
print(x)
print(type(x))
print(y)
print(type(y))
print(ourList)
print(type(ourList))
print(z)
print(type(z))
This code is similar in many ways to our original array declaration. The first difference comes in when we declare a Python list as ourList. Next, we assign the x, y, and z variables. We’ll return to these declarations later. But first, let’s look at the print statements which follow the new variables.
You’ll note that we print out the contents of ourArray, ourList, x, y and z. We also print out the type of each variable. This verifies that x, y, and z are all NumPy ndarrays. The original array is multi-dimensional, as we’d expect. But x, y, and z all contain identical single-dimensional data laid out in the order presented in our original array. So how does this code actually achieve these transformations?
The first thing to keep in mind is that NumPy provides its own functionality for this job. In this case, we begin by calling the array’s flatten method to assign data to x. Flatten, as the name suggests, is an easy way to collapse a multi-dimensional NumPy array into a single dimension. We don’t pass any arguments in this example since the default behavior is the best fit for our needs. But passing a different order value to flatten will change the sequence in which the elements are read out. For example, you could change the line that assigns x to the following.
x = ourArray.flatten(order='F')
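F stands for Fortran-style. In this case it means our data is flattened in column-major order. The other order options are C for row-major order, which is the default, A for column-major order if the array is Fortran-contiguous in memory and row-major order otherwise, and K for the order in which the elements occur in memory.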
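The order options are easier to compare side by side. The sketch below runs flatten with each value on the same 3x3 array, then repeats two of them on a transposed view, which is where the memory-dependent options start to differ from plain row-major order. The variable name transposed is an illustrative choice, and the results assume the array was created in NumPy’s default row-major layout.

```python
import numpy as np

ourArray = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(ourArray.flatten(order='C'))  # [1 2 3 4 5 6 7 8 9], row by row
print(ourArray.flatten(order='F'))  # [1 4 7 2 5 8 3 6 9], column by column

# Transposing gives a view whose memory layout is column-major (Fortran-style).
transposed = ourArray.T
print(transposed.flatten(order='C'))  # [1 4 7 2 5 8 3 6 9], rows of the transposed view
print(transposed.flatten(order='A'))  # [1 2 3 4 5 6 7 8 9], Fortran order for this layout
print(transposed.flatten(order='K'))  # [1 2 3 4 5 6 7 8 9], order the data sits in memory
```

For an ordinary array that was never transposed or sliced, A and K give the same result as the default C order.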
NumPy also provides a function similar to flatten called ravel. The key technical distinction is that flatten always returns a copy of the data, while ravel returns a view of the original array whenever it can and only copies when it has to. Skipping the copy is why ravel tends to be a little faster than flatten. It’s also why ravel can produce surprising results. If you modify the data it returns, the original array may change as well. For read-only purposes it’s fine to use whichever of the two options, ravel or flatten, you like best. You can see that the y variable used with ravel flattens identically to the x assigned from flatten.
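The practical difference between the two shows up as soon as you write to the result. This sketch reuses the x and y names from the example above and relies on np.shares_memory to check whether two arrays use the same underlying data. It assumes ourArray is a freshly created, contiguous array, which is the case where ravel can hand back a view.

```python
import numpy as np

ourArray = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

x = ourArray.flatten()  # flatten always makes an independent copy
y = np.ravel(ourArray)  # ravel returns a view here because the data is contiguous

print(np.shares_memory(ourArray, x))  # False
print(np.shares_memory(ourArray, y))  # True

# Writing to the copy leaves the original alone.
x[0] = 100
print(ourArray[0, 0])  # 1

# Writing to the view changes the original too.
y[0] = -1
print(ourArray[0, 0])  # -1
```

If you plan to modify the flattened data and want the original left intact, flatten is the safer choice.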
One interesting point can be seen in the z variable. The z variable contains the output of ravel running on a Python list rather than a NumPy array. You’ll note that when we print the type assigned to z it comes out as a NumPy ndarray. This is a quirk of NumPy that you should keep in mind. When you work with NumPy-specific functionality it tends to return NumPy data types, even if you didn’t start out with data in that format.
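If you need a plain Python list again after flattening, the ndarray’s tolist method converts it back. This is a small sketch under that assumption. The name backToList is made up for illustration.

```python
import numpy as np

ourList = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

z = np.ravel(ourList)       # NumPy converts the list to an ndarray first
print(type(z).__name__)     # ndarray

backToList = z.tolist()     # convert the flat ndarray back to a Python list
print(type(backToList).__name__)  # list
print(backToList)           # [1, 2, 3, 4, 5, 6, 7, 8, 9]

# The original nested list is left unchanged.
print(ourList)              # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
```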