Bytes-like objects in Python are an important, but not always easy to understand, part of the language. A bytes-like object is essentially a sequence of raw bytes, like the contents of any file on a digital system, held in a Python variable. The built-in bytes, bytearray, and memoryview types are the common examples. Their main distinction from strings is that a bytes-like object isn’t guaranteed to be human readable. Even one that holds plain text is displayed with a b prefix, and any non-ASCII characters in it appear as escape sequences such as \xc3\xb6.
What Are Bytes-Like Objects Used For?
Part of what makes a bytes-like object difficult to understand is that everything on a drive is technically just made up of bytes. For example, consider a text file on your computer. If you opened it up in a hex editor you’d see something very similar to a standard string of text in Python. But you’d also find some extra bytes, such as end-of-line markers and, in some formats, codes that signal formatting options. This is similar in some respects to the formatting tags in HTML. The actual text written into the file is analogous to a string in Python, while the totality of the file, what you’d see in a hex editor, is analogous to the bytes-like object.
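As a rough illustration of the hex editor comparison, the sketch below encodes a tiny piece of text and prints its bytes in a few different views. The sample text and variable names are made up for this example.

```python
# A short piece of text with an end-of-line character, as it might appear in a file.
text = "Hi\n"

# Encoding turns the str into the raw bytes that would be written to disk.
raw = text.encode("utf-8")

print(raw)           # the bytes literal, with the newline shown as an escape
print(raw.hex(" "))  # a hex-editor style view, one pair of hex digits per byte
print(list(raw))     # the same bytes as plain integers
```

The newline is invisible when the text is displayed, but it is an ordinary byte (0a) once the data is viewed as bytes.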
Text, Bytes, and Beyond
The difference between a string of characters and bytes-like objects containing that string can be more easily understood by looking at actual Python code. Consider the following example.
pythonString = 'We are starting ön a string'
string2utf8 = pythonString.encode()
print('Original=', pythonString)
print('UTF8=', string2utf8)
The second line of output reads b'We are starting \xc3\xb6n a string'. This is a bytes object, the most common bytes-like type. We initially start out with a Python string, and in the next line we use encode to convert the string to bytes. When no encoding is given, encode uses UTF-8, the 8-bit Unicode Transformation Format, which is why the single character ö shows up as the two bytes \xc3\xb6. But consider what would happen if we appended the following code.
string2utf16 = pythonString.encode('UTF-16')
print('UTF16=', string2utf16)
print('UTF8 decoded = ', string2utf8.decode('UTF-8'))
print('UTF16 decoded = ', string2utf16.decode('UTF-16'))
print('UTF8 decoded as UTF16 = ', string2utf8.decode('UTF-16'))
print('UTF16 initial =', type(string2utf16))
print('UTF16 after decoding =', type(string2utf16.decode('UTF-16')))
This illustrates just how much difference encoding can make. We encode the same string as UTF-16 and print it, and the result looks very different from the UTF-8 version: it begins with a byte order mark and every character takes at least two bytes. Following that, we decode the UTF-8 and UTF-16 data with their matching codecs and print the results, which recovers the original string both times. As long as we know which encoding was used, converting bytes back to text takes a single call to decode.
But take special note of the 5th line within the appended code. Here we see what happens when data is decoded with the wrong codec: the UTF-8 bytes are decoded as UTF-16. Each pair of bytes is read as a single code unit, so the result pushes past the English character set and comes out mostly as ideograms.
Finally, the last two lines look at the specific types used within this process. We begin by using the type function on the string2utf16 variable. The type function, as the name suggests, shows which type a value has. In this case we see that string2utf16 is bytes. We then do the same with the value returned by calling decode on string2utf16, which is str. This shows a successful conversion from bytes to a string.
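The round trip and the mismatch described above can be condensed into a small sketch. It uses a shorter sample string than the article's example, and the latin-1 and invalid-byte cases are extra illustrations chosen for this sketch, not part of the original code.

```python
text = "ön"

utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16")

# The same two characters occupy a different number of bytes in each encoding.
print(len(text), len(utf8_bytes), len(utf16_bytes))

# Decoding with the matching codec restores the original str.
round_trip_utf8 = utf8_bytes.decode("utf-8")
round_trip_utf16 = utf16_bytes.decode("utf-16")
print(type(utf16_bytes), type(round_trip_utf16))

# Decoding with the wrong codec can silently produce nonsense...
wrong = utf8_bytes.decode("latin-1")
print(wrong)

# ...or fail outright when the bytes are not valid in that codec.
try:
    b"\xff\xfe\xfd".decode("utf-8")
    decode_failed = False
except UnicodeDecodeError as exc:
    decode_failed = True
    print("Could not decode:", exc)
```

Neither outcome of a mismatched decode is useful, which is why it helps to record which encoding produced a given set of bytes.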
The fact that the same data can be presented so differently highlights why bytes-like objects and strings are so different from each other. In the simplest example, a string and its encoded bytes might only differ by a single character. But in real-world situations, the two will produce radically different results. This can be seen even more clearly when we look at file access.
Bytes and Binaries
If you’ve ever seen a Python tutorial that addresses file access then you’ve probably noticed that files can be opened in different ways. For example, you might open files with the following code.
openedFile = open(r'C:\ourtext.txt', 'rt')
The rt argument passes some information to the open function. The r states that the file is opened for reading, and the t states that the data within is treated as text rather than as a bytes-like object, so reading from the file returns a string. However, changing the t to a b switches things around. The file would then be opened in binary mode, and reading from it returns bytes with no decoding applied.
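To see the two modes side by side, here is a minimal sketch that writes a small file to a temporary directory and reads it back both ways. The file name and contents are placeholders chosen for this example, and the encoding is passed explicitly so the result does not depend on the platform default.

```python
import os
import tempfile

# Use a temporary directory so the example does not depend on any real path.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "ourtext.txt")

    # Create the file with some text that includes a non-ASCII character.
    with open(path, "w", encoding="utf-8") as f:
        f.write("starting ön")

    # Text mode: the bytes on disk are decoded and read() returns a str.
    with open(path, "rt", encoding="utf-8") as f:
        as_text = f.read()

    # Binary mode: read() returns the raw bytes exactly as stored.
    with open(path, "rb") as f:
        as_bytes = f.read()

print(type(as_text), as_text)
print(type(as_bytes), as_bytes)
```

Both reads see the same file. The only difference is whether Python decodes the bytes for us.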
When Things Go Wrong
It’s also important to keep in mind that different Python functions expect information to be formatted in different ways. This is why it’s generally a good idea to convert data into the type the rest of your code expects as soon as you receive it. In the earlier example, the UTF-8 bytes still had most of the text intact. It might be tempting to simply grab information from them without any decoding. But doing so tends to lead to a Python TypeError further down the line, for instance when the bytes are combined with a string. The easiest way to solve TypeError problems is to simply prevent them from happening by being proactive with all data used within your code.
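The following sketch shows one common way this kind of error appears, and the fix. The variable names and the "Received: " label are invented for the illustration.

```python
data = "We are starting ön a string".encode("utf-8")

try:
    # str + bytes is not allowed; Python will not guess an encoding for us.
    label = "Received: " + data
except TypeError as exc:
    label = None
    print("TypeError:", exc)

# Decoding first gives two str values, which concatenate normally.
fixed_label = "Received: " + data.decode("utf-8")
print(fixed_label)
```

Decoding once, at the point where the bytes enter the program, keeps this kind of mismatch from surfacing later in unrelated code.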
The Importance of Data Types
Python is generally quite permissive when it comes to variables, since a name can be bound to a value of any type. But with that freedom comes responsibility. We can’t always count on the Python interpreter to ensure variables are kept in the format we’re expecting. So in cases like this, it’s important to essentially clean up after ourselves within our code. That means converting variables, such as binary information, into a format that’s compatible with the rest of our code.
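One possible way to do that cleanup is a small helper that accepts either text or bytes and always hands back a string. The to_text function below is a hypothetical name for this sketch, not a standard library function, and it assumes the bytes are UTF-8 unless told otherwise.

```python
def to_text(value, encoding="utf-8"):
    """Return value as a str, decoding it first if it is bytes-like.

    Hypothetical helper for illustration; the UTF-8 default is an assumption.
    """
    if isinstance(value, str):
        return value
    if isinstance(value, (bytes, bytearray, memoryview)):
        # bytes() copies the data, so all three types decode the same way.
        return bytes(value).decode(encoding)
    raise TypeError(
        f"expected str or a bytes-like object, got {type(value).__name__}"
    )


print(to_text("already text"))
print(to_text("ön".encode("utf-8")))
print(to_text(bytearray(b"from a bytearray")))
```

Calling a helper like this at the edges of a program, where data arrives from files or the network, means the rest of the code only ever deals with strings.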