Pickling in Python is the process of serializing a Python object into a byte stream. The pickle module handles both the serialization and the deserialization of Python objects. What does that mean? Well, that is what I am going to answer in this article, so let’s get started.
First, let’s understand what serialization and deserialization mean.
Say you have a Python object (for example, a dictionary object) that looks like this:
employee = {"name": "Bob", "age": 25}
that you want to write to a file so that another Python process can read it later. How can you do that?
Well, one option is to write the dictionary as a text file and then read this text file from the other Python program.
For example, your text file can be formatted in the following manner:
name:Bob
age:25
Now, the other Python program can read this file, split each line based on the : delimiter and voila. There you go!
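As a rough sketch of this approach, the writer and the reader could look like the code below. The file name employee.txt and the helper names write_text and read_text are hypothetical, chosen only for illustration.

```python
import os
import tempfile

employee = {"name": "Bob", "age": 25}


def write_text(obj, path):
    # One "key:value" pair per line, as in the format above.
    with open(path, "w") as f:
        for key, value in obj.items():
            f.write(f"{key}:{value}\n")


def read_text(path):
    # Split each line on the first ':' only, in case a value contains ':'.
    result = {}
    with open(path) as f:
        for line in f:
            key, value = line.rstrip("\n").split(":", 1)
            result[key] = value
    return result


# Hypothetical file name, written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "employee.txt")
    write_text(employee, path)
    loaded = read_text(path)

print(loaded)  # every value is read back as a string, including the age
```

The reader gets the string '25' rather than the integer 25, so it has to know which fields need converting.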
So what’s wrong with this approach?
I agree that it is a working solution, and it might be OK for some situations.
However, it is not ideal for two reasons:
- A textual representation usually takes more space than a binary one. This might be OK for trivial programs, but imagine if you have to send this serialized object to another machine over the network. In this case, having a small payload is crucial or else you might congest the network.
- The way you formatted your file was arbitrary. You had to somehow communicate to the other Python program what your “schema” looks like. This doesn’t scale. Ideally, what we need is a well-defined standardized protocol so that any other program can easily and deterministically read your serialized data.
Another popular standard for serializing data is JSON. You have probably heard of it.
JSON is a widely used, standardized textual protocol, but it doesn’t solve the issue of being a textual representation, which means the serialized data tends to be large.
This is exactly the problem pickle solves.
So what is the use of pickle in Python?
If you want to serialize a Python object, whether to store it on disk or to transfer it over the network, pickle is a Python module that helps you serialize and deserialize Python objects in a binary format (not a textual format). This means that your serialized objects will usually be more compact than their textual counterparts, especially for large objects.
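To get a feel for the size difference, here is a small sketch that serializes the same dictionary with pickle and with json and compares the byte counts. The dictionary of 100,000 integer pairs is an arbitrary choice for illustration. Results vary with the data, and for very small objects the fixed overhead of a pickle can cancel out the savings.

```python
import json
import pickle

# Arbitrary sample data: 100,000 integer keys mapped to integer values.
data = {i: i for i in range(100_000)}

pickled = pickle.dumps(data)                 # binary format, returned as bytes
as_json = json.dumps(data).encode("utf-8")   # text, encoded so both are byte counts

print(f"pickle: {len(pickled)} bytes")
print(f"json:   {len(as_json)} bytes")
```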
How to pickle a Python object?
Here is an example of how to pickle a Python dictionary and write it to a file:
import pickle

e = {"name": "Bob", "age": 25}
# Open the target file in binary write mode and serialize e into it
with open('employee.pickle', 'wb') as f:
    pickle.dump(e, f)
Note the following:
- you need to import the pickle module
- the file object needs to be opened in ‘wb’ (binary write) mode
- by convention, pickle files often use a .pickle (or .pkl) extension, but this is not mandatory
- dump() writes the serialized bytes of the dictionary e to the file f
If you try to read the contents of the pickle file, you will get this binary stream of data that will pretty much look like gibberish to you. But trust me, it is not 🙂
$ cat employee.pickle
��}�(�name��Bob��age�Ku.%
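If you are curious about what those bytes mean, the standard library's pickletools module can disassemble a pickle into a readable list of opcodes. Here is a minimal sketch that pickles the dictionary in memory with pickle.dumps() instead of reading it from the file:

```python
import io
import pickle
import pickletools

e = {"name": "Bob", "age": 25}
data = pickle.dumps(e)  # serialize to bytes in memory instead of a file

# Disassemble the pickle into text; 'out' accepts any file-like object.
buffer = io.StringIO()
pickletools.dis(data, out=buffer)
listing = buffer.getvalue()
print(listing)
```

Each line of the listing shows a byte offset, an opcode, and its argument, so you can spot the strings 'name', 'Bob', and 'age' inside the stream.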
How to unpickle a Python file?
Now let’s see how we can read the serialized pickled file from another Python program.
import pickle

# Open the file in binary read mode and rebuild the original object
with open('employee.pickle', 'rb') as f:
    e = pickle.load(f)

print(type(e))
print(e)
Now if you run this program, this is what you will get:
$ python3 unpickle-example.py
<class 'dict'>
{'name': 'Bob', 'age': 25}
Magic, huh? 🙂
I want you to notice the following:
- e is a dictionary, exactly the same type that was serialized in the pickling program
- e has exactly the same value that was serialized in the pickling program
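The same round trip also works in memory with pickle.dumps() and pickle.loads(), which is handy when the bytes travel over a network instead of through a file. Here is a small sketch (the nested employee record is made up for illustration). Note that the result is an equal copy, not the same object. Only unpickle data you trust: the pickle documentation warns that loading a malicious pickle can execute arbitrary code.

```python
import pickle

# Made-up record with nested containers and mixed types.
employee = {"name": "Bob", "age": 25, "skills": ["python", "sql"], "manager": None}

payload = pickle.dumps(employee)   # bytes you could store or send
restored = pickle.loads(payload)   # rebuild the object from those bytes

print(type(restored))
print(restored == employee)  # True: equal in value
print(restored is employee)  # False: a separate object
```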
So there you have it. You were able to essentially migrate a dictionary from one Python program to another. I don’t know about you, but I think this is pretty cool.
Is Python Pickle Fast?
This is a common question.
It depends on what you compare it to. pickle is not the only serialization protocol out there; there are many.
In the following section, I will compare pickle to two other very popular serialization protocols: json and protocol buffers (protobufs).
I won’t go into details of how you can use json and protobufs to serialize and deserialize objects in Python. If you are interested, you can check this article for json, and this one for protobufs.
Comparison between Pickle, JSON, and Protocol Buffers
In the following experiment, I will be comparing the three protocols based on the speed of serialization and deserialization, in addition to the size of the serialized object.
The Python object that I will be serializing is a Python dictionary of 100,000,000 entries, where each entry is composed of an integer key and an integer value.
The following table shows the results of this experiment:
criteria | pickle | json | protocol buffers |
---|---|---|---|
serialization time (seconds) | 7.05 | 162 | 1180 |
deserialization time (seconds) | 18 | 220 | 1210 |
size of the serialized object | 954MB | 2GB | 1.1GB |
As you can see, pickle is faster and much more compact than json.
Protobufs are nearly as compact as pickle (as expected), but they are much slower. Note that I was using the pure Python protobuf implementation; the Python-wrapped C++ implementation is much faster.
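If you want to run a similar comparison yourself, here is a scaled-down sketch that uses only the standard library. It is not the benchmark behind the table above: it uses 100,000 entries instead of 100,000,000, it leaves out protocol buffers (which need a third-party package and a schema), and the helper name time_it is my own. Absolute numbers will depend on your machine and Python version.

```python
import json
import pickle
import time

data = {i: i for i in range(100_000)}


def time_it(func, *args):
    # Return (result, elapsed seconds) for a single call.
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


pickled, pickle_dump_s = time_it(pickle.dumps, data)
from_pickle, pickle_load_s = time_it(pickle.loads, pickled)

as_json, json_dump_s = time_it(json.dumps, data)
from_json, json_load_s = time_it(json.loads, as_json)

print(f"pickle: dump {pickle_dump_s:.4f}s, load {pickle_load_s:.4f}s")
print(f"json:   dump {json_dump_s:.4f}s, load {json_load_s:.4f}s")

# JSON object keys are always strings, so the integer keys do not survive.
print(repr(next(iter(from_json))))  # '0' rather than 0
```

A single timed call is noisy, so for anything more serious consider repeating each measurement several times, for example with the timeit module.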
So which protocol should you use?
This really depends on your needs.
Here is a table that shows the pros and cons of each of the protocols discussed above.
| | pickle | json | protocol buffers |
|---|---|---|---|
| Pros | relatively fast; suitable for machine readers; compact | multi-language support; suitable for human readers | multi-language support; suitable for machine readers; compact |
| Cons | no multi-language support (only suitable inside the Python ecosystem); not suitable for human readers | relatively larger in size | not suitable for human readers |
What Can and Can’t be Pickled?
In all the examples above, I pickled and unpickled a Python dictionary that contains string keys and string/integer values.
Not everything can be pickled though.
There are some limitations that I want you to be aware of. Here is a list of what can be pickled:
- None, True, and False
- integers, floating-point numbers, and complex numbers
- strings, bytes, and byte arrays
- tuples, lists, sets, and dictionaries containing only items that can be pickled
- functions and classes defined at the top level of a module
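To see the other side of this list, here is a small sketch that tries to pickle a few objects and reports which ones fail. The helper can_pickle is hypothetical, written just for this example. A lambda and a thread lock are two common examples of objects that pickle rejects.

```python
import pickle
import threading


def can_pickle(obj):
    # Try to serialize obj; report failure instead of raising.
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


print(can_pickle({"name": "Bob", "scores": [1, 2.5, 3 + 4j]}))  # True
print(can_pickle(len))               # True: a built-in function
print(can_pickle(lambda x: x))       # False: a lambda cannot be looked up by name
print(can_pickle(threading.Lock()))  # False: locks cannot be pickled
```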
Conclusion
pickle is a Python module that serializes Python objects into a binary format, and deserializes them back, so you can store them on disk or send them over the network in an efficient and compact manner. Unlike other protocols (JSON, XML, protocol buffers, …), pickle is a Python-specific protocol.