{"id":2744,"date":"2021-02-14T06:54:11","date_gmt":"2021-02-14T06:54:11","guid":{"rendered":"https:\/\/www.afternerd.com\/blog\/?p=2744"},"modified":"2021-02-14T06:54:12","modified_gmt":"2021-02-14T06:54:12","slug":"python-pickle","status":"publish","type":"post","link":"https:\/\/www.afternerd.com\/blog\/python-pickle\/","title":{"rendered":"What is Pickling in Python? (In-depth Guide)"},"content":{"rendered":"\n<p><em>Pickling<\/em> in Python is the process of serializing a Python object into a byte stream. The pickle module is responsible for the serialization and deserialization of Python objects. What does that mean? Well, that is what I am going to answer in this article, so let&#8217;s get started.<\/p>\n\n\n\n<p>First, let&#8217;s understand what serialization and deserialization mean.<\/p>\n\n\n\n<p class=\"prettyprint\">Say you have a Python object (for example, a dictionary object) that looks like this:<\/p>\n\n\n\n<pre class=\"wp-block-code prettyprint\"><code>employee = {\"name\": \"Bob\", \"age\": 25}<\/code><\/pre>\n\n\n\n<p>and you want to write it to a file so that another Python process can read it later. How can you do that?<\/p>\n\n\n\n<p>Well, one option is to write the dictionary as a text file and then read this text file from the other Python program.<\/p>\n\n\n\n<p>For example, your text file can be formatted in the following manner:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>name:Bob\nage:25<\/code><\/pre>\n\n\n\n<p>Now, the other Python program can read this file, split each line on the <span class=\"symbol\">:<\/span> delimiter, and rebuild the dictionary. There you go!<\/p>\n\n\n\n<p>So what&#8217;s wrong with this approach?<\/p>\n\n\n\n<p>I agree with you: it is a working solution, and it might be OK for some situations.<\/p>\n\n\n\n<p>However, it is not ideal for these two reasons:<\/p>\n\n\n\n<ol><li>Text files <strong><em>take more space<\/em><\/strong> when they are stored on disk. 
This might be OK for trivial programs, but imagine you have to send this serialized object to another machine over the network. In this case, having a small payload is crucial, or else you might congest the network.<\/li><li>The way you formatted your file was <strong><em>arbitrary<\/em><\/strong>. You had to somehow communicate to the other Python program what your &#8220;schema&#8221; looks like. This doesn&#8217;t scale. Ideally, what we need is a well-defined, standardized protocol so that any other program can easily and deterministically read your serialized data.<\/li><\/ol>\n\n\n\n<p>A popular standard for serializing data is <a rel=\"noreferrer noopener\" href=\"https:\/\/www.w3schools.com\/js\/js_json_intro.asp\" target=\"_blank\">JSON<\/a>. You have probably heard of it.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-medium is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.afternerd.com\/blog\/wp-content\/uploads\/2021\/02\/Screen-Shot-2021-02-13-at-10.19.14-PM-300x280.png\" alt=\"\" class=\"wp-image-2780\" width=\"410\" height=\"383\" srcset=\"https:\/\/www.afternerd.com\/blog\/wp-content\/uploads\/2021\/02\/Screen-Shot-2021-02-13-at-10.19.14-PM-300x280.png 300w, https:\/\/www.afternerd.com\/blog\/wp-content\/uploads\/2021\/02\/Screen-Shot-2021-02-13-at-10.19.14-PM-1024x956.png 1024w, https:\/\/www.afternerd.com\/blog\/wp-content\/uploads\/2021\/02\/Screen-Shot-2021-02-13-at-10.19.14-PM-768x717.png 768w, https:\/\/www.afternerd.com\/blog\/wp-content\/uploads\/2021\/02\/Screen-Shot-2021-02-13-at-10.19.14-PM.png 1052w\" sizes=\"(max-width: 410px) 100vw, 410px\" \/><\/figure><\/div>\n\n\n\n<p>JSON is a widely used, standardized textual protocol, but it doesn&#8217;t solve the issue of being a textual representation, which means the serialized data tends to be large.<\/p>\n\n\n\n<p>This is exactly the problem the <em>pickle<\/em> module solves.<\/p>\n\n\n\n<h2 
class=\"wp-block-heading\">So what is the use of pickle in Python?<\/h2>\n\n\n\n<p>If you want to serialize a Python object, whether to store it on disk or to transfer it over the network, <em>pickle<\/em> is a Python module that helps you <em>serialize<\/em> and <em>deserialize<\/em> Python objects in a binary format (not a textual format). This means that your serialized objects will usually be more compact than their textual counterparts.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">How to pickle a Python object?<\/h2>\n\n\n\n<p class=\"prettyprint\">Here is an example of how to pickle a Python dictionary and write it to a file:<\/p>\n\n\n\n<pre class=\"wp-block-code prettyprint\"><code>import pickle\n\ne = {\"name\": \"Bob\", \"age\": 25}\nwith open('employee.pickle', 'wb') as f:\n    pickle.dump(e, f)<\/code><\/pre>\n\n\n\n<p>Note the following:<\/p>\n\n\n\n<ul><li>you need to import the pickle module<\/li><li>the file needs to be opened in <strong>&#8216;wb&#8217;<\/strong> (write binary) mode<\/li><li>by convention, pickle files are often given a <span class=\"symbol\">.pickle<\/span> extension, but this is not mandatory<\/li><li>dump() writes the serialized bytes of the dictionary <span class=\"symbol\">e<\/span> to the file<\/li><\/ul>\n\n\n\n<p class=\"prettyprint\">If you try to read the contents of the pickle file, you will get a binary stream of data that pretty much looks like gibberish. 
But trust me, it is not \ud83d\ude42<\/p>\n\n\n\n<pre class=\"wp-block-code prettyprint\"><code>$ cat employee.pickle\n\ufffd\ufffd}\ufffd(\ufffdname\ufffd\ufffdBob\ufffd\ufffdage\ufffdKu.%<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">How to unpickle a Python file?<\/h2>\n\n\n\n<p>Now let&#8217;s see how we can read the pickled file from another Python program.<\/p>\n\n\n\n<pre class=\"wp-block-code prettyprint\"><code>import pickle\n\nwith open('employee.pickle', 'rb') as f:\n    e = pickle.load(f)\n\nprint(type(e))\nprint(e)<\/code><\/pre>\n\n\n\n<p>Now if you run this program, this is what you will get:<\/p>\n\n\n\n<pre class=\"wp-block-code prettyprint\"><code>$ python3 unpickle-example.py\n&lt;class 'dict'&gt;\n{'name': 'Bob', 'age': 25}<\/code><\/pre>\n\n\n\n<p>Magic, huh? \ud83d\ude42<\/p>\n\n\n\n<p>I want you to notice the following:<\/p>\n\n\n\n<ul><li><span class=\"symbol\">e<\/span> is a dictionary, exactly the same <strong>type<\/strong> that was serialized in the pickling program<\/li><li><span class=\"symbol\">e<\/span> has exactly the same value that was serialized in the pickling program<\/li><\/ul>\n\n\n\n<p>So there you have it. You were able to essentially migrate a dictionary from one Python program to another. I don&#8217;t know about you, but I think this is pretty cool.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Is Python Pickle Fast?<\/h2>\n\n\n\n<p>This is a common question.<\/p>\n\n\n\n<p>It depends on what you compare it to. pickle is not the only serialization protocol out there; there are many.<\/p>\n\n\n\n<p>In the following section, I will compare pickle to two other very popular serialization protocols: <em>json<\/em> and <em><a rel=\"noreferrer noopener\" href=\"https:\/\/developers.google.com\/protocol-buffers\" target=\"_blank\">protocol buffers<\/a><\/em> (protobufs).<\/p>\n\n\n\n<p>I won&#8217;t go into details of how you can use json and protobufs to serialize and deserialize objects in Python. 
If you are interested, you can check <a href=\"https:\/\/realpython.com\/python-json\/\" target=\"_blank\" rel=\"noreferrer noopener\">this article for json<\/a>, and <a href=\"https:\/\/www.freecodecamp.org\/news\/googles-protocol-buffers-in-python\/\" target=\"_blank\" rel=\"noreferrer noopener\">this one for protobufs<\/a>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Comparison between Pickle, JSON, and Protocol Buffers<\/h2>\n\n\n\n<p>In the following experiment, I will be comparing the three protocols based on the time taken to serialize and deserialize, in addition to the size of the serialized object.<\/p>\n\n\n\n<p>The Python object that I will be serializing is a Python dictionary of 100,000,000 entries where each entry is composed of an integer key and an integer value.<\/p>\n\n\n\n<p>The following table shows the results of this experiment:<\/p>\n\n\n\n<figure class=\"wp-block-table is-style-stripes\"><table><thead><tr><th>criteria<\/th><th>pickle<\/th><th>json<\/th><th>protocol buffers<\/th><\/tr><\/thead><tbody><tr><td>serialization time (seconds)<\/td><td>7.05<\/td><td>162<\/td><td>1180<\/td><\/tr><tr><td>deserialization time (seconds)<\/td><td>18<\/td><td>220<\/td><td>1210<\/td><\/tr><tr><td>size of the serialized object<\/td><td>954MB<\/td><td>2GB<\/td><td>1.1GB<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>As you can see, pickle is faster and much more compact than <em>json<\/em>.<\/p>\n\n\n\n<p>Protobufs are about as compact as <em>pickle<\/em> (as expected), but they were much slower in this experiment. Note that I was using the pure Python protobuf implementation; the <a rel=\"noreferrer noopener\" href=\"https:\/\/github.com\/Cue\/fast-python-pb#fast-python-pb-fast-python-protocol-buffers\" target=\"_blank\">Python-wrapped C++ implementation<\/a> is much faster.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">So which protocol should you use?<\/h3>\n\n\n\n<p>This really depends on your needs.<\/p>\n\n\n\n<p>Here is a table that shows the pros and cons of each of the protocols 
discussed above.<\/p>\n\n\n\n<figure class=\"wp-block-table is-style-stripes\"><table><thead><tr><th><\/th><th>pickle<\/th><th>json<\/th><th>protocol buffers<\/th><\/tr><\/thead><tbody><tr><td>Pros<\/td><td>&#8211; relatively fast<br>&#8211; suitable for machine readers<br>&#8211; compact<\/td><td>&#8211; multi-language support<br>&#8211; suitable for human readers<\/td><td>&#8211; multi-language support<br>&#8211; suitable for machine readers<br>&#8211; compact<\/td><\/tr><tr><td>Cons<\/td><td>&#8211; no multi-language support<br>&#8211; not suitable for human readers<br>&#8211; only suitable inside the Python ecosystem<\/td><td>&#8211; relatively larger in size<\/td><td>&#8211; not suitable for human readers<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">What Can and Can&#8217;t be Pickled?<\/h2>\n\n\n\n<p>In all the examples above, I pickled and unpickled a Python dictionary that contains string keys and string\/integer values.<\/p>\n\n\n\n<p>Not everything can be pickled though.<\/p>\n\n\n\n<p>There are some limitations that I want you to be aware of. Here is a list of what can be pickled:<\/p>\n\n\n\n<ul><li>None, True, and False<\/li><li>integers, floating-point numbers, and complex numbers<\/li><li>strings, bytes, and byte arrays<\/li><li>tuples, lists, sets, and dictionaries containing only items that can be pickled<\/li><li>functions and classes defined at the top level of a module<\/li><\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p><em>pickle<\/em> is a Python module that is used to serialize Python objects into a binary format, and to deserialize them back, so you can store them on disk or send them over the network in an efficient and compact manner. 
Unlike other protocols ( JSON, XML, protocol buffers, &#8230;), pickle is a Python-specific protocol.<\/p>\n","protected":false},"excerpt":{"rendered":"","protected":false},"author":1,"featured_media":2756,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[12],"tags":[13],"yst_prominent_words":[158,253],"_links":{"self":[{"href":"https:\/\/www.afternerd.com\/blog\/wp-json\/wp\/v2\/posts\/2744"}],"collection":[{"href":"https:\/\/www.afternerd.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.afternerd.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.afternerd.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.afternerd.com\/blog\/wp-json\/wp\/v2\/comments?post=2744"}],"version-history":[{"count":41,"href":"https:\/\/www.afternerd.com\/blog\/wp-json\/wp\/v2\/posts\/2744\/revisions"}],"predecessor-version":[{"id":2791,"href":"https:\/\/www.afternerd.com\/blog\/wp-json\/wp\/v2\/posts\/2744\/revisions\/2791"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.afternerd.com\/blog\/wp-json\/wp\/v2\/media\/2756"}],"wp:attachment":[{"href":"https:\/\/www.afternerd.com\/blog\/wp-json\/wp\/v2\/media?parent=2744"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.afternerd.com\/blog\/wp-json\/wp\/v2\/categories?post=2744"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.afternerd.com\/blog\/wp-json\/wp\/v2\/tags?post=2744"},{"taxonomy":"yst_prominent_words","embeddable":true,"href":"https:\/\/www.afternerd.com\/blog\/wp-json\/wp\/v2\/yst_prominent_words?post=2744"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}