As a Data Scientist, you will work with data in the form of dictionaries, DataFrames, or other data types. When working with those, you might want to save them to a file, so you can use them later on or send them to someone else. This is what Python’s pickle module is for: it serializes objects so they can be saved to a file and loaded into a program again later on. Serialization in programming is the process of converting an object into a stream of bytes so that it can be stored in memory, a database, or a file, or transmitted elsewhere. Its main purpose is to save the state of an object so it can be recreated when needed.


Pickle is used for serializing and de-serializing Python object structures, which, again, is simply the process of converting an object in memory to a byte stream that can be stored on a disk or sent over a network. Later on, this byte stream can be retrieved and de-serialized back into a Python object. Pickling is useful for applications where you need some degree of persistence in your data. It can also be used to send data over a Transmission Control Protocol (TCP) or other socket connection. Pickle is also very useful when you’re working with machine learning models: you can save a trained model and load it later to make new predictions, without rewriting your code or training the model all over again.
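A minimal sketch of the in-memory variant: pickle.dumps returns the byte stream as a bytes object instead of writing it to a file, and pickle.loads rebuilds an object from it. The dictionary contents are hypothetical, and the network step is only mentioned in a comment, since a real TCP exchange needs a peer and some way to tell where one message ends.

```python
import pickle

dogs = {"Ozzy": 3, "Luna": 5}  # hypothetical data

# Serialize: object in memory -> bytes
payload = pickle.dumps(dogs)
print(type(payload))  # <class 'bytes'>

# These bytes could now be written to disk or sent, e.g. with socket.sendall(payload)

# De-serialize: bytes -> a new, equal Python object
restored = pickle.loads(payload)
print(restored == dogs, restored is dogs)  # True False
```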

WHEN NOT TO USE PICKLE

If you want to use data across different programming languages, pickle is not recommended. Its format is specific to Python, so other languages generally cannot read it. Compatibility across Python versions is also limited: the pickle protocols are backward compatible, so newer versions of Python can read pickles written by older ones, but a file written with a newer protocol cannot be loaded by an older Python that does not support that protocol. Make sure the writing and reading sides use compatible versions, and update if necessary. Finally, never un-pickle data from an untrusted source: a maliciously crafted pickle can execute arbitrary code while it is being loaded.
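Two hedged illustrations of the points above, assuming a hypothetical dogs_dict. First, the protocol argument of pickle.dumps (and pickle.dump) lets you write an older protocol when the reader runs an older Python; protocol 2, for example, dates back to Python 2.3. Second, the Python documentation describes limiting what a pickle may load by overriding Unpickler.find_class. The allow-list below is an illustrative assumption, and this technique reduces risk rather than making untrusted data safe.

```python
import builtins
import collections
import io
import pickle

dogs_dict = {"Ozzy": 3, "Luna": 5}  # hypothetical data

# 1. Every protocol this interpreter supports round-trips a plain dict
for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    assert pickle.loads(pickle.dumps(dogs_dict, protocol=protocol)) == dogs_dict
legacy_bytes = pickle.dumps(dogs_dict, protocol=2)  # older protocol for older readers

# 2. Only allow an explicit set of globals while loading
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}  # illustrative allow-list


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle refers to a class or function by name
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data):
    return RestrictedUnpickler(io.BytesIO(data)).load()


print(restricted_loads(legacy_bytes))  # plain dicts need no globals

blocked = False
try:
    # OrderedDict is looked up as collections.OrderedDict, which is not allowed
    restricted_loads(pickle.dumps(collections.OrderedDict(dogs_dict)))
except pickle.UnpicklingError as exc:
    blocked = True
    print("Blocked:", exc)
```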

WHAT CAN BE PICKLED?

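You can pickle objects of the following data types: Booleans, integers, floats, complex numbers, strings, tuples, lists, sets, and dictionaries, as long as any containers hold only picklable objects.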

All of the above can be pickled, and you can do the same for classes and functions, as long as they are defined at the top level of a module. Not everything can be pickled (easily), though: examples are generators, inner classes, lambda functions, and defaultdicts whose default factory is a lambda. For lambda functions, one option is the third-party package dill. For defaultdicts, create them with a module-level function as the default factory.
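A small sketch of these rules with the standard pickle module only (dill is not shown). The Dog class and the default_age function are hypothetical names; what matters is that both live at the top level of the module, so pickle can store them by reference. The last part collects the objects that fail.

```python
import pickle
from collections import defaultdict


class Dog:  # top-level class: pickle stores a reference to it by name
    def __init__(self, name, age):
        self.name = name
        self.age = age


def default_age():  # top-level function used as the defaultdict factory
    return 0


ages = defaultdict(default_age)
ages["Ozzy"] = 3

restored_dog = pickle.loads(pickle.dumps(Dog("Luna", 5)))
restored_ages = pickle.loads(pickle.dumps(ages))
print(restored_dog.name, restored_ages["unknown"])  # Luna 0

# Objects the standard pickle module refuses
candidates = [
    ("lambda", lambda x: x),
    ("generator", (n for n in range(3))),
    ("defaultdict with lambda", defaultdict(lambda: 0)),
]
failures = {}
for label, obj in candidates:
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError) as exc:
        failures[label] = type(exc).__name__
print(failures)
```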

PICKLING FILES

To use pickle, start by importing it in Python.


For this example, we will be pickling a simple dictionary, which is a collection of key: value pairs. We will save it to a file and then load it again.
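The original code appeared as images, so here is a stand-in for the import and the dictionary. The name dogs_dict comes from the text; its contents below are hypothetical.

```python
import pickle  # standard library, no installation needed

# Hypothetical contents: dog names mapped to ages
dogs_dict = {"Ozzy": 3, "Filou": 8, "Luna": 5}
print(len(dogs_dict))  # 3
```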


To pickle this dictionary, you first need to specify the name of the file you will write it to, which is dogs in this case. Note that the file does not have an extension. To open the file for writing, use the open() function. The first argument is the name of your file. The second argument is 'wb': the w means that you'll be writing to the file, and b refers to binary mode, meaning the data will be written as bytes objects. If you forget the b, pickle will raise a TypeError, because a file opened in text mode only accepts str, not bytes. You may sometimes come across a slightly different notation, 'w+b', which additionally opens the file for reading; for writing a pickle it behaves the same.
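A quick way to see the missing-b mistake for yourself. The snippet writes into a temporary folder (an assumption, to keep your working directory clean) and captures the error message, whose exact wording depends on your Python version.

```python
import os
import pickle
import tempfile

# Scratch location so the experiment does not clutter your working directory
path = os.path.join(tempfile.mkdtemp(), "dogs_text_mode")

error_message = None
try:
    # Missing "b": text mode only accepts str, but pickle writes bytes
    with open(path, "w") as outfile:
        pickle.dump({"Ozzy": 3}, outfile)
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```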


Once the file is opened for writing, you can use pickle.dump(), which takes two arguments: the object you want to pickle and the file to which the object has to be saved. In this case, the first will be dogs_dict, while the second will be outfile. Don’t forget to close the file with close()!
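The steps above can be sketched as follows, with hypothetical dictionary contents. The second form uses a with statement, a common alternative that closes the file automatically, even if pickle.dump() raises.

```python
import pickle

dogs_dict = {"Ozzy": 3, "Filou": 8, "Luna": 5}  # hypothetical contents

filename = "dogs"                 # no extension, as in the text
outfile = open(filename, "wb")    # w = write, b = binary
pickle.dump(dogs_dict, outfile)   # object first, then the file
outfile.close()                   # flush and release the file

# Equivalent: the with block closes the file for you
with open(filename, "wb") as outfile:
    pickle.dump(dogs_dict, outfile)
```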


Now, a new file named dogs should have appeared in your current working directory (often, but not necessarily, the folder containing your Python script), unless you specified a file path as the file name.
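If you are unsure where the file went, pathlib can tell you: a relative name such as "dogs" is resolved against the current working directory. The second half writes to an explicit path instead; the temporary folder is only a stand-in for a directory of your choice.

```python
import pickle
import tempfile
from pathlib import Path

# Where a relative "dogs" would end up
print(Path.cwd())
print(Path("dogs").resolve())

# Writing to an explicit location instead
target = Path(tempfile.mkdtemp()) / "dogs"
with target.open("wb") as outfile:
    pickle.dump({"Ozzy": 3}, outfile)
print(target.exists())  # True
```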

UN-PICKLING FILES

The process of loading a pickled file back into a Python program is similar to the one we saw previously: use the open() function again, but this time with 'rb' as second argument (instead of wb). The r stands for read mode and the b stands for binary mode. We’ll be reading a binary file. Assign this to infile. Next, use pickle.load(), with infile as argument, and assign it to new_dict. The contents of the file are now assigned to this new variable. Again, we'll need to close the file at the end.
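A sketch of the loading step. So that it runs on its own, it first writes a dogs file with hypothetical contents, repeating the pickling step from the previous section.

```python
import pickle

# Setup: create the file first (same steps as in the previous section)
dogs_dict = {"Ozzy": 3, "Luna": 5}  # hypothetical contents
with open("dogs", "wb") as outfile:
    pickle.dump(dogs_dict, outfile)

filename = "dogs"
infile = open(filename, "rb")    # r = read, b = binary
new_dict = pickle.load(infile)   # rebuild the object from the bytes
infile.close()

# Equivalent: the with block closes the file for you
with open(filename, "rb") as infile:
    new_dict = pickle.load(infile)
```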


To make sure that you successfully unpickled it, you can print the dictionary, compare it to the previous dictionary and check its type with type().
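The checks described above, again with hypothetical contents and a setup step so the snippet is self-contained. The last line shows that unpickling produces a new object that is equal to the original, not the same object.

```python
import pickle

# Setup: pickle and unpickle a hypothetical dictionary
dogs_dict = {"Ozzy": 3, "Luna": 5}
with open("dogs", "wb") as outfile:
    pickle.dump(dogs_dict, outfile)
with open("dogs", "rb") as infile:
    new_dict = pickle.load(infile)

print(new_dict)               # {'Ozzy': 3, 'Luna': 5}
print(new_dict == dogs_dict)  # True: same keys and values
print(type(new_dict))         # <class 'dict'>
print(new_dict is dogs_dict)  # False: a separate copy
```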


Congratulations, you now know how to pickle and unpickle Python objects!

Written by

Data Scientist & Machine Learning Engineer
