As a data scientist, you will work with data in the form of dictionaries, DataFrames, or other data types. When working with those, you might want to save them to a file so you can use them later on or send them to someone else. This is what Python’s pickle module is for: it serializes objects so they can be saved to a file and loaded into a program again later. Serialization is the process of converting an object into a stream of bytes in order to store it or transmit it to memory, a database, or a file. Its main purpose is to save the state of an object so that it can be recreated when needed.
Pickle is used for serializing and de-serializing Python object structures: an object in memory is converted to a byte stream that can be stored on disk or sent over a network. Later on, this byte stream can be retrieved and de-serialized back into a Python object. Pickling is useful for applications where you need some degree of persistence in your data. It can also be used to send data over a Transmission Control Protocol (TCP) or socket connection. Pickle is especially useful when you’re working with machine learning algorithms: you can save a trained model and use it to make new predictions later, without having to rewrite everything or train the model all over again.
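To make the idea of a byte stream concrete, here is a minimal sketch that serializes an object to bytes in memory and restores it, without touching a file. The variable names and values (record, payload, restored) are illustrative and not taken from the article.

```python
import pickle

# Illustrative object; any picklable structure would work the same way.
record = {"name": "Luna", "scores": [0.91, 0.87], "tags": ("a", "b")}

# Serialize: the object becomes a bytes value.
payload = pickle.dumps(record)

# De-serialize: the bytes become a new, equal object.
restored = pickle.loads(payload)

print(type(payload))       # the serialized form is a bytes object
print(restored == record)  # the restored object has the same contents
```

pickle.dumps() and pickle.loads() are the in-memory counterparts of the file-based pickle.dump() and pickle.load() functions.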
WHEN NOT TO USE PICKLE
If you want to use data across different programming languages, pickle is not recommended. Its protocol is specific to Python, so cross-language compatibility is not guaranteed. The same holds for different versions of Python itself: un-pickling a file that was pickled in a different version of Python may not always work properly, so make sure you’re using the same version and update if necessary. You should also not un-pickle data from an untrusted source, because malicious code inside the file might be executed when un-pickling.
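The risk with untrusted data can be illustrated with a deliberately harmless sketch. The class name NotWhatItSeems is hypothetical; it uses the __reduce__ hook to tell the un-pickler which callable to run, and here that callable is only the built-in len. A hostile file could name a far more dangerous function in the same way.

```python
import pickle


class NotWhatItSeems:
    """Hypothetical class: its pickle asks the loader to call len("abc")."""

    def __reduce__(self):
        # __reduce__ returns (callable, args). When the data is loaded,
        # pickle calls callable(*args) and returns the result.
        return (len, ("abc",))


payload = pickle.dumps(NotWhatItSeems())

# Loading does not rebuild the object; it runs the callable chosen by
# whoever created the pickle.
result = pickle.loads(payload)
print(result)  # 3
```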
WHAT CAN BE PICKLED?
You can pickle objects of many built-in data types, including:
- Complex Numbers
- (Normal & Unicode) Strings
- Dictionaries that contain picklable objects
All the above can be pickled, and so can classes and functions, provided they are defined at the top level of a module. Not everything can be pickled (easily), though: examples are generators, inner classes, lambda functions and defaultdicts. For lambda functions, you need an additional package named dill. For defaultdicts, you need to create them with a module-level function as the default factory, rather than a lambda.
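A quick way to see these limits is to try pickling a few objects and record which attempts fail. The helper can_pickle below is a hypothetical convenience function, not part of the pickle module. As an assumption for brevity, the working defaultdict uses the built-in int as its default factory; a named module-level function would work the same way.

```python
import pickle
from collections import defaultdict


def can_pickle(obj):
    """Return True if pickle.dumps(obj) succeeds, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


results = {
    "dict of ints": can_pickle({"a": 1}),
    "generator": can_pickle(x for x in range(3)),
    "lambda": can_pickle(lambda x: x + 1),
    "defaultdict(int)": can_pickle(defaultdict(int)),
    "defaultdict(lambda: 0)": can_pickle(defaultdict(lambda: 0)),
}

for label, ok in results.items():
    print(label, "->", "picklable" if ok else "not picklable")
```

The defaultdict with a lambda factory fails because pickling the dictionary also requires pickling its default factory.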
To use pickle, start by importing it in Python with import pickle.
For this example, we will be pickling a simple dictionary. A dictionary is a collection of key: value pairs. We will save it to a file and then load it again.
To pickle this dictionary, you first need to specify the name of the file you will write it to, which is dogs in this case. Note that the file does not have an extension. To open the file for writing, simply use the open() function. The first argument should be the name of your file. The second argument is 'wb': the w means that you'll be writing to the file, and the b refers to binary mode, so the data will be written as bytes objects. If you forget the b, a TypeError: must be str, not bytes will be raised (the exact wording depends on your Python version). You may sometimes come across a slightly different notation, w+b, which also opens the file for reading; for writing a pickle it provides the same functionality.
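The error from a missing b can be reproduced with a short sketch. The dictionary values are illustrative, and the file is created in a temporary directory (an assumption made so the example cleans up after itself) rather than next to your script.

```python
import os
import pickle
import tempfile

dogs_dict = {"Ozzy": 3, "Luna": 5}  # illustrative values

error_message = None
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "dogs")
    # Text mode ("w") expects str, but pickle.dump() writes bytes.
    with open(path, "w") as outfile:
        try:
            pickle.dump(dogs_dict, outfile)
        except TypeError as exc:
            error_message = str(exc)

print(error_message)
```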
Once the file is opened for writing, you can use pickle.dump(), which takes two arguments: the object you want to pickle and the file to which the object has to be saved. In this case, the first will be dogs_dict, while the second will be outfile, the file object returned by open(). Don’t forget to close the file with outfile.close().
Now, a new file named dogs should have appeared in the same directory as your Python script (unless you specified a file path as file name).
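Put together, the writing steps might look like the sketch below. The dictionary contents are illustrative values, and the file is written to a temporary directory so the example leaves nothing behind; to follow the text exactly, set filename to "dogs" so the file appears next to your script.

```python
import os
import pickle
import tempfile

# Illustrative dictionary of dog names and ages.
dogs_dict = {"Ozzy": 3, "Filou": 8, "Luna": 5}

with tempfile.TemporaryDirectory() as tmp_dir:
    filename = os.path.join(tmp_dir, "dogs")  # note: no file extension

    outfile = open(filename, "wb")   # w = write, b = binary mode
    pickle.dump(dogs_dict, outfile)  # object first, then the open file
    outfile.close()                  # flush and release the file

    file_exists = os.path.exists(filename)
    file_size = os.path.getsize(filename)

print(file_exists, file_size > 0)
```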
The process of loading a pickled file back into a Python program is similar to the one we saw previously: use the open() function again, but this time with 'rb' as second argument (instead of 'wb'). The r stands for read mode and the b stands for binary mode, since we’ll be reading a binary file. Assign this to infile. Next, use pickle.load() with infile as argument, and assign the result to new_dict. The contents of the file are now assigned to this new variable. Again, we'll need to close the file at the end.
To make sure that you successfully unpickled it, you can print the dictionary, compare it to the previous dictionary and check its type with type().
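The full round trip, including the checks just described, can be sketched as follows. The dictionary values and the temporary directory are assumptions for illustration. This version uses with statements, which close each file automatically and can replace the explicit close() calls.

```python
import os
import pickle
import tempfile

dogs_dict = {"Ozzy": 3, "Filou": 8, "Luna": 5}  # illustrative values

with tempfile.TemporaryDirectory() as tmp_dir:
    filename = os.path.join(tmp_dir, "dogs")

    # Write the dictionary in binary mode.
    with open(filename, "wb") as outfile:
        pickle.dump(dogs_dict, outfile)

    # Read it back in binary mode.
    with open(filename, "rb") as infile:
        new_dict = pickle.load(infile)

print(new_dict)
print(new_dict == dogs_dict)  # same contents as the original
print(type(new_dict))         # still a dictionary
```

The loaded dictionary is equal to the original but is a separate object, so changing one does not affect the other.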
Congratulations, you now know how to successfully Pickle!