Saving an Object (Data persistence)

Question

I've created an object like this:

company1.name = 'banana' 
company1.value = 40

I would like to save this object. How can I do that?

Recommended answer

You could use the pickle module in the standard library. Here's an elementary application of it to your example:

import pickle

class Company(object):
    def __init__(self, name, value):
        self.name = name
        self.value = value

with open('company_data.pkl', 'wb') as output:
    company1 = Company('banana', 40)
    pickle.dump(company1, output, pickle.HIGHEST_PROTOCOL)

    company2 = Company('spam', 42)
    pickle.dump(company2, output, pickle.HIGHEST_PROTOCOL)

del company1
del company2

with open('company_data.pkl', 'rb') as infile:  # Avoids shadowing the built-in input().
    company1 = pickle.load(infile)
    print(company1.name)  # -> banana
    print(company1.value)  # -> 40

    company2 = pickle.load(infile)
    print(company2.name)  # -> spam
    print(company2.value)  # -> 42

You could also define your own simple utility like the following, which opens a file and writes a single object to it:

def save_object(obj, filename):
    with open(filename, 'wb') as output:  # Overwrites any existing file.
        pickle.dump(obj, output, pickle.HIGHEST_PROTOCOL)

# sample usage
save_object(company1, 'company1.pkl')
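
A matching loader is not part of the original answer, but one could be sketched as below. The name load_object and the temporary file path are assumptions made for illustration, and a plain dict stands in for the Company instance so that the block runs on its own:

```python
import os
import pickle
import tempfile

def save_object(obj, filename):
    with open(filename, 'wb') as outp:  # Overwrites any existing file.
        pickle.dump(obj, outp, pickle.HIGHEST_PROTOCOL)

def load_object(filename):
    """Hypothetical counterpart to save_object: read back a single object."""
    with open(filename, 'rb') as inp:
        return pickle.load(inp)

# Round trip through a temporary directory (the path is illustrative only).
path = os.path.join(tempfile.mkdtemp(), 'company1.pkl')
original = {'name': 'banana', 'value': 40}
save_object(original, path)
restored = load_object(path)
print(restored)
```

When loading instances of your own classes this way, the class definition has to be importable in the program doing the loading, because pickle stores a reference to the class rather than its code.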

Update

Since this is such a popular answer, I'd like to touch on a few slightly advanced usage topics.

In Python 2, it's almost always preferable to use the cPickle module rather than pickle, because the former is written in C and is much faster. There are some subtle differences between them, but in most situations they're equivalent and the C version will provide greatly superior performance. Switching to it couldn't be easier; just change the import statement to this:

import cPickle as pickle

In Python 3, cPickle was renamed _pickle, but doing this is no longer necessary since the pickle module now does it automatically; see What difference between pickle and _pickle in python 3?.

The rundown is that you could use something like the following to ensure that your code always uses the C version when it's available, in both Python 2 and 3:

try:
    import cPickle as pickle
except ImportError:  # No cPickle in Python 3; ImportError also exists in versions that lack ModuleNotFoundError.
    import pickle

Data stream formats (protocols)

pickle can read and write files in several different, Python-specific formats, called protocols. As described in the documentation, "Protocol version 0" is ASCII and therefore "human-readable". Versions > 0 are binary, and the highest one available depends on which version of Python is being used. The default also depends on the Python version: in Python 2 the default was protocol version 0, but in Python 3.8.1 it's protocol version 4. In Python 3.x the module had a pickle.DEFAULT_PROTOCOL attribute added to it, but that doesn't exist in Python 2.
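
To make the difference between protocols concrete, here is a small sketch (not from the original answer) that pickles the same data with protocol 0 and with the highest protocol. The sample dict is an assumption for illustration; the ASCII check holds for this particular data, which contains only ASCII strings and a small integer:

```python
import pickle

data = {'name': 'banana', 'value': 40}  # illustrative sample data

# Protocol 0: for this data, every byte of the output is plain ASCII.
as_protocol_0 = pickle.dumps(data, 0)
is_ascii = all(byte < 128 for byte in bytearray(as_protocol_0))
print(is_ascii)

# Protocol 2 and later: the stream starts with the PROTO opcode (0x80)
# followed by a byte holding the protocol number.
as_highest = pickle.dumps(data, pickle.HIGHEST_PROTOCOL)
starts_with_proto = bytearray(as_highest)[0] == 0x80
print(starts_with_proto)

# Both streams load back to an equal object without naming the protocol,
# because the format is detected automatically on load.
print(pickle.loads(as_protocol_0) == pickle.loads(as_highest))

# DEFAULT_PROTOCOL is missing in Python 2, so fall back to its default of 0.
print(pickle.HIGHEST_PROTOCOL, getattr(pickle, 'DEFAULT_PROTOCOL', 0))
```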

Fortunately there's a shorthand for writing pickle.HIGHEST_PROTOCOL in every call (assuming that's what you want, and you usually do): just use the literal number -1, similar to referencing the last element of a sequence via a negative index. So, instead of writing:

pickle.dump(obj, output, pickle.HIGHEST_PROTOCOL)

you can just write:

pickle.dump(obj, output, -1)

Either way, you'd only have to specify the protocol once if you created a Pickler object for use in multiple pickle operations:

pickler = pickle.Pickler(output, -1)
pickler.dump(obj1)
pickler.dump(obj2)
# ... and so on for any further objects
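
The fragment above assumes that output, obj1 and obj2 already exist. A self-contained variant might look like the following sketch, in which an in-memory io.BytesIO buffer stands in for the open file and the tuples are made-up sample data:

```python
import io
import pickle

buffer = io.BytesIO()  # stands in for a file opened with mode 'wb'

pickler = pickle.Pickler(buffer, -1)  # protocol is given once, here
pickler.dump(('banana', 40))
pickler.dump(('spam', 42))

# Read the objects back in the order they were written.
buffer.seek(0)
unpickler = pickle.Unpickler(buffer)
first = unpickler.load()
second = unpickler.load()
print(first, second)
```

Reading with a single Unpickler mirrors the writing side, which seems the safest pairing because a Pickler remembers objects it has already written across dump() calls.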

Note: If you're in an environment running different versions of Python, then you'll probably want to explicitly use (i.e. hardcode) a specific protocol number that all of them can read (later versions can generally read files produced by earlier ones).
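
As a rough sketch of that advice, the protocol number can live in a single constant. The names PICKLE_PROTOCOL and dumps_portable are hypothetical, and the choice of protocol 2 is an assumption that suits a mix of Python 2.3+ and Python 3 interpreters; pick whatever number all of your own interpreters support:

```python
import pickle

# Assumption: every interpreter involved can read protocol 2.
PICKLE_PROTOCOL = 2  # hypothetical project-wide constant

def dumps_portable(obj):
    """Hypothetical helper: always pickle with the agreed protocol."""
    return pickle.dumps(obj, PICKLE_PROTOCOL)

payload = dumps_portable({'name': 'banana', 'value': 40})

# The second byte of a protocol >= 2 stream records the protocol used.
protocol_used = bytearray(payload)[1]
print(protocol_used)
print(pickle.loads(payload))
```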

While a pickle file can contain any number of pickled objects, as shown in the above samples, when there's an unknown number of them, it's often easier to store them all in some sort of variably-sized container, like a list, tuple, or dict, and write them all to the file in a single call:

tech_companies = [
    Company('Apple', 114.18), Company('Google', 908.60), Company('Microsoft', 69.18)
]
save_object(tech_companies, 'tech_companies.pkl')

and restore the list and everything in it later with:

with open('tech_companies.pkl', 'rb') as infile:  # Avoids shadowing the built-in input().
    tech_companies = pickle.load(infile)

The major advantage is that you don't need to know how many object instances were saved in order to load them back later (although doing so without that information is possible, it requires some slightly specialized code). See the answers to the related question Saving and loading multiple objects in pickle file? for details on different ways to do this. Personally I like @Lutz Prechelt's answer the best. Here it is, adapted to the examples here:

class Company:
    def __init__(self, name, value):
        self.name = name
        self.value = value

def pickled_items(filename):
    """ Unpickle a file of pickled data. """
    with open(filename, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                break

print('Companies in pickle file:')
for company in pickled_items('company_data.pkl'):
    print('  name: {}, value: {}'.format(company.name, company.value))
