Serializing an object in __main__ with pickle or dill

Question

I have a pickling problem. I want to serialize a function in my main script, then load it and run it in another script. To demonstrate this, I've made 2 scripts:

dill_pickle_script_1.py

import pickle
import time

def my_func(a, b):
    time.sleep(0.1)  # The purpose of this will become evident at the end
    return a+b

if __name__ == '__main__':
    with open('testfile.pkl', 'wb') as f:
        pickle.dump(my_func, f)

dill_pickle_script_2.py

import pickle

if __name__ == '__main__':
    with open('testfile.pkl', 'rb') as f:  # pickle data is binary, so open in 'rb' mode
        func = pickle.load(f)
        assert func(1, 2) == 3

Problem: when I run script 2, I get AttributeError: 'module' object has no attribute 'my_func'. I understand why: when my_func is serialized in script 1, it belongs to the __main__ module. dill_pickle_script_2 can't know that __main__ there referred to the namespace of dill_pickle_script_1, and therefore cannot find the reference.
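This follows from pickle storing a function by reference (a module name plus an attribute name), not by value. A minimal stdlib sketch: the hypothetical helper stored_global reads that reference out of a protocol-0 pickle with pickletools, and load_missing_main_global (also hypothetical) unpickles a hand-written reference to a name the loader's __main__ does not define, which fails with an AttributeError much like script 2 does.

```python
import os.path
import pickle
import pickletools


def stored_global(func):
    """Return the (module, name) pair that a protocol-0 pickle records for func."""
    data = pickle.dumps(func, protocol=0)
    for opcode, arg, _pos in pickletools.genops(data):
        # Protocol 0 writes a module-level function as one GLOBAL opcode;
        # pickletools reports its argument as "module name".
        if opcode.name == "GLOBAL":
            module, name = arg.split(" ", 1)
            return module, name
    raise ValueError("no GLOBAL opcode found")


def load_missing_main_global():
    """Unpickle a reference to a function that this process's __main__ lacks."""
    # b"c" is the GLOBAL opcode, followed by module and name lines; b"." is STOP.
    data = b"c__main__\nmy_func_not_defined_here\n."
    try:
        pickle.loads(data)
    except AttributeError as exc:
        return exc
    return None


print(stored_global(os.path.join))  # ('posixpath', 'join') on POSIX systems
print(load_missing_main_global())
```

Only the names are stored, so whichever process loads the pickle must be able to import that module and find that attribute.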

I fix the problem by adding a little hack: I add an absolute import of my_func in dill_pickle_script_1 before pickling it.

dill_pickle_script_1.py

import pickle
import time

def my_func(a, b):
    time.sleep(0.1)
    return a+b

if __name__ == '__main__':
    from dill_pickle_script_1 import my_func  # Added absolute import
    with open('testfile.pkl', 'wb') as f:
        pickle.dump(my_func, f)

Now it works! However, I'd like to avoid having to do this hack every time I want to do this. (Also, I want my pickling to be done inside some other module, which wouldn't know which module my_func came from.)
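The hack works because importing the script under its file name creates a second module object, and functions defined there carry that real module name in __module__, which is what pickle records. The sketch below reproduces this without a real project by writing a throwaway module to a temporary directory; the module name demo_pickle_mod and the helper pickle_from_named_module are hypothetical stand-ins for dill_pickle_script_1 and the import trick.

```python
import importlib
import pickle
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical stand-in for dill_pickle_script_1.py, imported by name.
MODULE_SOURCE = textwrap.dedent("""
    import time

    def my_func(a, b):
        time.sleep(0.1)
        return a + b
""")


def pickle_from_named_module(module_name="demo_pickle_mod"):
    """Import my_func by module name, pickle it, and load it back."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, module_name + ".py").write_text(MODULE_SOURCE)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            module = importlib.import_module(module_name)
            data = pickle.dumps(module.my_func)
            # Loading succeeds because pickle can look up <module_name>.my_func.
            restored = pickle.loads(data)
        finally:
            sys.path.remove(tmp)
            sys.modules.pop(module_name, None)
    return module.my_func.__module__, restored


name, func = pickle_from_named_module()
print(name)        # demo_pickle_mod
print(func(1, 2))  # 3
```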

I heard that the package dill lets you serialize things in __main__ and load them elsewhere. So I tried that:

dill_pickle_script_1.py

import dill
import time

def my_func(a, b):
    time.sleep(0.1)
    return a+b

if __name__ == '__main__':
    with open('testfile.pkl', 'wb') as f:
        dill.dump(my_func, f)

dill_pickle_script_2.py

import dill

if __name__ == '__main__':
    with open('testfile.pkl', 'rb') as f:  # pickle data is binary, so open in 'rb' mode
        func = dill.load(f)
        assert func(1, 2) == 3

Now, however, I have another problem: when running dill_pickle_script_2.py, I get a NameError: global name 'time' is not defined. It seems that dill did not realize that my_func referenced the time module and needed to import it on load.
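The NameError means the restored function's code ran against a globals namespace with no time entry. The stdlib-only sketch below reproduces that symptom by rebuilding my_func from its code object with types.FunctionType and a chosen globals dict; it mimics the failure and is not how dill works internally. rebuild_with_globals is a hypothetical helper.

```python
import time
import types


def my_func(a, b):
    time.sleep(0.1)
    return a + b


def rebuild_with_globals(func, new_globals):
    """Recreate func from its code object, bound to a different globals dict."""
    return types.FunctionType(func.__code__, new_globals, func.__name__,
                              func.__defaults__, func.__closure__)


# An empty namespace plays the role of a loading side that never imported time.
broken = rebuild_with_globals(my_func, {})
error = None
try:
    broken(1, 2)
except NameError as exc:
    error = exc
print(repr(error))

# Supplying the module the code refers to makes the same code object work.
fixed = rebuild_with_globals(my_func, {"time": time})
print(fixed(1, 2))  # 3
```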

How can I serialize an object in __main__ and load it again in another script, so that all the imports used by that object are also loaded, without doing the nasty little absolute-import hack above?

Recommended answer

Well, I found a solution. It is a horrible but tidy kludge and is not guaranteed to work in all cases. Any suggestions for improvement are welcome. The solution replaces the __main__ reference in the pickle string with an absolute module reference, using the following helper functions:

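import os
import pickle
import sys


def pickle_dumps_without_main_refs(obj):
    """
    Yeah this is horrible, but it allows you to pickle an object in the main module so that it can be reloaded in another
    module.
    :param obj: The object to pickle (e.g. a function defined in the script being run).
    :return: The pickle as bytes, with references to __main__ replaced by the script's import path.
    """
    currently_run_file = sys.argv[0]
    module_path = file_path_to_absolute_module(currently_run_file)
    # Protocol 0 stores module names as plain text, so they can be found and replaced.
    pickle_str = pickle.dumps(obj, protocol=0)
    pickle_str = pickle_str.replace(b'__main__', module_path.encode())  # Hack!
    return pickle_str


def pickle_dump_without_main_refs(obj, file_obj):
    """Write the rewritten pickle to file_obj, which must be opened in binary mode ('wb')."""
    string = pickle_dumps_without_main_refs(obj)
    file_obj.write(string)


def file_path_to_absolute_module(file_path):
    """
    Given a file path, return an import path.
    :param file_path: A file path.
    :return: The dotted import path, e.g. 'package.subpackage.module'.
    """
    assert os.path.exists(file_path)
    # Make the path absolute so the loop below can walk up through every parent package
    # (with a bare relative path, the directory becomes '' and the loop can spin forever).
    file_path = os.path.abspath(file_path)
    file_loc, ext = os.path.splitext(file_path)
    assert ext in ('.py', '.pyc')
    directory, module = os.path.split(file_loc)
    module_path = [module]
    # Keep climbing while the enclosing directory is a package (contains __init__.py).
    while os.path.exists(os.path.join(directory, '__init__.py')):
        directory, package = os.path.split(directory)
        module_path.append(package)
    path = '.'.join(module_path[::-1])
    return path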
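One weakness behind the "not guaranteed to work in all cases" caveat: the replace call above rewrites every occurrence of __main__, including string data inside the pickle that merely contains that text. A more targeted variant, sketched below under the assumption of protocol 0 and ASCII module and attribute names, uses pickletools.genops to rewrite only the module field of GLOBAL opcodes. rewrite_global_module is a hypothetical name, and the demo swaps math for cmath only because both standard modules define sqrt; for the scripts in the question the call would be rewrite_global_module(data, '__main__', 'dill_pickle_script_1').

```python
import cmath
import math
import pickle
import pickletools


def rewrite_global_module(data, old_module, new_module):
    """Rewrite GLOBAL references to old_module in a protocol-0 pickle.

    Only the module field of GLOBAL opcodes is touched; string payloads are
    left alone. Assumes ASCII module and attribute names.
    """
    out = bytearray()
    last = 0
    for opcode, arg, pos in pickletools.genops(data):
        if opcode.name != "GLOBAL":
            continue
        module, name = arg.split(" ", 1)
        if module != old_module:
            continue
        # A GLOBAL opcode is b"c" + module + b"\n" + name + b"\n", starting at pos.
        end = pos + 1 + len(module) + 1 + len(name) + 1
        out += data[last:pos]
        out += b"c" + new_module.encode() + b"\n" + name.encode() + b"\n"
        last = end
    out += data[last:]
    return bytes(out)


original = pickle.dumps((math.sqrt, "math"), protocol=0)
targeted = pickle.loads(rewrite_global_module(original, "math", "cmath"))
naive = pickle.loads(original.replace(b"math", b"cmath"))
print(targeted[0] is cmath.sqrt, targeted[1])  # True math
print(naive[1])                                # cmath
```

Splicing works here because protocol 0 terminates these fields with newlines instead of length prefixes; binary protocols would need a different approach.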

Now, I can simply change dill_pickle_script_1.py to say

import time
from artemis.remote.child_processes import pickle_dump_without_main_refs


def my_func(a, b):
    time.sleep(0.1)
    return a+b

if __name__ == '__main__':
    with open('testfile.pkl', 'wb') as f:
        pickle_dump_without_main_refs(my_func, f)

Then dill_pickle_script_2.py works!
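If editing pickle bytes feels too fragile, a load-side counterpart is to subclass pickle.Unpickler and override its find_class hook so that lookups in __main__ are redirected to a named module. The trade-off is that the loading script now has to know that module name. RemappingUnpickler and loads_remapped are hypothetical names, and the demo points __main__ at the standard math module only so it runs without extra files; with the scripts above the mapping would be {'__main__': 'dill_pickle_script_1'}.

```python
import io
import math
import pickle


class RemappingUnpickler(pickle.Unpickler):
    """Unpickler that redirects global lookups from one module name to another."""

    def __init__(self, file, module_map):
        super().__init__(file)
        self.module_map = module_map

    def find_class(self, module, name):
        # Called for every global reference (functions, classes) in the pickle.
        return super().find_class(self.module_map.get(module, module), name)


def loads_remapped(data, module_map):
    return RemappingUnpickler(io.BytesIO(data), module_map).load()


# Hand-written protocol-0 pickle that refers to __main__.sqrt.
data = b"c__main__\nsqrt\n."
func = loads_remapped(data, {"__main__": "math"})
print(func is math.sqrt, func(9.0))  # True 3.0
```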
