Serialize a Python function with dependencies

Question

I have tried multiple approaches to pickling a Python function with dependencies, following many recommendations on StackOverflow (such as dill, cloudpickle, etc.), but all of them seem to run into a fundamental issue that I cannot figure out.

I have a main module that pickles a function from an imported module and sends it over ssh to be unpickled and executed on a remote machine.

So in main:

    import dill  # for example
    import modulea

    serial = dill.dumps(modulea.func)
    send(serial)  # placeholder: ship the bytes over ssh

On the remote machine:

    import dill

    serial = receive()  # placeholder: read the bytes sent from main
    funcremote = dill.loads(serial)
    funcremote()

If the functions being pickled and sent are top-level functions defined in main itself, everything works. When they are defined in an imported module, the loads call fails with messages of the type "module modulea not found".

It appears that the module name is pickled along with the function name. I do not see any way to "fix up" the pickle to remove the dependency or, alternatively, to create a dummy module in the receiver to become the recipient of the unpickling.

Any pointers would be appreciated.

-prasanna

Recommended answer

I'm the dill author. I do this exact thing over ssh, but with success. Currently, dill and any of the other serializers pickle modules by reference, so to successfully pass a function defined in a file, you have to ensure that the relevant module is also installed on the other machine. I do not believe there is any object serializer that serializes modules directly (i.e. not by reference).
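To make "by reference" concrete, here is a minimal sketch using the standard library's pickle rather than dill, on the assumption that it illustrates the same by-reference behavior described above for functions defined in a module. `json.dumps` is only a stand-in for `modulea.func`.

```python
import json
import pickle
import pickletools

# Pickle a function that lives in an importable module.
# json.dumps is just a convenient stand-in for modulea.func.
payload = pickle.dumps(json.dumps)

# The payload carries the module name and the function name, not the code.
print(len(payload), "bytes")
pickletools.dis(payload)

# Unpickling imports the module by name and looks the attribute up again,
# so it only succeeds where that module can be imported.
restored = pickle.loads(payload)
print(restored is json.dumps)  # True
```

The disassembly shows the strings `json` and `dumps` and little else, which is why the receiver reports a missing module when `modulea` is not installed there.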

Having said that, dill does have some options to serialize object dependencies. For example, for class instances, the default in dill is to not serialize class instances by reference, so the class definition can also be serialized and sent with the instance. In dill, you can also (using a very new feature) serialize file handles by serializing the file, instead of doing so by reference. But again, if you have the case of a function defined in a module, you are out of luck, as modules are serialized by reference pretty darn universally.

You might be able to use dill to do so, however, not by pickling the object but by extracting the source and sending the source code. In pathos.pp and pyina, dill is used to extract the source and the dependencies of any object (including functions) and pass them to another computer, process, etc. However, since this is not an easy thing to do, dill can also fall back to extracting a relevant import and sending that instead of the source code.
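The idea of shipping source instead of a pickle can be sketched with the standard library alone. This is not how dill.source, pathos.pp, or pyina implement it; it is a simplified illustration that only handles a function with no dependencies of its own. The module name `modulea_demo`, the function `func`, and the `message` dictionary are hypothetical.

```python
import importlib
import inspect
import sys
import tempfile
from pathlib import Path

# --- sender side ---
# Create a throwaway module on disk so the example is self-contained.
# "modulea_demo" and "func" are hypothetical names.
module_dir = tempfile.mkdtemp()
Path(module_dir, "modulea_demo.py").write_text(
    "def func(x):\n"
    "    return x * 2\n"
)
sys.path.insert(0, module_dir)
importlib.invalidate_caches()
modulea_demo = importlib.import_module("modulea_demo")

# Ship the source text and the function name instead of a pickle.
message = {
    "name": modulea_demo.func.__name__,
    "source": inspect.getsource(modulea_demo.func),
}

# --- receiver side ---
# The receiver never imports modulea_demo; it rebuilds the function from text.
namespace = {}
exec(message["source"], namespace)
funcremote = namespace[message["name"]]
result = funcremote(21)
print(result)  # 42
```

Because `message` contains only strings, it can be sent over ssh in any text format. Anything the function refers to outside its own body would have to be sent as well, which is the hard part, and the receiver should only `exec` source from a sender it trusts.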

As you can hopefully understand, this is a messy, messy thing to do (as noted in one of the dependencies of the function I am extracting below). However, what you are asking for is done successfully in the pathos package, which passes code and dependencies to different machines across ssh-tunneled ports.

>>> import dill
>>> 
>>> print(dill.source.importable(dill.source.importable))
from dill.source import importable
>>> print(dill.source.importable(dill.source.importable, source=True))
def _closuredsource(func, alias=''):
    """get source code for closured objects; return a dict of 'name'
    and 'code blocks'"""
    #FIXME: this entire function is a messy messy HACK
    #      - pollutes global namespace
    #      - fails if name of freevars are reused
    #      - can unnecessarily duplicate function code
    from dill.detect import freevars
    free_vars = freevars(func)
    func_vars = {}
    # split into 'funcs' and 'non-funcs'
    for name,obj in list(free_vars.items()):
        if not isfunction(obj):
            # get source for 'non-funcs'
            free_vars[name] = getsource(obj, force=True, alias=name)
            continue
        # get source for 'funcs'

#…snip… …snip… …snip… …snip… …snip… 

            # get source code of objects referred to by obj in global scope
            from dill.detect import globalvars
            obj = globalvars(obj) #XXX: don't worry about alias?
            obj = list(getsource(_obj,name,force=True) for (name,_obj) in obj.items())
            obj = '\n'.join(obj) if obj else ''
            # combine all referred-to source (global then enclosing)
            if not obj: return src
            if not src: return obj
            return obj + src
        except:
            if tried_import: raise
            tried_source = True
            source = not source
    # should never get here
    return

I imagine something could also be built around the dill.detect.parents method, which provides a list of pointers to all parent objects for any given object, and one could reconstruct all of any function's dependencies as objects, but this is not implemented.

BTW: to establish an ssh tunnel, just do this:

>>> t = pathos.Tunnel.Tunnel()
>>> t.connect('login.university.edu')
39322
>>> t  
Tunnel('-q -N -L39322:login.university.edu:45075 login.university.edu')

Then you can work across the local port with ZMQ, or ssh, or whatever. If you want to do so with ssh, pathos also has that built in.
