Pickle class instance plus definition?


Question

I suspect this is a common problem, but I haven't found a solution for it. What I want is quite simple and seems technically feasible: I have a simple Python class, and I want to store it on disk, instance and definition together, in a single file. Pickle will store the data, but it doesn't store the class definition. One might argue that the class definition is already stored in my .py file, but I don't want a separate .py file; my goal is a self-contained single file that I can pop back into my namespace with a single line of code.
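
The pickle behaviour described above can be checked with the standard library alone. The sketch below is only an illustration: it uses fractions.Fraction as a stand-in class, and the variable names are made up for the example. It shows that pickling a class records just its module and name, which the loading side must be able to import.

```python
import pickle
from fractions import Fraction  # stand-in for any importable class

# Pickling a class stores a reference (module name + class name), not its code.
payload = pickle.dumps(Fraction)
names_present = b"fractions" in payload and b"Fraction" in payload
payload_size = len(payload)  # only a few dozen bytes, far too small to hold the class body

# Unpickling resolves that reference by importing the module again,
# so it hands back the very same class object rather than a copy.
same_object = pickle.loads(payload) is Fraction
```

This is why a pickle of an ad hoc class fails to load in a process where the defining module (or script) is not available.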

So yes, I know this is possible using two files and two lines of code, but I want it in one file and one line of code. The reason is that I often find myself in this situation: I'm working on some big dataset, manipulating it in Python, and then have to write my sliced, diced and transformed data back into some preexisting directory structure. What I don't want is to litter these data directories with ill-named Python class stubs to keep my code and data associated, and what I want even less is the hassle of separately keeping track of and organizing all these little ad hoc classes defined on the fly in a script.

So the convenience isn't so much in code readability as in an effortless and unfudgable association between code and data. That seems like a worthy goal to me, even though I understand it isn't appropriate in most situations.

So the question is: is there a package or code snippet that does such a thing? I can't seem to find any.

Recommended answer

If you use dill, you can treat __main__ as if it were a Python module (for the most part). Hence, you can serialize interactively defined classes and the like. By default, dill can also transport the class definition as part of the pickle.

>>> class MyTest(object):
...   def foo(self, x):
...     return self.x * x
...   x = 4
... 
>>> f = MyTest() 
>>> import dill
>>>
>>> with open('test.pkl', 'wb') as s:
...   dill.dump(f, s)
... 
>>> 

Then shut down the interpreter and send the file test.pkl over TCP. On your remote machine, you can now get the class instance.

Python 2.7.9 (default, Dec 11 2014, 01:21:43) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> with open('test.pkl', 'rb') as s:
...   f = dill.load(s)
... 
>>> f
<__main__.MyTest object at 0x1069348d0>
>>> f.x
4
>>> f.foo(2)
8
>>>             

But how do you get the class definition? The above is not exactly what you wanted. The following is, however.

>>> class MyTest2(object):
...   def bar(self, x):
...     return x*x + self.x
...   x = 1
... 
>>> import dill
>>> with open('test2.pkl', 'wb') as s:
...   dill.dump(MyTest2, s)
... 
>>>

Then, after sending the file, you can get the class definition.

Python 2.7.9 (default, Dec 11 2014, 01:21:43) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> with open('test2.pkl', 'rb') as s:
...   MyTest2 = dill.load(s)
... 
>>> print dill.source.getsource(MyTest2)
class MyTest2(object):
  def bar(self, x):
    return x*x + self.x
  x = 1

>>> f = MyTest2()
>>> f.x
1
>>> f.bar(4)
17

Since you were looking for a one-liner, I can do better. I haven't yet shown that you can send over the class and the instance at the same time, and maybe that's what you wanted.

>>> import dill
>>> class Foo(object): 
...   def bar(self, x):
...     return x+self.x
...   x = 1
... 
>>> b = Foo()
>>> b.x = 5
>>> 
>>> with open('blah.pkl', 'wb') as s:
...   dill.dump((Foo, b), s)
... 
>>> 

It's still not a single line, but it works.

Python 2.7.9 (default, Dec 11 2014, 01:21:43) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> with open('blah.pkl', 'rb') as s:
...   Foo, b = dill.load(s)
... 
>>> b.x  
5
>>> Foo.bar(b, 2)
7
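
The transcripts above are from Python 2.7. As a hedged sketch, the same class-plus-instance round trip can be written for Python 3 as a script, assuming dill is installed (pip install dill). An in-memory io.BytesIO buffer stands in for the file blah.pkl, and the names LoadedFoo and loaded_b are only illustrative.

```python
import io

import dill  # third-party package, assumed to be installed


class Foo(object):
    x = 1

    def bar(self, y):
        return y + self.x


b = Foo()
b.x = 5

# Dump the class and the instance together as a single tuple.
buffer = io.BytesIO()  # stands in for open('blah.pkl', 'wb')
dill.dump((Foo, b), buffer)

# Rewind and load both objects back with one call.
buffer.seek(0)
LoadedFoo, loaded_b = dill.load(buffer)

# Calling the function through the class works the same way as in the transcript.
total = LoadedFoo.bar(loaded_b, 2)
```

Loading in the same process, as here, does not prove the class definition travelled with the pickle; the separate-interpreter sessions shown above are what demonstrate that.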

Within dill there's also dill.source, which has methods that can detect the dependencies of functions and classes and take them along with the pickle (for the most part).

>>> def foo(x):
...   return x*x
... 
>>> class Bar(object):
...   def zap(self, x):
...     return foo(x) * self.x
...   x = 3
... 
>>> print dill.source.importable(Bar.zap, source=True)
def foo(x):
  return x*x
def zap(self, x):
  return foo(x) * self.x

So that's not "perfect" (or maybe not what's expected), but it does serialize the code for a dynamically built method and its dependencies. You just don't get the rest of the class, but the rest of the class is not needed in this case. Still, it doesn't seem like what you wanted.
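
For comparison, the general idea of carrying source text together with instance state in one file can be sketched with the standard library alone. This is not how dill works internally; it is a minimal illustration under stated assumptions: the class source is available as a string, the instance state lives in __dict__, and the helper names save_bundle and load_bundle are hypothetical.

```python
import io
import pickle

# Hypothetical source text for a small ad hoc class. Keeping it as a string
# lets the definition travel inside the same pickle as the data.
CLASS_SOURCE = '''
class Bar(object):
    x = 3
    def zap(self, y):
        return y * y * self.x
'''


def save_bundle(stream, source, class_name, state):
    """Write class source, class name and instance state as one pickle."""
    pickle.dump({"source": source, "class_name": class_name, "state": state}, stream)


def load_bundle(stream):
    """Rebuild the class by executing its source, then restore the instance."""
    bundle = pickle.load(stream)
    namespace = {}
    exec(bundle["source"], namespace)  # executes stored code: trusted files only
    cls = namespace[bundle["class_name"]]
    instance = cls.__new__(cls)        # create the object without calling __init__
    instance.__dict__.update(bundle["state"])
    return cls, instance


buffer = io.BytesIO()  # stands in for a file opened in binary mode
save_bundle(buffer, CLASS_SOURCE, "Bar", {"x": 5})
buffer.seek(0)
Bar, b = load_bundle(buffer)
result = b.zap(2)      # 2 * 2 * 5
```

Unlike dill.source, this sketch does no dependency detection: anything the class needs, such as a helper function, would have to be included in the source string by hand.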

If you want to get everything, you can just pickle the entire session, and in one line (two counting the import).

>>> import dill
>>> def foo(x):
...   return x*x
... 
>>> class Blah(object):
...   def bar(self, x):
...     self.x = (lambda x:foo(x)+self.x)(x)
...   x = 2
... 
>>> b = Blah()
>>> b.x
2
>>> b.bar(3)
>>> b.x
11
>>> # the one line
>>> dill.dump_session('foo.pkl')
>>> 

Then on the remote machine...

Python 2.7.9 (default, Dec 11 2014, 01:21:43) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> # the one line
>>> dill.load_session('foo.pkl')
>>> b.x
11
>>> b.bar(2)
>>> b.x
15
>>> foo(3)
9

Lastly, if you want the transport to be done for you transparently (instead of using a file), you could use pathos.pp or ppft, which provide the ability to ship objects to a second Python server (on a remote machine) or Python process. They use dill under the hood and just pass the code across the wire.

>>> class More(object):
...   def squared(self, x):
...     return x*x
... 
>>> import pathos
>>> 
>>> p = pathos.pp.ParallelPythonPool(servers=('localhost,1234',))
>>> 
>>> m = More()
>>> p.map(m.squared, range(5))
[0, 1, 4, 9, 16]

The servers argument is optional, and here it just connects to the local machine on port 1234. If you use the remote machine name and port instead (or as well), you'll fire off to the remote machine, "effortlessly".

Get dill, pathos, and ppft here: https://github.com/uqfoundation
