How to work with interactively-defined classes in IPython.parallel?


Question

During interactive prototyping in a notebook connected to a cluster, I would like to define a class that is both available in the client __main__ session and interactively updated on the cluster engine nodes, so that I can move instances of that class around by passing them as arguments to a LoadBalanced view. The following demonstrates a typical user session:

First, set up the parallel cluster environment:

>>> from IPython.parallel import Client
>>> rc = Client()
>>> lview = rc.load_balanced_view()
>>> rc[:]
<DirectView [0, 1, 2]>

In a notebook cell, let's define the code of the component we are interactively editing:

>>> class MyClass(object):
...     def __init__(self, parameter):
...         self.parameter = parameter
...
...     def update_something(self, some_data):
...         # do something smart here with some_data & internal state
...         pass
...
...     def compute_something(self, other_data):
...         # do something smart here with other_data & internal state
...         something = None  # placeholder for the real result
...         return something
...

In the next cell, let's create a script that builds instances of this class and then uses the load-balanced view of the cluster environment to evaluate our component on a wide range of input parameters:

>>> def process(obj, some_data, other_data):
...     obj.update_something(some_data)
...     return obj.compute_something(other_data)
...
>>> tasks = []
>>> some_instances = [MyClass(i) for i in range(10)]
>>> for obj in some_instances:
...     for some_data in data_source_1:
...         for other_data in data_source_2:
...             ar = lview.apply_async(process, obj, some_data, other_data)
...             tasks.append(ar)
...
>>> # wait for computation to end
>>> results = [ar.get() for ar in tasks] 



Problem

That will obviously not work, because the engines of the load-balanced view will be unable to unpickle the instances passed as the first argument to the process function. The process function definition itself is passed successfully, as I assume that apply_async does bytecode introspection to pickle it (by accessing the .__code__ attribute of the function) and then does a plain pickle of the remaining arguments.
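To illustrate why an interactively defined function can travel when a class cannot, here is a minimal sketch of shipping a function by its bytecode using only the standard library. It is not IPython's actual serializer: the helper names dump_function and load_function are hypothetical, and the sketch ignores closures, default arguments, and the function's globals. The marshal format is also specific to the Python version, so both sides are assumed to run the same interpreter.

```python
import builtins
import marshal
import types


def process(x, y):
    """Stand-in for a task function defined interactively on the client."""
    return x * y + 1


def dump_function(func):
    # Serialize only the compiled bytecode; no importable module is needed.
    return marshal.dumps(func.__code__)


def load_function(data, name):
    # Rebuild a callable on the receiving side from the shipped bytecode.
    code = marshal.loads(data)
    return types.FunctionType(code, {"__builtins__": builtins}, name)


shipped = dump_function(process)
rebuilt = load_function(shipped, "process")
print(rebuilt(2, 3) == process(2, 3))
```

A class has no single code object that could be shipped this way, so a plain pickle falls back to storing only the module and class name.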


  • One alternative solution would be to use the %%px cell magic on the cell holding the definition of the class MyClass. However, that would prevent me from building the class instances in the client script that also does the scheduling. I would need to copy and paste the cell content into another cell without the %%px magic (or execute the cell twice, once with the magic and once without), but this is tedious when I am still editing the methods of the class in an iterative development and evaluation setting.

Another alternative would be to embed the class definition inside the process function, but I find this impractical, as I would like to reuse that class definition in other functions later in my notebook.

Alternatively, I could stop using classes and work only with functions, which can be shipped to the engines by passing them as the first argument to apply_async. However, I don't like that either, because I would like to prototype my code in an object-oriented way, so that the resulting class can later be extracted from the notebook and included in an object-oriented library. The notebook session serves as a collaborative prototyping tool for exchanging ideas between developers via the http://nbviewer.ipython.org publisher.

The final alternative would be to write my class in a Python module in a file on the filesystem and make that file available on the engines' PYTHONPATH, using NFS for instance. That works, but it prevents me from working only in the notebook environment, which defeats the whole purpose of interactive prototyping in the notebook.

So basically, is there a way to define a class interactively and then ship its definition to the engines?

It should be possible to serialize a class definition by calling inspect.getsource in the client, sending the source to the engines, and using the eval builtin there. Unfortunately, source inspection does not work for classes defined inside the DummyMod built-in module:


TypeError: <IPython.core.interactiveshell.DummyMod object at 0x10c2c4e50> is a built-in class
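One way around the failing source inspection is to keep the class source as a string from the start and execute that same string on each side. The sketch below is only an illustration under stated assumptions: plain dictionaries stand in for the client and engine namespaces, the define_in helper is hypothetical, and the body of compute_something is invented for the demonstration. It uses exec rather than eval, since a class definition is a statement and not an expression.

```python
CLASS_SOURCE = '''
class MyClass(object):
    def __init__(self, parameter):
        self.parameter = parameter

    def compute_something(self, other_data):
        # Illustrative body only; the real method is up to the author.
        return self.parameter + other_data
'''


def define_in(namespace, source=CLASS_SOURCE):
    # exec, not eval: a class definition is a statement, not an expression.
    exec(source, namespace)
    return namespace["MyClass"]


client_ns = {"__name__": "__main__"}  # stand-in for the notebook session
engine_ns = {"__name__": "__main__"}  # stand-in for one engine's namespace

ClientClass = define_in(client_ns)
EngineClass = define_in(engine_ns)
print(EngineClass(2).compute_something(5) == ClientClass(2).compute_something(5))
```

On a real cluster, the same string could presumably be sent to every engine with the DirectView execute method, for example rc[:].execute(CLASS_SOURCE), while the client runs it locally.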

Is there a way to inspect the bytecode of a class definition instead?

Or is it possible to use the %%px magic so as to execute the content of the cell both locally on the client and on each engine?

Recommended answer

Thanks for the detailed question (and for pinging me on Twitter).

First, maybe it should be considered a bug that you can't just push classes, because the simple solution should be

rc[:]['MyClass'] = MyClass

but pickling interactively defined classes results only in a reference ('\x80\x02c__main__\nMyClass\nq\x01.'), giving your DummyMod AttributeError. This can probably be fixed internally in IPython's serialization.
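The reference-only pickling described above can be reproduced with the standard library alone. In this minimal sketch, a throwaway module named demo_main (a hypothetical stand-in for the client's __main__) holds the class. The "engine" side is simulated by swapping in an empty module of the same name before unpickling. No cluster is involved, so this only mirrors the failure mode rather than IPython's exact code path.

```python
import pickle
import sys
import types

MODULE_NAME = "demo_main"  # hypothetical stand-in for the client's __main__

# "Client" side: define the class inside a module registered under that name.
client_module = types.ModuleType(MODULE_NAME)
exec("class MyClass(object):\n    pass\n", client_module.__dict__)
sys.modules[MODULE_NAME] = client_module

# The payload stores only the module name and the class name, not the class body.
payload = pickle.dumps(client_module.MyClass, protocol=2)
is_reference_only = (b"c" + MODULE_NAME.encode() + b"\nMyClass\n") in payload

# "Engine" side: a module of the same name exists, but MyClass was never defined.
sys.modules[MODULE_NAME] = types.ModuleType(MODULE_NAME)
try:
    pickle.loads(payload)
    error_name = None
except AttributeError as exc:
    error_name = type(exc).__name__
finally:
    del sys.modules[MODULE_NAME]

print(is_reference_only, error_name)
```

Because the payload carries no definition, the receiving side can only succeed if the class has already been defined there under the same module and name.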

On to an actual working solution, though.

Adding local execution to %%px is super easy, just:

def pxlocal(line, cell):
    ip = get_ipython()
    ip.run_cell_magic("px", line, cell)
    ip.run_cell(cell)
get_ipython().register_magic_function(pxlocal, "cell")

And now you have a %%pxlocal magic that runs %%px in addition to running the cell locally.

Then all you have to do is:

%%pxlocal

class MyClass(object):
    # etc

to define your class everywhere. I will add a --local flag to %%px, so this extra step isn't necessary.

A complete, working example notebook
