Getting output from jupyter kernel in (i)python script


Question

I'd like to open several kernels from within a single ipython session, run code on those kernels, and then collect the results. But I can't figure out how to collect the results, or even see stdout/stderr. How can I do these things?

I've managed the first two steps (open kernels and run code on them) with code like the following:

from jupyter_client import MultiKernelManager
kernelmanager = MultiKernelManager()
remote_id = kernelmanager.start_kernel('python3')
remote_kernel = kernelmanager.get_kernel(remote_id)
remote = remote_kernel.client()
remote.start_channels()  # connect the ZMQ channels before sending requests
sent_msg_id = remote.execute('2+2')

[I welcome any suggestions for how to improve that, or for how to close these kernels and clients.]
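For the bracketed question, a minimal sketch of shutting things down, assuming the same `jupyter_client` objects as above (the kernel name `python3` must match an installed kernelspec):

```python
from jupyter_client import MultiKernelManager

km = MultiKernelManager()
kid = km.start_kernel(kernel_name='python3')
client = km.get_kernel(kid).client()
client.start_channels()   # open the ZMQ channels
# ... run code via client.execute(...) ...
client.stop_channels()    # close this client's connections
km.shutdown_kernel(kid)   # ask the kernel process to exit
assert kid not in km.list_kernel_ids()
```

There is also `km.shutdown_all()` to close every managed kernel at once.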

Here, python3 can be the name of any of the kernels I have set up (which can be listed at the command line with jupyter-kernelspec list). And I seem to be able to run any reasonable code in place of '2+2'. For example, I can write to a file, and that file really gets created.
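The same list that `jupyter-kernelspec list` prints is available programmatically; a small sketch:

```python
from jupyter_client.kernelspec import KernelSpecManager

# maps kernel names (as accepted by start_kernel) to their spec directories
specs = KernelSpecManager().find_kernel_specs()
print(sorted(specs))
```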

Now, the problem is how to get the result. I can get a seemingly related message with

reply = remote.get_shell_msg(sent_msg_id)

That reply is a dict like this:

{'buffers': [],
 'content': {'execution_count': 2,
  'payload': [],
  'status': 'ok',
  'user_expressions': {}},
 'header': {'date': datetime.datetime(2015, 10, 19, 14, 34, 34, 378577),
  'msg_id': '98e216b4-3251-4085-8eb1-bfceedbae3b0',
  'msg_type': 'execute_reply',
  'session': 'ca4d615d-82b7-487f-88ff-7076c2bdd109',
  'username': 'me',
  'version': '5.0'},
 'metadata': {'dependencies_met': True,
  'engine': '868de9dd-054b-4630-99b7-0face61915a6',
  'started': '2015-10-19T14:34:34.265718',
  'status': 'ok'},
 'msg_id': '98e216b4-3251-4085-8eb1-bfceedbae3b0',
 'msg_type': 'execute_reply',
 'parent_header': {'date': datetime.datetime(2015, 10, 19, 14, 34, 34, 264508),
  'msg_id': '2674c61a-c79a-48a6-b88a-1f2e8da68a80',
  'msg_type': 'execute_request',
  'session': '767ae562-38d6-41a3-a9dc-6faf37d83222',
  'username': 'me',
  'version': '5.0'}}
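A couple of fields in that reply are immediately useful; a sketch of checking them, using a literal stand-in with the same shape so it runs without a kernel:

```python
# stand-in for the execute_reply shown above
reply = {
    'msg_type': 'execute_reply',
    'content': {'execution_count': 2, 'status': 'ok', 'user_expressions': {}},
}

assert reply['msg_type'] == 'execute_reply'
# 'ok' on success, 'error' if the remote code raised
assert reply['content']['status'] == 'ok'
```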

This is documented in Messaging in Jupyter. What isn't documented is how to actually use this -- i.e., which functions do I use, when and where do I find messages, etc. I've seen this question and its answer, which has useful related information, but doesn't quite get me to the answer. And this answer doesn't get any useful output, either.

So, for example, I've tried to also get the msg with the msg_id given in the result above, but it just hangs. I've tried everything I can think of, but can't figure out how to get anything back from the kernel. How do I do it? Can I transfer data back from the kernel in some sort of string? Can I see its stdout and stderr?

I'm writing an ipython magic to run a code snippet on remote kernels. The idea is that I'll have a notebook on my laptop, and gather data from several remote servers by just having a little magic cell like this:

%%remote_exec -kernels server1,server2
2+2
! hostname

I use remote_ikernel to connect to those remote kernels easily and automatically. That seems to work just fine; I've got my magic command with all its bells and whistles working great, opening up these remote kernels, and running the code. Now I want to get some of that data from the remote sent back to my laptop -- presumably by serializing it in some way. At the moment, I think pickle.dumps and pickle.loads would be perfect for this part; I just have to get the bytes these functions create and consume from one kernel to the other. I'd rather not use actual files for the pickling, though that would potentially be acceptable.

It looks like it's possible with some monstrosity like this:

# make sure pickle is imported on the remote side first
remote.get_shell_msg(remote.execute('import pickle'))
# user_expressions are evaluated remotely after the code runs;
# their reprs come back inside the execute_reply
sent_msg_id = remote.execute('a=2+2', user_expressions={'output': 'pickle.dumps({"a":a})'})
reply = remote.get_shell_msg(sent_msg_id)
# 'text/plain' holds the *repr* of the bytes object, hence the eval below
output_bytes = reply['content']['user_expressions']['output']['data']['text/plain']
variable_dict = pickle.loads(eval(output_bytes))

And now, variable_dict['a'] is just 4. Note, however, that output_bytes is a string representing those bytes, so it has to be evaled. This seems ridiculous (and still doesn't show how I'd get stdout). Is there a better way? And how do I get stdout?
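The eval is needed because user_expressions come back as the repr of the remote value. A purely local sketch of that round trip, with base64 added so the payload is plain ASCII and `ast.literal_eval` can replace `eval` (the variable names here are illustrative, not part of any API):

```python
import ast
import base64
import pickle

payload = {'a': 4}                      # what the remote kernel computed

# remote side: pickle, then base64-encode so the bytes survive as text
encoded = base64.b64encode(pickle.dumps(payload)).decode('ascii')

# the shell reply carries the *repr* of that string in 'text/plain'
text_plain = repr(encoded)

# local side: literal_eval strips the quotes more safely than eval
recovered = pickle.loads(base64.b64decode(ast.literal_eval(text_plain)))
assert recovered == {'a': 4}
```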

Though I'm unhappy with my hack above, I have successfully used it to write a little module called remote_exec hosted on github, as described above. The module gives me a little ipython magic that I can use to run code remotely on one or more other kernels. This is a more-or-less automatic process that I'm definitely satisfied with -- except for the nagging knowledge of what's happening underneath.

Answer

I may not have been sufficiently clear in my question, but my primary use case is to run some code on multiple remote machines (clusters that compute data with massively parallel code) so that I can run fairly simple commands over large datasets stored remotely, with minimal configuration. For this purpose, ipyparallel does not work. I basically have to rewrite code to use it. Instead, my module remote_exec is perfect, allowing me to simply add the cluster's name and working directory, but otherwise use exactly the same code that I would use locally.
