Per-cell output for threaded IPython Notebooks


Question


I don't want to raise this as an issue, because it seems like a completely unreasonable feature request for what is a fairly amazing tool. But if any readers happen to be familiar with the architecture I'd be interested to know if a potential extension seems feasible.

I recently wrote a notebook with some simple threaded code in it, just to see what would happen when I ran it. The notebook code (tl;dr it starts a number of parallel threads that print in a sleep loop) is available at https://gist.github.com/4562840.

By hitting SHIFT-RETURN a few times as the code runs you can observe that any output from the kernel appears in the output area of the current cell, not that of the cell in which the code was run.

I was wondering if it would be possible, if threads were active for a cell, to display a "refresh" button allowing the output area to be updated asynchronously. Ideally the refresh button would disappear if it was clicked after all threads had ended (after a final update).

This would depend, though, on being able to identify and intercept the print output for each thread and direct it to a buffer for the specific cell's output. So, two questions.

  1. Am I correct in believing that the hard-wiring of Python 2's print statement means that this enhancement cannot be implemented with a standard interpreter?

  2. Are the prospects for Python 3 any better, given that it's possible to sneak another layer into the print() stack inside the IPython kernel?

And especially for those who didn't follow a Python link to get here,

  3. [nobody expects the Spanish Inquisition] More generally, can you point to (language-agnostic) examples of multiple streams being delivered into a page? Are there any established best practices for constructing and modifying the DOM to handle this?

Solution

UPDATE:

Am I correct in believing that the hard-wiring of Python 2's print statement means that this enhancement cannot be implemented with a standard interpreter?

No, the important parts of the print statement are not hardwired at all. print simply writes to sys.stdout, which can be any object with write and flush methods. IPython already completely replaces this object in order to get stdout to the notebook in the first place (see below).
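As a minimal sketch of that point (plain Python with no IPython involved; the ListStream class and capture_print helper are names made up for illustration), any object with write and flush methods can stand in for sys.stdout, and print will use it:

```python
import sys


class ListStream(object):
    """Hypothetical stand-in for sys.stdout: collects writes in a list."""

    def __init__(self):
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)
        return len(text)

    def flush(self):
        # nothing is buffered anywhere else, so there is nothing to do
        pass


def capture_print(message):
    """Print `message` while sys.stdout is temporarily replaced."""
    stream = ListStream()
    original = sys.stdout
    sys.stdout = stream
    try:
        print(message)
    finally:
        # always restore the real stream, even if print raised
        sys.stdout = original
    return "".join(stream.chunks)


captured = capture_print("hello")
```

Nothing about print itself had to change here; only the object it writes to was swapped.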

Are the prospects for Python 3 any better, given that it's possible to sneak another layer into the print() stack inside the IPython kernel?

Nope - overriding sys.stdout is all you need, not print itself (see above, below, and elsewhere). There are no advantages to Python 3 here.

[nobody expects the Spanish Inquisition] More generally, can you point to (language-agnostic) examples of multiple streams being delivered into a page?

Sure - the IPython notebook itself. It uses message IDs and metadata to determine the origin of stdout messages, and in turn where those messages should end up. Below, in my original answer to a question that apparently nobody asked, I show an example of simultaneously drawing output coming from multiple cells whose threads are running concurrently.

In order to get the refresh behavior you desire, you would probably need to do two things:

  1. Replace sys.stdout with your own object that uses the IPython display protocol to send messages carrying your own thread-identifying metadata (e.g. threading.current_thread().ident). This should be done in a context manager (as below), so that it only affects the print statements you actually want it to.
  2. Write an IPython js plugin that handles your new format of stdout messages, so that they are not drawn immediately but are stored in arrays, waiting to be drawn.
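The kernel-side half of those two steps might look roughly like the sketch below. It is only an approximation under stated assumptions: it keeps the text in an in-process dictionary instead of sending it through the IPython display protocol, and ThreadBufferingStream, buffered_stdout and drain are hypothetical names, not IPython APIs.

```python
import sys
import threading
from collections import defaultdict
from contextlib import contextmanager


class ThreadBufferingStream(object):
    """Hypothetical stdout replacement that files text by thread ident."""

    def __init__(self):
        self._lock = threading.Lock()
        self.buffers = defaultdict(list)

    def write(self, text):
        ident = threading.current_thread().ident
        with self._lock:
            self.buffers[ident].append(text)
        return len(text)

    def flush(self):
        pass

    def drain(self, ident):
        """Return and clear what thread `ident` wrote (the "refresh" step)."""
        with self._lock:
            return "".join(self.buffers.pop(ident, []))


@contextmanager
def buffered_stdout(stream):
    """Swap sys.stdout for `stream` only inside the with block."""
    original = sys.stdout
    sys.stdout = stream
    try:
        yield stream
    finally:
        sys.stdout = original


# Idents can be recycled once a thread exits, so the barrier keeps both
# threads alive at the same time and therefore distinct.
barrier = threading.Barrier(2)


def worker(label):
    barrier.wait()
    for i in range(2):
        print("%s %d" % (label, i))


stream = ThreadBufferingStream()
with buffered_stdout(stream):
    threads = [threading.Thread(target=worker, args=(name,)) for name in ("a", "b")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

outputs = sorted(stream.drain(t.ident) for t in threads)
```

Each thread's output stays in its own buffer until something asks for it, which is the behavior a refresh button would need; getting the drained text into the right cell is the part left to the front-end plugin.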

Original answer (to the wrong, but related, question):

It relies on some shenanigans, and private APIs, but this is totally possible with current IPython (it may not be forever).

Here is an example notebook: http://nbviewer.ipython.org/4563193

In order to do this, you need to understand how IPython gets stdout to the notebook in the first place. This is done by replacing sys.stdout with an OutStream object. This buffers data, and then sends it over zeromq when sys.stdout.flush is called, and it ultimately ends up in the browser.

Now, how to send output to a particular cell.

The IPython message protocol uses a 'parent' header to identify which request produced which reply. Every time you ask IPython to run some code, it sets the parent header of various objects (sys.stdout included), so that their side effect messages are associated with the message that caused them. When you run code in a thread, that means that the current parent_header is just the most recent execute_request, rather than the original one that started any given thread.
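The effect of the parent header can be illustrated with a toy model. This is not IPython code: the message dictionaries are heavily simplified, and make_stream_message, route_to_cells and the request ids are all invented for the example.

```python
from collections import defaultdict


def make_stream_message(parent_msg_id, text):
    """Build a simplified stand-in for a stdout message."""
    return {
        "parent_header": {"msg_id": parent_msg_id},
        "content": {"name": "stdout", "text": text},
    }


def route_to_cells(messages, cell_for_request):
    """Group output text by the cell whose request is named as parent."""
    outputs = defaultdict(str)
    for msg in messages:
        cell = cell_for_request[msg["parent_header"]["msg_id"]]
        outputs[cell] += msg["content"]["text"]
    return dict(outputs)


# two execute requests, one per cell (the ids are made up)
cell_for_request = {"req-1": "cell A", "req-2": "cell B"}

# A thread started by cell A prints after cell B has been run. Left alone,
# its output carries the most recent parent, req-2.
unpinned = [make_stream_message("req-2", "from A's thread\n")]

# With the parent pinned to the request that started the thread:
pinned = [make_stream_message("req-1", "from A's thread\n")]

unpinned_result = route_to_cells(unpinned, cell_for_request)
pinned_result = route_to_cells(pinned, cell_for_request)
```

In the unpinned case the text lands in cell B, which matches the behavior described in the question; pinning the parent sends it back to cell A.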

With that in mind, here is a context manager that temporarily sets stdout's parent header to a particular value:

import sys
import threading
from contextlib import contextmanager


stdout_lock = threading.Lock()

@contextmanager
def set_stdout_parent(parent):
    """a context manager for setting a particular parent for sys.stdout

    the parent determines the destination cell of output
    """
    # we need a lock, so that other threads don't snatch control
    # while we have set a temporary parent
    with stdout_lock:
        # read the current parent inside the lock, so we never save
        # (and later restore) another thread's temporary parent
        save_parent = sys.stdout.parent_header
        sys.stdout.parent_header = parent
        try:
            yield
        finally:
            # the flush is important, because that's when the parent_header actually has its effect
            sys.stdout.flush()
            sys.stdout.parent_header = save_parent

And here is a Thread that records the parent when the thread starts, and applies that parent each time it makes a print statement, so it behaves as if it were still in the original cell:

import threading
import time

# sys and set_stdout_parent come from the block above

class counterThread(threading.Thread):
    def run(self):
        # record the parent when the thread starts
        thread_parent = sys.stdout.parent_header
        for i in range(3):
            time.sleep(2)
            # then ensure that the parent is the same as when the thread started
            # every time we print
            with set_stdout_parent(thread_parent):
                # parenthesized so it runs under both Python 2 and Python 3
                print(i)

And finally, a notebook tying it all together, with timestamps showing actual concurrent printing to multiple cells:

http://nbviewer.ipython.org/4563193/
