python是否自动将IO和CPU或内存绑定的段并行化? [英] Is python automagically parallelizing IO- and CPU- or memory-bound sections?

查看:108
本文介绍了python是否自动将IO和CPU或内存绑定的段并行化?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这是关于上一个问题的后续问题.

考虑此代码,它比上一个问题(但仍然比我的实际问题简单得多)

Consider this code, which is less toyish than the one in the previous question (but still much simpler than my real one)

import sys
data=[]

for line in open(sys.argv[1]):
    data.append(line[-1])

print data[-1]

现在,我期望更长的运行时间(我的基准文件长65150224行),甚至可能更长.事实并非如此,它在大约2分钟内以与以前相同的速度运行!

Now, I was expecting a longer run time (my benchmark file is 65150224 lines long), possibly much longer. This was not the case, it runs in ~ 2 minutes on the same hw as before!

data.append()是否非常轻巧?我不这么认为,因此我编写了这个伪代码对其进行测试:

Is it data.append() very lightweight? I don't believe so, thus I wrote this fake code to test it:

data=[]
counter=0
string="a\n"

for counter in xrange(65150224):
    data.append(string[-1])

print data[-1]

运行时间为1.5到3分钟(运行之间存在很大差异)

This runs in 1.5 to 3 minutes (there is strong variability among runs)

为什么以前的程序不能显示3.5至5分钟?显然data.append()与IO并行发生.

Why don't I get 3.5 to 5 minutes in the former program? Obviously data.append() is happening in parallel with the IO.

这是个好消息!

但是它如何工作?它是有文件记录的功能吗?对我的代码有什么要求,我应该遵循以使其尽可能发挥作用(除了负载平衡IO和内存/CPU活动之外)?还是仅仅是普通的缓冲/缓存操作?

But how does it work? Is it a documented feature? Is there any requirement on my code that I should follow to make it works as much as possible (besides load-balancing IO and memory/CPU activities)? Or is it just plain buffering/caching in action?

同样,我将此问题标记为"linux",因为我只对特定于linux的答案感兴趣.如果您认为值得做的话,请随时提供与操作系统无关的答案,甚至与其他操作系统无关.

Again, I tagged "linux" this question, because I'm interested only in linux-specific answers. Feel free to give OS-agnostic, or even other-OS answers, if you think it's worth doing.

推荐答案

很显然data.append()与IO并行发生.

Obviously data.append() is happening in parallel with the IO.

恐怕不是. 可能会在Python中并行化IO和计算,但这并不是神奇的事情.

I'm afraid not. It is possible to parallelize IO and computation in Python, but it doesn't happen magically.

您可以做的一件事是使用posix_fadvise(2)给操作系统提示您计划顺序读取文件(POSIX_FADV_SEQUENTIAL).

One thing you could do is use posix_fadvise(2) to give the OS a hint that you plan to read the file sequentially (POSIX_FADV_SEQUENTIAL).

在一些对600兆文件(ISO)执行"wc -l"的粗略测试中,性能提高了约20%.清除磁盘缓存后,立即进行每个测试.

In some rough tests doing "wc -l" on a 600 meg file (an ISO) the performance increased by about 20%. Each test was done immediately after clearing the disk cache.

有关fadvise的Python界面,请参见 python-fadvise .

For a Python interface to fadvise see python-fadvise.

这篇关于python是否自动将IO和CPU或内存绑定的段并行化?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆