Multithreaded Perl script leads to broken pipe if called as a Python subprocess
Problem description
I am calling a Perl script from Python 3.7.3, with subprocess. The Perl script being called is this one:
https://github.com/moses-smt/mosesdecoder/blob/master/scripts/tokenizer/tokenizer.perl
And the code I am using to call it is:
```python
import sys
import os
import subprocess
import threading

def copy_out(source, dest):
    for line in source:
        dest.write(line)

num_threads = 4

args = ["perl", "tokenizer.perl",
        "-l", "en",
        "-threads", str(num_threads)]

with open(os.devnull, "wb") as devnull:
    tokenizer = subprocess.Popen(args,
                                 stdin=subprocess.PIPE,
                                 stdout=subprocess.PIPE,
                                 stderr=devnull)

tokenizer_thread = threading.Thread(
    target=copy_out, args=(tokenizer.stdout, open("outfile", "wb")))
tokenizer_thread.start()

num_lines = 100000
for _ in range(num_lines):
    tokenizer.stdin.write(b'Random line.\n')

tokenizer.stdin.close()
tokenizer_thread.join()
tokenizer.wait()
```
On my system, this leads to the following error:
```
Traceback (most recent call last):
  File "t.py", line 27, in <module>
    tokenizer.stdin.write(b'Random line.\n')
BrokenPipeError: [Errno 32] Broken pipe
```
I investigated this, and it turns out that the error is not thrown if the -threads argument for the subprocess is 1. As I don't want to give up on multithreading in the child process, my question is:

What causes this error in the first place? Who is to "blame": the operating system/environment, my Python code, or the Perl code?
I am glad to provide more information if needed.
EDIT: To reply to some comments:
- Running the Perl script is only possible if you also have this file: https://github.com/moses-smt/mosesdecoder/blob/master/scripts/share/nonbreaking_prefixes/nonbreaking_prefix.en
- The Perl script actually processes several thousand lines before the process fails. In my Python script above, if I make num_lines smaller, I do not get this error anymore.
- If I invoke this Perl script simply on the command line, without any Python, it works fine, no matter how many (Perl) threads or lines of input.
- My Python variable num_threads only controls the number of threads of the Perl subprocess. I never start several Python threads, just one.
EDIT 2: In my first edit, I incorrectly stated that this Perl program runs fine when called with e.g. -threads 4 from the command line: there, a different Perl was used, one compiled with multithreading. If I use the same Perl that is invoked from Python, I get:
```shell
$ cat [file with 100000 lines] | [correct perl] tokenizer.perl -l en -threads 4
Can't locate object method "new" via package "Thread" at
tokenizer.perl line 130, <STDIN> line 8000.
```
Which no doubt would have helped me debug this better.
Recommended answer
The problem seems to be that the Perl script crashes if perl does not support threads. You can check whether your perl supports threads by running:
```shell
perl -MConfig -E 'say "Threads supported" if $Config{useithreads}'
```
In my case, the output was empty, so I installed a new perl with thread support:
```shell
perlbrew install perl-5.30.0 --as=5.30.0-threads -Dusethreads
perlbrew use 5.30.0-threads
```
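From Python, the same check can be automated before launching the tokenizer, so the script fails fast with a clear message instead of a broken pipe thousands of lines in. A sketch; the helper name perl_supports_threads is mine:

```python
import subprocess

def perl_supports_threads(perl="perl"):
    """Return True if the given perl binary was built with ithreads.

    Equivalent to probing $Config{useithreads}, as in the shell
    one-liner above.
    """
    try:
        result = subprocess.run(
            [perl, "-MConfig", "-e", "print $Config{useithreads} ? 1 : 0"],
            capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return False  # perl missing, or the probe itself failed
    return result.stdout.strip() == "1"
```

Calling perl_supports_threads() before Popen lets you raise a descriptive error (or fall back to -threads 1) when the interpreter lacks ithreads.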
Then I ran the Python script again:
```python
import sys
import os
import subprocess
import threading

def copy_out(source, dest):
    for line in iter(source.readline, b''):
        dest.write(line)

num_threads = 4

args = ["perl", "tokenizer.perl",
        "-l", "en",
        "-threads", str(num_threads)]

tokenizer = subprocess.Popen(
    args,
    bufsize=-1,  # use the default buffering (8192-byte buffer)
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.DEVNULL)

tokenizer_thread = threading.Thread(
    target=copy_out, args=(tokenizer.stdout, open("outfile", "wb")))
tokenizer_thread.start()

num_lines = 100000
for _ in range(num_lines):
    tokenizer.stdin.write(b'Random line.\n')

tokenizer.stdin.close()
tokenizer_thread.join()
tokenizer.wait()
```
It now ran to the end with no errors and produced the output file outfile with 100000 lines.
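Even with a threaded perl, it may be worth guarding the write loop: if the child dies for any reason, the script above still crashes with a raw BrokenPipeError. A defensive sketch, again using a stand-in child so it is self-contained (swap in the real perl command in practice):

```python
import subprocess
import sys

# Stand-in child: reads three lines from stdin, then exits with status 1,
# simulating a tokenizer that crashes mid-stream.
child_code = (
    "import sys\n"
    "for i, line in enumerate(sys.stdin):\n"
    "    if i == 2:\n"
    "        sys.exit(1)\n"
)
child = subprocess.Popen([sys.executable, "-c", child_code],
                         stdin=subprocess.PIPE)
try:
    for _ in range(100000):
        child.stdin.write(b"Random line.\n")
    child.stdin.close()
except BrokenPipeError:
    pass  # child died early; fall through and report its exit status
status = child.wait()
print("child exited with", status)
```

Instead of an unhandled traceback deep in the write loop, the parent reports the child's exit status, which points you at the failing subprocess rather than at the pipe.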