Multithreaded Perl script leads to broken pipe if called as a Python subprocess


Problem description

I am calling a Perl script from Python 3.7.3, with subprocess. The Perl script that is called is this one:

https://github.com/moses-smt/mosesdecoder/blob/master/scripts/tokenizer/tokenizer.perl

And the code I am using to call it is:

import sys
import os
import subprocess
import threading

def copy_out(source, dest):
    for line in source:
        dest.write(line)

num_threads=4

args = ["perl", "tokenizer.perl",
        "-l", "en",
        "-threads", str(num_threads)
       ]

with open(os.devnull, "wb") as devnull:
    tokenizer = subprocess.Popen(args,
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=devnull)

tokenizer_thread = threading.Thread(target=copy_out, args=(tokenizer.stdout, open("outfile", "wb")))
tokenizer_thread.start()

num_lines = 100000

for _ in range(num_lines):
    tokenizer.stdin.write(b'Random line.\n')

tokenizer.stdin.close()
tokenizer_thread.join()

tokenizer.wait()

On my system, this leads to the following error:

Traceback (most recent call last):
  File "t.py", line 27, in <module>
    tokenizer.stdin.write(b'Random line.\n')
BrokenPipeError: [Errno 32] Broken pipe

I investigated this, and it turns out that if the -threads argument for the subprocess is 1 the error is not thrown. As I don't want to give up on multithreading in the child process, my question is:

What causes this error in the first place? And who is to "blame": the OS/environment, my Python code, or the Perl code?

I am glad to provide more information if needed.

EDIT: To reply to some comments,

  • Running the Perl script is only possible if you also have this file: https://github.com/moses-smt/mosesdecoder/blob/master/scripts/share/nonbreaking_prefixes/nonbreaking_prefix.en
  • The Perl script actually processes several thousands of lines before the process fails. In my Python script above, if I make num_lines smaller, I do not get this error anymore.
  • If I invoke this Perl script simply on the command line, without any Python, it works fine: no matter how many (Perl) threads or lines of input.
  • My Python variable num_threads only controls the number of threads of the Perl subprocess. I never start several Python threads, just one.
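One way to diagnose a BrokenPipeError like this is to stop discarding the child's stderr and to inspect its exit status once the pipe breaks. Below is a self-contained sketch of that idea; it uses a short-lived Python child as a stand-in for the Perl tokenizer (in real use, substitute the `args` list from above and replace `stderr=devnull` with `stderr=subprocess.PIPE`):

```python
import subprocess
import sys

# Stand-in for the Perl tokenizer: a child that reads one line and exits,
# so that later writes hit a closed pipe. Substitute the real perl args here.
child = subprocess.Popen(
    [sys.executable, "-c", "import sys; sys.stdin.readline()"],
    stdin=subprocess.PIPE, stderr=subprocess.PIPE)

pipe_broken = False
try:
    for _ in range(100000):
        child.stdin.write(b"Random line.\n")
        child.stdin.flush()
except BrokenPipeError:
    pipe_broken = True

# The child's stderr and exit status usually reveal why it died.
err = child.stderr.read()
exit_status = child.wait()
print("pipe broken:", pipe_broken, "exit status:", exit_status, "stderr:", err)
```

With the real tokenizer, the captured stderr would have shown the Perl error directly instead of leaving only the BrokenPipeError on the Python side.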

EDIT 2: In my first edit, I incorrectly stated that this Perl program runs fine when called with e.g. -threads 4 from the command line: there, a different Perl was used that is compiled with multithreading. If I use the same Perl that is invoked from Python, I get:

$ cat [file with 100000 lines] | [correct perl] tokenizer.perl -l en -threads 4
Can't locate object method "new" via package "Thread" at
tokenizer.perl line 130, <STDIN> line 8000.

Which no doubt would have helped me debug this better.

Solution

The problem seems to be that the perl script crashes if perl does not support threads. You can check if your perl supports threads by running:

perl -MConfig -E 'say "Threads supported" if $Config{useithreads}'

In my case, the output was empty, so I installed a new perl with thread support:

perlbrew install perl-5.30.0 --as=5.30.0-threads -Dusethreads
perlbrew use 5.30.0-threads
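Since the tokenizer is launched from Python anyway, the same thread-support check can be done programmatically before spawning the subprocess. A minimal sketch (the helper name is my own):

```python
import subprocess

def perl_supports_threads(perl="perl"):
    """Return True if the given perl binary was built with ithreads."""
    try:
        out = subprocess.run(
            [perl, "-MConfig", "-e", "print $Config{useithreads} ? 1 : 0"],
            capture_output=True, text=True)
    except FileNotFoundError:
        return False  # no such perl binary on PATH
    return out.stdout.strip() == "1"
```

Calling this before `Popen` and falling back to `-threads 1` when it returns False would avoid the crash entirely.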

Then I ran the Python script again:

import sys
import os
import subprocess
import threading

def copy_out(source, dest):
    for line in iter(source.readline, b''):
        dest.write(line)

num_threads=4
args = ["perl", "tokenizer.perl",
        "-l", "en",
        "-threads", str(num_threads)
       ]
tokenizer = subprocess.Popen(
    args,
    bufsize=-1,  #use default bufsize = 8192 bytes
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.DEVNULL)

tokenizer_thread = threading.Thread(
    target=copy_out, args=(tokenizer.stdout, open("outfile", "wb")))
tokenizer_thread.start()

num_lines = 100000

for _ in range(num_lines):
    tokenizer.stdin.write(b'Random line.\n')

tokenizer.stdin.close()
tokenizer_thread.join()
tokenizer.wait()

and it now ran to the end with no errors and produced the output file outfile with 100000 lines.
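As a side note, for a bounded amount of input like this, the manual writer loop and reader thread can be replaced by a single `subprocess.run` call that feeds all input and collects all output at once. A sketch, using a pass-through Python child as a stand-in for the tokenizer (substitute the perl args in real use):

```python
import subprocess
import sys

lines = b"Random line.\n" * 100000

# Pass-through child standing in for
# ["perl", "tokenizer.perl", "-l", "en", "-threads", "4"].
result = subprocess.run(
    [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"],
    input=lines, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)

with open("outfile", "wb") as f:
    f.write(result.stdout)
```

This keeps all of the pipe handling inside the standard library; the reader-thread approach above remains preferable when the input is unbounded or streamed.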
