Subprocess very slow when calling external egrep and less


Problem description


I'm trying to build a Python script that lets me dynamically build up the patterns passed to egrep -v and pipe the output into less (or more).
I want to use external egrep + less because the files I am processing are very large text files (500MB+). Reading them into a list first and processing everything natively in Python is very slow.

However, when I use os.system or subprocess.call, everything becomes very slow at the moment I quit less and return to the Python code.

My code should work like this:
1. ./myless.py messages_500MB.txt
2. The less -FRX output of messages_500MB.txt is shown (complete file).
3. When I press 'q' to exit less -FRX, the Python code should take over and display a prompt for the user to enter text to be excluded. The user enters it and I add it to the list.
4. My Python code builds egrep -v 'exclude1' and pipes the output to less.
5. The user repeats step 3 and enters another string to be excluded.
6. Now my Python code calls egrep -v 'exclude1|exclude2' messages_500MB.txt | less -FRX
7. And the process continues.
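
The pattern-building part of steps 4 to 6 could be sketched as below. This is only an illustration, not code from the original script: the helper name build_egrep_args is hypothetical, and it returns an argument list (for use without shell=True) so that the user's text never needs shell quoting. Empty entries are skipped on the assumption that an empty alternative in the pattern would match every line and hide the whole file.

```python
def build_egrep_args(path, excluded):
    """Return an argv list for egrep (hypothetical helper, not from the post)."""
    # Skip empty entries: an empty alternative would match every line.
    terms = [term for term in excluded if term]
    if not terms:
        # Nothing excluded yet: the empty pattern matches all lines,
        # like the egrep "" call in the question's code.
        return ["egrep", "-e", "", "--", path]
    # -e protects patterns that start with '-'; '--' ends option parsing.
    return ["egrep", "-v", "-e", "|".join(terms), "--", path]
```

For example, build_egrep_args('messages_500MB.txt', ['exclude1', 'exclude2']) gives the same filter as step 6, only as a list instead of a shell string.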

However, this does not work as expected.
* On my Mac, when the user presses q to exit less -FRX, it takes a few seconds for the raw_input prompt to be displayed.
* On a Linux machine, I get loads of 'egrep: writing output: Broken pipe' messages.
* If (Linux only) I press CTRL+C while in less -FRX, exiting less -FRX for some reason becomes much quicker (as intended). On the Mac, my Python program breaks.

Here is a sample of my code:

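import subprocess
import sys

file = sys.argv[1]  # path of the large text file, e.g. messages_500MB.txt (see step 1)
excluded = list()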
myInput = ''
while myInput != 'q':
    grepText = '|'.join(excluded)
    if grepText == '':
        command = 'egrep "" ' + file + ' | less -FRX'
    else:
        command = 'egrep -v "' + grepText + '" ' + file + ' | less -FRX'

    subprocess.call(command, shell=True)
    myInput = raw_input('Enter text to exclude, q to exit, # to see what is excluded: ')
    excluded.append(myInput)

Any help would be much appreciated

Solution

Actually, I figured out what the problem is.

I did some research on the error that is visible when running my script on Linux ("egrep: writing output: Broken pipe"), and that led me to the answer:
The issue is that when I use egrep -v 'xyz' file | less and then quit less, subprocess still continues to run egrep, and on large files (500MB+) this takes a while.

Apparently, the two programs in the pipeline run as separate processes, and the first one (egrep) keeps running even after the second one (less) has exited.
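
The waiting behaviour can be observed with a small experiment. This is a sketch written for Python 3; timed_shell is a hypothetical helper, and sleep and true merely stand in for a slow producer and a consumer that exits at once. It assumes a shell that waits for every command in a pipeline, which is what bash, dash and zsh usually do.

```python
import subprocess
import time


def timed_shell(command):
    """Return how many seconds subprocess.call blocked on a shell command line."""
    start = time.monotonic()
    # With shell=True the whole pipeline is handed to the shell, and call()
    # returns only once the shell itself has returned.
    subprocess.call(command, shell=True)
    return time.monotonic() - start
```

Calling timed_shell('sleep 2 | true') should then typically take about as long as sleep runs, even though true finishes immediately.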

To properly resolve my issue, I use something like this:

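import subprocess
# <filename> is a placeholder for the path of the file to filter
command = 'egrep -v "something" <filename>'
cmd2 = ('less', '-FRX')
# Start egrep on its own and let less read its output through a pipe
egrep = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE)
subprocess.check_call(cmd2, stdin=egrep.stdout)
# less has exited, so stop egrep instead of letting it scan the rest of the file
egrep.terminate()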

By piping the first process's output into the second process's stdin, I am now able to terminate egrep immediately when I exit less, and now my Python script is flying :)

Cheers,
Milos
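
A slightly more defensive variant of the same idea might look like the sketch below. It is not the answer's exact code: run_through_pager is a hypothetical name, the commands are passed as argument lists without shell=True (so that terminate() should signal the producer itself rather than an intermediate shell), and the producer is reaped with wait() afterwards.

```python
import subprocess


def run_through_pager(producer_args, pager_args=("less", "-FRX")):
    """Pipe the producer's output into the pager; stop the producer when the pager exits."""
    producer = subprocess.Popen(list(producer_args), stdout=subprocess.PIPE)
    try:
        # The pager reads the producer's stdout directly and keeps the terminal.
        pager_status = subprocess.call(list(pager_args), stdin=producer.stdout)
    finally:
        producer.stdout.close()  # drop this process's copy of the pipe's read end
        if producer.poll() is None:
            producer.terminate()  # the pager is gone, so stop producing output
        producer.wait()  # reap the child so it does not linger as a zombie
    return pager_status, producer.returncode
```

A call such as run_through_pager(['egrep', '-v', 'something', path]), with path holding the file name, would then play the role of one iteration of the loop in the question.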
