Bash while read loop extremely slow compared to cat, why?

Question

Here is a simple test script:

while read LINE; do
        LINECOUNT=$(($LINECOUNT+1))
        if [[ $(($LINECOUNT % 1000)) -eq 0 ]]; then echo $LINECOUNT; fi
done

When I run cat my450klinefile.txt | myscript, the CPU is pegged at 100% and the script processes about 1000 lines a second. It takes about 5 minutes to process what cat my450klinefile.txt >/dev/null does in half a second.

Is there a more efficient way to do essentially this? I just need to read a line from stdin, count the bytes, and write the line out to a named pipe. But even this simple example is impossibly slow.

For every 1 GB of input lines, I need to do a few more complex scripting actions (close and reopen some of the pipes that the data is being fed to).

Recommended answer

The reason while read is so slow is that the shell has to make a system call for every byte. It cannot read a large buffer from the pipe, because the shell must not consume more than one line from the input stream, so it has to read one character at a time and compare each against a newline. If you run strace on a while read loop, you can see this behavior. This behavior is desirable, because it makes it possible to reliably do things like:

while read size; do dd bs=$size count=1 of=file$(( i++ )); done

in which the commands inside the loop are reading from the same stream that the shell reads from. If the shell consumed a big chunk of data by reading large buffers, the inner commands would not have access to that data. An unfortunate side-effect is that read is absurdly slow.
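The stream-sharing guarantee described above can be seen in a small experiment. This is only a sketch: the function name demo_shared_stream and the sample input are made up for illustration. The shell's read takes exactly one line from the pipe, and a second command running in the same group still finds the remaining lines on stdin.

```bash
#!/usr/bin/env bash
# Hypothetical demo (names and input are illustrative, not from the original post).
# read consumes only the first line of the pipe; sed, which shares the same
# stdin, still receives every line after it.
demo_shared_stream() {
    printf 'first\nsecond\nthird\n' | {
        IFS= read -r line                  # shell stops right after "first\n"
        printf 'shell read: %s\n' "$line"
        sed 's/^/left for sed: /'          # sees "second" and "third"
    }
}

demo_shared_stream
```

If read pulled a large block from the pipe, sed would have nothing left to read. Running a loop like this under strace (for example with -e trace=read) should show the one-byte read calls that make this guarantee possible and that account for the slowness.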
