Bash while read loop extremely slow compared to cat, why?


Question

Here is a simple test script:

LINECOUNT=0
while IFS= read -r LINE; do
    LINECOUNT=$((LINECOUNT + 1))
    # Print a progress marker every 1000 lines.
    if (( LINECOUNT % 1000 == 0 )); then echo "$LINECOUNT"; fi
done

When I run cat my450klinefile.txt | myscript, the CPU is pegged at 100% and the script processes about 1000 lines per second. It takes about 5 minutes to process what cat my450klinefile.txt >/dev/null does in half a second.

Is there a more efficient way to do essentially this? I just need to read a line from stdin, count the bytes, and write it out to a named pipe. But even this simple example is impossibly slow.

For every 1 GB of input lines, I need to do a few more complex scripting actions (close and reopen some of the pipes that the data is being fed to).

Recommended answer

The reason while read is so slow is that the shell must make a system call for every byte. It cannot read a large buffer from the pipe, because it must not consume more than one line from the input stream, so it has to read one character at a time and compare each against a newline. If you run strace on a while read loop, you can see this behavior. The behavior is desirable, because it makes it possible to reliably do things like:

while read size; do dd bs=$size count=1 of=file$(( i++ )); done

in which the commands inside the loop read from the same stream that the shell reads from. If the shell consumed a big chunk of data by reading large buffers, the inner commands would not have access to that data. An unfortunate side effect is that read is absurdly slow.
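
The stream-sharing point can be checked with a small sketch. The input here is made up for illustration: the first line is a byte count and the rest is payload. The function name read_then_pass is hypothetical, and head stands in for dd. Because read stops right after the newline, the child process still finds the payload waiting in the pipe.

```shell
#!/usr/bin/env bash
# read_then_pass: hypothetical helper for illustration only.
# It reads one line (a byte count) with the shell builtin, then lets an
# external command read that many bytes from the same stdin.
read_then_pass() {
    read -r size        # the shell consumes the first line and nothing after it
    head -c "$size"     # the child process reads the next $size bytes
}

# Line 1 is "3", so head should emit the first 3 payload bytes.
printf '3\nabcXYZ\n' | read_then_pass
echo                    # trailing newline for readability
```

If the shell had buffered the whole input while looking for the first newline, head would find an empty pipe and print nothing.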

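One possible workaround, which is not part of the answer above, is to move the per-line work into a single process that buffers its own input, such as awk. The sketch below reproduces only the progress output of the test script, and it assumes the per-line logic can be expressed in awk. The function name count_lines is hypothetical, and the input is generated with seq rather than read from a real file.

```shell
#!/usr/bin/env bash
# count_lines: hypothetical helper for illustration only.
# Prints the running line count every 1000 lines, like the test script,
# but awk reads stdin in large blocks instead of one byte per system call.
count_lines() {
    awk 'NR % 1000 == 0 { print NR }'
}

# Usage: 2500 generated lines stand in for the real input file.
seq 1 2500 | count_lines
```

The trade-off is the one the answer describes: once awk owns the stream, other commands can no longer read from it reliably, so the byte counting and the named-pipe handling would have to move into the same awk program. Those parts are not shown here.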