Fastest way to print a single line in a file
Question
I have to fetch one specific line out of a big file (1,500,000 lines), multiple times in a loop over multiple files. I was asking myself what the best option would be in terms of performance. There are many ways to do this; I mainly use these two:
cat ${file} | head -1
or
cat ${file} | sed -n '1p'
I could not find an answer to this: do they both fetch only the first line, or does one of the two (or both) first read the whole file and then fetch line 1?
Recommended answer
Drop the useless use of cat and do:
$ sed -n '1{p;q}' file
This quits the sed script after the line has been printed.
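The question asks for one specific line, which need not be the first. The same print-then-quit idea extends to an arbitrary line number; the following is a minimal sketch, where print_line is a hypothetical helper name that does not appear in the original answer:

```shell
#!/bin/bash
# print_line is a hypothetical helper (not from the original answer).
# Usage: print_line LINE_NUMBER FILE
print_line() {
    local n=$1 file=$2
    # Address line n, print it, then quit so sed stops reading the file there.
    sed -n "${n}{p;q;}" -- "$file"
}
```

For example, print_line 42 "${file}" prints line 42 only. sed still has to read the first n lines to find it, so the cost grows with the line number rather than with the total file length.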
Benchmarking script:
#!/bin/bash
# report only elapsed wall-clock seconds, with three decimals
TIMEFORMAT='%3R'
# number of timed runs per command
n=25
heading=('head -1 file' 'sed -n 1p file' "sed -n '1{p;q}' file" 'read line < file && echo $line')
# files up to a hundred million lines (if you're on a slow machine, decrease!)
for (( j=1; j<=100000000; j=j*10 ))
do
    echo "Lines in file: $j"
    # create file containing j lines
    seq 1 $j > file
    # initial read of file, so every timed run sees a warm cache
    cat file > /dev/null
    for comm in {0..3}
    do
        avg=0
        echo
        echo "${heading[$comm]}"
        for (( i=1; i<=n; i++ ))
        do
            case $comm in
                0)
                    t=$( { time head -1 file > /dev/null; } 2>&1);;
                1)
                    t=$( { time sed -n 1p file > /dev/null; } 2>&1);;
                2)
                    t=$( { time sed -n '1{p;q}' file > /dev/null; } 2>&1);;
                3)
                    t=$( { time read line < file && echo "$line" > /dev/null; } 2>&1);;
            esac
            # build a sum expression such as 0+0.001+0.002 for bc
            avg=$avg+$t
        done
        echo "scale=3;($avg)/$n" | bc
    done
done
Just save it as benchmark.sh and run bash benchmark.sh.
Results:
head -1 file
.001
sed -n 1p file
.048
sed -n '1{p;q}' file
.002
read line < file && echo $line
0
Results from a file with 1,000,000 lines.
So the time for sed -n 1p grows linearly with the length of the file, because without q sed keeps reading to the end of the input. The timings for the other variations stay constant (and negligible), as they all quit after reading the first line.
Note: the timings differ from the original post because they were taken on a faster Linux machine.
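The read variant in the benchmark avoids starting an external process, which is likely why it comes out fastest. As written, though, it trims surrounding whitespace and interprets backslashes. The following is a slightly more careful sketch of the same idea; first_line is a hypothetical helper name, not something from the original answer:

```shell
#!/bin/bash
# first_line is a hypothetical helper (not from the original answer).
# Usage: first_line FILE
first_line() {
    local line
    # IFS= preserves leading/trailing whitespace; -r keeps backslashes literal.
    # read returns non-zero at EOF without a newline, so also accept a
    # non-empty partial line.
    IFS= read -r line < "$1" || [ -n "$line" ] || return 1
    printf '%s\n' "$line"
}
```

This only covers the first line; for a later line the shell would have to loop over read, and at that point sed or head is probably the simpler choice.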