Fastest way to print a single line in a file


Question

I have to fetch one specific line out of a big file (1,500,000 lines), multiple times, in a loop over multiple files. I was asking myself what the best option would be in terms of performance. There are many ways to do this; I mainly use these two:

cat ${file} | head -1

cat ${file} | sed -n '1p'

I could not find an answer to this: do they both fetch only the first line, or does one of the two (or both) first read the whole file and then fetch line 1?

Recommended answer

Drop the useless use of cat and do:

$ sed -n '1{p;q}' file

This quits the sed script as soon as the line has been printed, so the rest of the file is never read.
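The same early-exit idea extends to an arbitrary line number, which is what the question ultimately needs. A minimal sketch, assuming a hypothetical helper named print_line (the name and the argument check are illustrative and not part of the original answer):

```shell
# print_line N FILE: print line N of FILE, then stop reading.
# Hypothetical helper for illustration; the name is an assumption.
print_line() {
    local n=$1 file=$2
    # Reject anything that is not a positive integer before handing it to sed.
    case $n in
        ''|*[!0-9]*|0*)
            echo "print_line: invalid line number: $n" >&2
            return 1 ;;
    esac
    # -n suppresses automatic printing; at line N, p prints it and q quits.
    sed -n "${n}{p;q;}" "$file"
}
```

For example, after seq 1 10 > file, running print_line 3 file prints 3. sed still has to read lines 1 through N, so the cost grows with N, but nothing after line N is read.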

Benchmarking script:

#!/bin/bash

TIMEFORMAT='%3R'
n=25
heading=('head -1 file' 'sed -n 1p file' "sed -n '1{p;q}' file" 'read line < file && echo $line')

# files up to a hundred million lines (if you're on a slow machine, decrease!)
# note: no thousands separators; in bash arithmetic a comma is an operator
for (( j=1; j<=100000000; j=j*10 ))
do
    echo "Lines in file: $j"
    # create file containing j lines
    seq 1 $j > file
    # initial read of file
    cat file > /dev/null

    for comm in {0..3}
    do
        avg=0
        echo
        echo "${heading[$comm]}"
        for (( i=1; i<=$n; i++ ))
        do
            case $comm in
                0)
                    t=$( { time head -1 file > /dev/null; } 2>&1);;
                1)
                    t=$( { time sed -n 1p file > /dev/null; } 2>&1);;
                2)
                    t=$( { time sed -n '1{p;q}' file > /dev/null; } 2>&1);;
                3)
                    t=$( { time { read -r line < file && echo "$line" > /dev/null; }; } 2>&1);;
            esac
            avg=$avg+$t
        done
        echo "scale=3;($avg)/$n" | bc
    done
done

Just save it as benchmark.sh and run bash benchmark.sh.

Results:

head -1 file
.001

sed -n 1p file
.048

sed -n '1{p;q}' file
.002

read line < file && echo $line
0

Results are from a file with 1,000,000 lines, in seconds, averaged over 25 runs.

So the time for sed -n 1p grows linearly with the length of the file, because sed keeps reading to the end of the input even though it prints only line 1. The timing for the other variations stays constant (and negligible), as they all quit after reading the first line.

Note: the timings differ from those in the original post because they were taken on a faster Linux machine.
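Since the question involves a loop over many files, the shell builtin read (which measured lowest in the results above) has the extra property that it does not start an external process per file. A minimal sketch, assuming a hypothetical function named first_lines that receives the file names as arguments (neither the name nor the loop appears in the original answer):

```shell
# first_lines FILE...: print the first line of each file using only builtins.
# Hypothetical helper for illustration; not part of the original answer.
first_lines() {
    local f line
    for f in "$@"; do
        # Reset so a failed open cannot reuse the previous file's line.
        line=
        # IFS= and -r keep leading whitespace and backslashes intact.
        # read returns non-zero when the first line has no trailing newline,
        # so also print when something was read despite the failure status.
        if IFS= read -r line < "$f" || [ -n "$line" ]; then
            printf '%s\n' "$line"
        fi
    done
}
```

Called as first_lines a.txt b.txt, it prints one line per readable, non-empty file. This only covers line 1; for a later line, an early-exit sed call per file is the closer equivalent.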
