Bash routine to return the page number of a given line number from a text file

Question

Consider a plain text file whose pages are separated by the ASCII control character "Form Feed" ($'\f'):

alpha\n
beta\n
gamma\n\f
one\n
two\n
three\n
four\n
five\n\f
earth\n
wind\n
fire\n
water\n\f

Note that the number of lines varies from page to page.
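To experiment with this layout, the sample can be recreated with printf. This is a minimal sketch which assumes that each form feed sits at the start of the line following a page break, as the listing suggests, and that the file ends with a final form feed. The temporary file and the name sample_file are placeholders, not part of the original question.

```bash
# Recreate the sample; sample_file is a hypothetical name for a temp file.
sample_file=$(mktemp)
printf 'alpha\nbeta\ngamma\n\fone\ntwo\nthree\nfour\nfive\n\fearth\nwind\nfire\nwater\n\f' > "$sample_file"

# Count the form feeds by deleting every other byte.
tr -cd '\f' < "$sample_file" | wc -c    # 3

# Count the lines (newline characters).
wc -l < "$sample_file"                  # 12
```

Some wc implementations pad the number with leading spaces, so strip them before comparing the output as a string.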

I need a bash routine that returns the page number of a given line number in such a file.

After researching this for a long time, I finally came across this piece of code:

function get_page_from_line
{
    local nline="$1"
    local input_file="$2"

    local npag=0
    local ln=0
    local total=0

    while IFS= read -d $'\f' -r page; do

        npag=$(( ++npag ))

        ln=$(echo -n "$page" | wc -l)

        total=$(( total + ln ))

        if [ $total -ge $nline ]; then
            echo "${npag}"
            return
        fi

    done < "$input_file"

    echo "0"

    return
}

Unfortunately, this solution proved to be very slow in some cases.

Is there a better solution?

Thanks!

Recommended answer

The idea of using read -d $'\f' and then counting the lines is good.

This version might not look elegant: unless nline exceeds the number of lines in the file, the input is read twice, once by wc and once (up to line nline) by head.

Give it a try, because it is very fast:

function get_page_from_line ()
{
    local nline="${1}"
    local input_file="${2}"
    # Print 0 when the file has fewer than nline lines.
    if [[ $(wc -l < "${input_file}") -lt nline ]] ; then
        printf "0\n"
    else
        # Count the lines up to nline that begin with a form feed, then add 1.
        printf "%d\n" $(( $(head -n "${nline}" "${input_file}" | grep -c "^"$'\f') + 1 ))
    fi
}
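The counting step can be tried in isolation. The sketch below assumes the sample layout from the question, with each form feed at the start of the line that follows a page break. The names sample_file, ff and count_page_breaks are illustrative helpers, not part of the answer.

```bash
# Recreate the sample from the question in a temporary file.
sample_file=$(mktemp)
printf 'alpha\nbeta\ngamma\n\fone\ntwo\nthree\nfour\nfive\n\fearth\nwind\nfire\nwater\n\f' > "$sample_file"

# A literal form feed character, kept in a variable for readability.
ff=$(printf '\f')

# count_page_breaks N FILE: how many of the first N lines start with a form feed.
count_page_breaks() {
    # grep exits with status 1 when the count is 0, hence the "|| true".
    head -n "$1" "$2" | grep -c "^$ff" || true
}

# Lines 1 to 4 include "\fone", so line 4 is on page 1 + 1 = 2.
count_page_breaks 4 "$sample_file"    # prints 1

# Lines 1 to 3 include no page break, so line 3 is on page 0 + 1 = 1.
count_page_breaks 3 "$sample_file"    # prints 0
```

Anchoring the pattern with ^ only matches a form feed at the start of a line. A file that places the form feed elsewhere on the line would need a different pattern.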

awk performs better than the bash version above; it was created for this kind of text processing.

Give this version a try:

function get_page_from_line ()
{
  awk -v nline="${1}" '
    BEGIN {
      npag=1;
    }
    {
      if (index($0,"\f")>0) {
        npag++;
      }
      if (NR==nline) {
        print npag;
        linefound=1;
        exit;
      }
    }
    END {
      if (!linefound) {
        print 0;
      }
    }' "${2}"
}

Whenever a line contains \f, the page number is incremented.

NR is the current line number.
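The awk function above adds one page per line that contains a form feed. If a line may hold several form feeds (for example, an empty page), one possible variation is to count every occurrence using the return value of gsub. This is only a sketch under that assumption; the name get_page_from_line_awk is hypothetical.

```bash
# Hypothetical variant: count every form feed, not just one per line.
get_page_from_line_awk() {
    awk -v nline="$1" '
        BEGIN { npag = 1 }
        {
            # gsub returns the number of substitutions made;
            # "&" puts each matched form feed back unchanged.
            npag += gsub(/\f/, "&")
            if (NR == nline) {
                print npag
                found = 1
                exit
            }
        }
        END {
            # No such line number in the file.
            if (!found) print 0
        }
    ' "$2"
}
```

With the sample file from the question, calling it as get_page_from_line_awk 4 file reports page 2, the same result as the original awk function.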

For the record, here is another bash version.

This version uses only built-in commands to count the lines in the current page.

The speedtest.sh that you provided in the comments showed that it is only slightly ahead (approx. 20 sec), which makes it roughly equivalent to your version:

function get_page_from_line ()
{
    local nline="$1"
    local input_file="$2"

    local npag=0
    local total=0

    while IFS= read -d $'\f' -r page; do
        npag=$(( npag + 1 ))
        IFS=$'\n'
        for line in ${page}
        do
            total=$(( total + 1 ))
            if [[ total -eq nline ]] ; then
                printf "%d\n" ${npag}
                unset IFS
                return
            fi
        done
        unset IFS
    done < "$input_file"
    printf "0\n"
    return
}
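One caveat with the loop above: with IFS set to a newline, the unquoted ${page} is word-split, so empty lines are skipped and never counted. A possible built-in-only alternative is to count the newline characters of each page with parameter expansion, which matches what wc -l does in the question's version. This is a sketch, not a benchmarked replacement; the function name get_page_from_line_builtin is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical variant: count newlines per page without a per-line loop.
get_page_from_line_builtin() {
    local nline="$1" input_file="$2"
    local npag=0 total=0 page newlines
    local nl=$'\n'

    while IFS= read -r -d $'\f' page; do
        npag=$(( npag + 1 ))
        # Delete every character that is not a newline; the length of
        # what remains is the number of lines on this page.
        newlines=${page//[!$nl]/}
        total=$(( total + ${#newlines} ))
        if (( total >= nline )); then
            printf '%d\n' "$npag"
            return
        fi
    done < "$input_file"

    # nline is beyond the last page that ends with a form feed.
    printf '0\n'
}
```

Like the versions above, it only sees pages that are terminated by a form feed, because read returns a non-zero status for trailing text without a delimiter.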
