Bash routine to return the page number of a given line number from a text file
Question
Consider a plain text file that uses the ASCII control character "Form Feed" ($'\f') as a page break:
alpha\n
beta\n
gamma\n\f
one\n
two\n
three\n
four\n
five\n\f
earth\n
wind\n
fire\n
water\n\f
Note that the number of lines varies from page to page.
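To experiment with the routines discussed here, the sample can be written to a real file. This is only a sketch: the temporary file and the variable names (sample_file, line_count, page_breaks) are my own, and it assumes each form feed directly follows the newline of the last line on a page, as the listing suggests.

```shell
#!/usr/bin/env bash
# Write the sample from the question to a temporary file.
# printf turns \n into a newline and \f into a form feed.
sample_file=$(mktemp)
printf 'alpha\nbeta\ngamma\n\fone\ntwo\nthree\nfour\nfive\n\fearth\nwind\nfire\nwater\n\f' > "$sample_file"

# Number of newline-terminated lines in the file.
line_count=$(wc -l < "$sample_file")

# Number of form feeds: delete everything else, then count the bytes left.
page_breaks=$(tr -cd '\f' < "$sample_file" | wc -c)

echo "lines: ${line_count}, page breaks: ${page_breaks}"
```

With this layout the file has 12 lines and 3 page breaks.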
I need a bash routine that returns the page number of a given line number in such a text file.
After a long time researching a solution, I finally came across this piece of code:
function get_page_from_line
{
    local nline="$1"
    local input_file="$2"
    local npag=0
    local ln=0
    local total=0
    while IFS= read -d $'\f' -r page; do
        npag=$(( ++npag ))
        ln=$(echo -n "$page" | wc -l)
        total=$(( total + ln ))
        if [ $total -ge $nline ]; then
            echo "${npag}"
            return
        fi
    done < "$input_file"
    echo "0"
    return
}
Unfortunately, this solution proved to be very slow in some cases.
Is there a better solution?
Thanks!
Recommended answer
The idea of using read -d $'\f' and then counting the lines is good.
This version may not look elegant: wc reads the whole file first, and head then reads it again up to nline, so when nline is close to the number of lines in the file, the file is effectively read twice.
Give it a try, because it is super fast:
function get_page_from_line ()
{
    local nline="${1}"
    local input_file="${2}"
    if [[ $(wc -l "${input_file}" | awk '{print $1}') -lt nline ]] ; then
        printf "0\n"
    else
        printf "%d\n" $(( $(head -n ${nline} "${input_file}" | grep -c "^"$'\f') + 1 ))
    fi
}
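To make the head/grep arithmetic concrete, here is the same pipeline unrolled on the sample from the question. It is only an illustration: the temporary file and the variable names (sample_file, breaks_seen, page) are mine, and it assumes each form feed sits at the start of the first line of the next page.

```shell
#!/usr/bin/env bash
# Rebuild the question's sample file (assumed layout: each form feed
# directly precedes the first line of the next page).
sample_file=$(mktemp)
printf 'alpha\nbeta\ngamma\n\fone\ntwo\nthree\nfour\nfive\n\fearth\nwind\nfire\nwater\n\f' > "$sample_file"

nline=4   # "one", the first line of page 2

# Step 1: keep only the first nline lines.
# Step 2: count the lines among them that start with a form feed.
breaks_seen=$(head -n "$nline" "$sample_file" | grep -c "^"$'\f')

# Step 3: the page number is the number of breaks seen plus one.
page=$(( breaks_seen + 1 ))
echo "line ${nline} is on page ${page}"

rm -f "$sample_file"
```

For line 4 exactly one line starting with a form feed has been seen, so the result is page 2.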
awk performs better than the bash version above; it was created for this kind of text processing.
Give this tested version a try:
function get_page_from_line ()
{
    awk -v nline="${1}" '
        BEGIN {
            npag=1;
        }
        {
            if (index($0,"\f")>0) {
                npag++;
            }
            if (NR==nline) {
                print npag;
                linefound=1;
                exit;
            }
        }
        END {
            if (!linefound) {
                print 0;
            }
        }' "${2}"
}
When \f is encountered, the page number is incremented. NR is the current line number.
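To see how the counter moves through a file, the same rule (a record containing \f starts a new page) can be applied to every line instead of stopping at one. This is a sketch for inspection only; the helper name annotate_pages is hypothetical and not part of the solution above.

```shell
#!/usr/bin/env bash
# annotate_pages FILE  (hypothetical helper, for inspection only)
# Prints every record with the page number the awk rule assigns to it.
annotate_pages() {
    awk '
    BEGIN { npag = 1 }
    {
        # A record that contains a form feed opens a new page.
        if (index($0, "\f") > 0) { npag++ }
        text = $0
        gsub(/\f/, "", text)   # strip the control character for display
        printf "page %d, line %d: %s\n", npag, NR, text
    }' "$1"
}

# Usage on the sample from the question (assumed layout: each form feed
# directly precedes the first line of the next page).
sample_file=$(mktemp)
printf 'alpha\nbeta\ngamma\n\fone\ntwo\nthree\nfour\nfive\n\fearth\nwind\nfire\nwater\n\f' > "$sample_file"
annotate_pages "$sample_file"
```

Keep in mind that awk treats a form feed at the very end of the file as one more record, so that trailing record is reported on a further page.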
For the record, there is another bash version.
This version uses only built-in commands to count the lines in the current page.
The speedtest.sh that you provided in the comments showed it is slightly ahead (about 20 seconds), which makes it roughly equivalent to your version:
function get_page_from_line ()
{
    local nline="$1"
    local input_file="$2"
    local npag=0
    local total=0
    while IFS= read -d $'\f' -r page; do
        npag=$(( npag + 1 ))
        # Split the page on newlines only. Caveat: word splitting collapses
        # consecutive newlines, so blank lines are not counted.
        IFS=$'\n'
        for line in ${page}
        do
            total=$(( total + 1 ))
            if [[ total -eq nline ]] ; then
                printf "%d\n" ${npag}
                unset IFS
                return
            fi
        done
        unset IFS
    done < "$input_file"
    printf "0\n"
    return
}
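One caveat with the loop above: word splitting on newlines collapses consecutive newlines, so blank lines are not counted. A possible builtin-only alternative is to count the newline characters of each page with parameter expansion. This is a sketch that I have not benchmarked; the function name get_page_from_line_nl and the helper variables are hypothetical.

```shell
#!/usr/bin/env bash
# get_page_from_line_nl NLINE FILE  (hypothetical name)
# Counts newline characters per page using only bash builtins, so blank
# lines are counted and no external process is started.
get_page_from_line_nl() {
    local nline="$1"
    local input_file="$2"
    local npag=0 total=0 page only_newlines
    local nl=$'\n'
    while IFS= read -r -d $'\f' page; do
        npag=$(( npag + 1 ))
        # Delete every character that is not a newline; the length of
        # what remains is the number of lines on this page.
        only_newlines=${page//[!$nl]/}
        total=$(( total + ${#only_newlines} ))
        if (( total >= nline )); then
            printf '%d\n' "$npag"
            return
        fi
    done < "$input_file"
    # nline lies beyond the last page.
    printf '0\n'
}
```

It follows the same contract as the versions above: it prints the page number, or 0 when the line number is beyond the end of the file. The substitution may itself be slow on very large pages, so it is worth timing against the awk version before relying on it.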