How to print the whole line that contains a specified byte offset in a file?

Problem description

I have a sample input.txt file:

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut
enim ad minim veniam, quis nostrud exercitation ullamco laboris
nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor
in reprehenderit in voluptate velit esse cillum dolore eu fugiat
nulla pariatur. Excepteur sint occaecat cupidatat non proident,
sunt in culpa qui officia deserunt mollit anim id est laborum.

Now I can easily grep for a word and get its byte offset:

$ grep -ob incididunt /dev/null input.txt 
input.txt:80:incididunt

Sadly, the information about the line contents and about the searched word is lost. I only know the file name and the byte offset 80. I want to print the whole line of the file that contains that byte offset.

So ideally I would like a script.sh that takes two parameters, a file name and a byte offset, and outputs the line containing that offset:

$ ./script.sh input.txt 80
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut

More examples:

For the file=input.txt and the byte offset=130 the output should be:

enim ad minim veniam, quis nostrud exercitation ullamco laboris

For the file=input.txt and any byte offset from 195 up to 253, the output should be:

nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor

For the file=input.txt and the byte offset=400 the output should be:

sunt in culpa qui officia deserunt mollit anim id est laborum.

What I tried:

I can print from the byte offset up to the end of the line with GNU sed, but that misses the eiusmod tempor part. I can't think of a way to go "back" in the file to fetch the part from the preceding newline up to that byte offset.

$ sed -z 's/.\{80\}\([^\n]*\).*/\1\n/' input.txt 
incididunt ut labore et dolore magna aliqua. Ut

I can read character by character, remember the last newline, and print from the last newline up to the next one. That will not work with the shell's read, as it omits newlines. I think I could get it to work using dd, but there must surely be a simpler solution.

set -- input.txt 80
exec 10<"$1"
pos=0
lastnewlinepos=0
for ((i=0;i<"$2";++i)); do
        IFS= read -r -u 10 -N 1 c
        pos=$((pos+1))
        # this will not work..., read omits newlines
        if [ "$c" = $'\n' ]; then
                lastnewlinepos="$pos"
        fi
done
# as I know the last newline before the offset, it's ok to use this now
sed -z 's/.\{'"$lastnewlinepos"'\}\([^\n]*\).*/\1\n/' "$1"

How can I print the whole line that "contains" a given byte offset in a file, using bash and standard *nix tools?

Recommended answer

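Keep the number of bytes read so far in a variable; as soon as it exceeds your byte offset, print the current line and exit. The comparison must be strict because grep -b reports 0-based offsets: once a line has been read, read equals the offset of the first byte of the next line, so read>=N would print the previous line whenever N points at the start of a line.

$ awk '{read+=1+length} read>80{print;exit}' input.txt
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut
$ awk '{read+=1+length} read>130{print;exit}' input.txt
enim ad minim veniam, quis nostrud exercitation ullamco laboris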
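
To get the ./script.sh FILE OFFSET interface asked for in the question, the one-liner can be wrapped in a small script. This is a minimal sketch, not part of the original answer: the function name line_at_offset is hypothetical, the offset is passed in with awk -v, and LC_ALL=C is an assumption that makes length count bytes in gawk (byte-oriented awks such as mawk count bytes anyway). The comparison is strict because grep -b offsets are 0-based.

```shell
#!/bin/sh
# script.sh FILE OFFSET: print the line of FILE containing 0-based byte OFFSET.
# Minimal sketch around the awk one-liner; line_at_offset is a hypothetical name.
line_at_offset() {
    # LC_ALL=C makes length count bytes, not multibyte characters.
    LC_ALL=C awk -v off="$2" '
        { read += 1 + length($0) }   # bytes consumed so far, newline included
        read > off { print; exit }   # first line that ends past the offset
    ' "$1"
}

# Run only when invoked as ./script.sh FILE OFFSET.
if [ "$#" -eq 2 ]; then
    line_at_offset "$1" "$2"
fi
```

Invoked as ./script.sh input.txt 80, it applies the same logic as the first one-liner above; an offset past the end of the file prints nothing.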

length is the length of the current line; we need to add 1 to it because awk strips the record separator (\n by default) from each line.
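
To see how the running total maps offsets to lines, it can help to print the byte range each line occupies. The helper below is a hypothetical debugging aid, not part of the answer: it uses the same read+=1+length bookkeeping and, for every line, prints the offset of its first byte and of its newline (inclusive), followed by the line itself. LC_ALL=C is again assumed so that length counts bytes.

```shell
#!/bin/sh
# Hypothetical debugging aid: show the byte range each line occupies,
# using the same running total as the answer's one-liner.
show_line_ranges() {
    LC_ALL=C awk '
        BEGIN { read = 0 }
        {
            start = read               # offset of the first byte of this line
            read += 1 + length($0)     # line bytes plus the stripped "\n"
            # first..last offset (the newline itself included), then the text
            printf "%d-%d\t%s\n", start, read - 1, $0
        }' "$1"
}

if [ "$#" -eq 1 ]; then
    show_line_ranges "$1"
fi
```

Run on input.txt, it lists which offsets the one-liner assigns to each line, which is a quick way to check ranges like the ones quoted in the question.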

Note that in GAWK, length counts characters, and in a multibyte locale such as UTF-8 a single character can take several bytes. To make it count bytes, specify the -b option on the command line:

gawk -b '{read+=1+length} read>130{print;exit}' input.txt
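
A different approach, not from the original answer, avoids per-line arithmetic: the line containing 0-based offset N is one plus the number of newlines among the first N bytes. The sketch below assumes a head that supports -c (GNU and BSD head do, though it is not in POSIX) and uses a hypothetical function name, line_at_offset_head. Offset 0 is special-cased because some BSD versions of head reject a count of 0.

```shell
#!/bin/sh
# Alternative sketch (not from the original answer): the line that contains
# 0-based byte offset N is line (newlines in the first N bytes) + 1.
# line_at_offset_head is a hypothetical name.
line_at_offset_head() {
    if [ "$2" -eq 0 ]; then
        n=0                                  # offset 0 is always on line 1
    else
        n=$(head -c "$2" "$1" | wc -l)       # newlines strictly before byte N
    fi
    # Print only line n+1 and stop reading there.
    sed -n "$(( $n + 1 )){p;q;}" "$1"
}

if [ "$#" -eq 2 ]; then
    line_at_offset_head "$1" "$2"
fi
```

Because head -c counts bytes regardless of locale, no -b flag or LC_ALL=C is needed; the trade-off is that the start of the file is read twice, once by head and once by sed.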
