Read a file by bytes in BASH

Views: 82  Published: 2020/9/18 21:52:54  Tag: bash


Question

I need to read the first byte of a file I specify, then the second byte, the third, and so on. How can I do this in BASH? P.S. I need the hexadecimal value of each byte.

Recommended answer

Full rewrite: September 2019!

A lot shorter and simpler than previous versions! (Somewhat faster, but not by much.)

Syntax:

LANG=C IFS= read -r -d '' -n 1 foo

This populates $foo with one binary byte. Unfortunately, bash strings cannot hold null bytes ($'\0'), so the input has to be read one byte at a time.
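As a minimal sketch of how this syntax can be looped over a whole input, the function below prints every byte of STDIN as two hex digits. The name hexbytes is hypothetical, and the use of local LC_ALL=C instead of LANG=C is an assumption made here so that the C locale applies even when LC_ALL is already set in the environment.

```bash
#!/usr/bin/env bash
# hexbytes: hypothetical helper, prints each byte of stdin as two hex digits.
hexbytes() {
    local LC_ALL=C IFS= byte
    # read fails only at end of input. A successful read that leaves $byte
    # empty means a NUL byte was consumed as the delimiter.
    while read -r -d '' -n 1 byte; do
        printf '%02x\n' "'$byte"
    done
}

# Four input bytes: A, NUL, B, newline. Prints 41, 00, 42, 0a (one per line).
printf 'A\0B\n' | hexbytes
```

Because read uses NUL as its delimiter here, a NUL byte shows up as a successful read with an empty variable, which printf then reports as 00.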

As for the value of the byte read, I had missed this in man bash (have a look at the 2016 post at the bottom of this answer):

 printf [-v var] format [arguments]
 ...
     Arguments to non-string format specifiers are treated as C constants,
     except that ..., and if  the leading character is a  single or double
     quote, the value is the ASCII value of the following character.
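A short illustration of the rule quoted above, using only the printf builtin. The variable name code is chosen here for the example.

```bash
#!/usr/bin/env bash
# A leading single quote makes printf use the character code of what follows.
printf '%d\n' "'A"          # decimal code of A: 65
printf '%02x\n' "'A"        # the same value in hexadecimal: 41
printf -v code '%d' "'~"    # -v stores the result instead of printing it
echo "$code"                # 126
```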

So:

read8() {
    local _r8_var=${1:-OUTBIN} _r8_car LANG=C IFS=
    read -r -d '' -n 1 _r8_car
    # Quoted so that a byte such as * or ? is never glob-expanded.
    printf -v "$_r8_var" %d "'$_r8_car"
}

This populates the given variable name (default: $OUTBIN) with the decimal ASCII value of the first byte from STDIN.
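Since the question asks for hex values, here is a hedged sketch of one way to loop with such a function. read8s is a hypothetical variant of read8 (not part of the original answer) that returns the status of read, so the loop stops at end of input while a NUL byte still comes through as 0.

```bash
#!/usr/bin/env bash
# read8s: hypothetical variant of read8 that reports end of input.
read8s() {
    local _r8_var=${1:-OUTBIN} _r8_car LANG=C IFS=
    read -r -d '' -n 1 _r8_car || return    # nothing left: return read's failure
    printf -v "$_r8_var" %d "'$_r8_car"
}

hex=
while read8s byte; do
    printf -v h '%02x' "$byte"              # decimal value -> two hex digits
    hex+="$h "
done < <(printf 'Hi\0!')
echo "$hex"                                 # 48 69 00 21
```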

read16() {
    local _r16_var=${1:-OUTBIN} _r16_lb _r16_hb
    read8 _r16_lb &&
    read8 _r16_hb
    printf -v "$_r16_var" %d $(( _r16_hb<<8 | _r16_lb ))
}

This populates the given variable name (default: $OUTBIN) with the decimal value of the first 16-bit word from STDIN.

Of course, to switch endianness, swap the two reads:

    read8 _r16_hb &&
    read8 _r16_lb

And so on:

# Usage:
#       read[8|16|32|64] [varname] < binaryStdInput

read8() {  local _r8_var=${1:-OUTBIN} _r8_car LANG=C IFS=
    read -r -d '' -n 1 _r8_car
    printf -v "$_r8_var" %d "'$_r8_car" ;}
read16() { local _r16_var=${1:-OUTBIN} _r16_lb _r16_hb
    read8  _r16_lb && read8  _r16_hb
    printf -v "$_r16_var" %d $(( _r16_hb<<8 | _r16_lb )) ;}
read32() { local _r32_var=${1:-OUTBIN} _r32_lw _r32_hw
    read16 _r32_lw && read16 _r32_hw
    printf -v "$_r32_var" %d $(( _r32_hw<<16| _r32_lw )) ;}
read64() { local _r64_var=${1:-OUTBIN} _r64_ll _r64_hl
    read32 _r64_ll && read32 _r64_hl
    printf -v "$_r64_var" %d $(( _r64_hl<<32| _r64_ll )) ;}

So you could source this; then, if your /dev/sda is GPT partitioned:

read totsize < <(blockdev --getsz /dev/sda)
read64 gptbackup < <(dd if=/dev/sda bs=8 skip=68 count=1 2>/dev/null)
echo $((totsize-gptbackup))
1

The answer should be 1: the primary GPT header is at sector 1 and one sector is 512 bytes, so the header starts at byte 512. The backup header location is stored at byte 32 of the header, that is, at byte 544 of the disk. With bs=8, 512 bytes are 64 blocks and 32 bytes are 4 blocks, so 68 blocks have to be skipped... See GUID Partition Table at Wikipedia.
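The skip arithmetic can be checked without touching a real disk. The sketch below is an illustration under stated assumptions: it builds a small temporary file that only mimics the layout (544 zero bytes followed by an 8-byte little-endian value), and the value 0x0f7f is invented for the example.

```bash
#!/usr/bin/env bash
read8() {  local _r8_var=${1:-OUTBIN} _r8_car LANG=C IFS=
    read -r -d '' -n 1 _r8_car
    printf -v "$_r8_var" %d "'$_r8_car" ;}
read16() { local _r16_var=${1:-OUTBIN} _r16_lb _r16_hb
    read8  _r16_lb && read8  _r16_hb
    printf -v "$_r16_var" %d $(( _r16_hb<<8 | _r16_lb )) ;}
read32() { local _r32_var=${1:-OUTBIN} _r32_lw _r32_hw
    read16 _r32_lw && read16 _r32_hw
    printf -v "$_r32_var" %d $(( _r32_hw<<16| _r32_lw )) ;}
read64() { local _r64_var=${1:-OUTBIN} _r64_ll _r64_hl
    read32 _r64_ll && read32 _r64_hl
    printf -v "$_r64_var" %d $(( _r64_hl<<32| _r64_ll )) ;}

sector=512 field=32 bs=8
skip=$(( (1 * sector + field) / bs ))     # (512 + 32) / 8 = 68

# Fake image: zeros up to byte 544, then an invented 64-bit value 0x0f7f.
img=$(mktemp)
{ head -c $(( sector + field )) /dev/zero
  printf '\x7f\x0f\0\0\0\0\0\0'; } > "$img"

read64 backup < <(dd if="$img" bs="$bs" skip="$skip" count=1 2>/dev/null)
rm -f "$img"
echo "skip=$skip backup=$backup"          # skip=68 backup=3967
```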

write () { 
    local i=$((${2:-64}/8)) o= v r
    r=$((i-1))
    for ((;i--;)) {
        printf -vv '\%03o' $(( ($1>>8*(0${3+-1}?i:r-i))&255 ))
        o+=$v
    }
    printf "$o"
}

This function defaults to 64 bits, little endian.

Usage: write <integer> [bits:64|32|16|8] [switchto big endian]

With two parameters, the second must be one of 8, 16, 32 or 64 and sets the bit length of the generated output.
With any dummy third parameter (even an empty string), the function switches to big endian.
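A few hedged usage examples for the parameters described above. The function is repeated here, lightly tidied (do/done instead of braces), so that the snippet runs on its own. The od calls only make the emitted bytes visible.

```bash
#!/usr/bin/env bash
write() {
    local i=$(( ${2:-64} / 8 )) o= v r
    r=$(( i - 1 ))
    for (( ; i--; )); do
        # With a third argument, 0${3+-1} becomes 0-1 (true): big endian.
        printf -v v '\\%03o' $(( ($1 >> 8 * (0${3+-1} ? i : r - i)) & 255 ))
        o+=$v
    done
    printf "$o"
}

write 0x1234 16    | od -An -t x1    # little endian: 34 12
write 0x1234 16 be | od -An -t x1    # big endian:    12 34
write 1 32         | od -An -t x1    # 01 00 00 00
```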

read64 foo < <(write -12345);echo $foo
-12345

...

With the newer printf builtin, you can do a lot without forking ($(...)), which makes your script a lot faster.
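A small side-by-side sketch of the two styles. The variable names are chosen for this example only.

```bash
#!/usr/bin/env bash
byte=255
hex_fork=$(printf '%02x' "$byte")      # command substitution runs in a subshell
printf -v hex_nofork '%02x' "$byte"    # -v assigns directly, no subshell
echo "$hex_fork $hex_nofork"           # ff ff
```

Both lines give the same result. The second form avoids one fork per conversion, which adds up quickly in a per-byte loop.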

First, let's see (using seq and sed) how to parse hd output:

echo ;sed <(seq -f %02g 0 $(( COLUMNS-1 )) ) -ne '
    /0$/{s/^\(.*\)0$/\o0337\o033[A\1\o03380/;H;};
    /[1-9]$/{s/^.*\(.\)/\1/;H};
    ${x;s/\n//g;p}';hd < <(echo Hello good world!)
0         1         2         3         4         5         6         7
012345678901234567890123456789012345678901234567890123456789012345678901234567
00000000  48 65 6c 6c 6f 20 67 6f  6f 64 20 77 6f 72 6c 64  |Hello good world|
00000010  21 0a                                             |!.|
00000012

Here the hexadecimal part begins at column 10, the last byte starts at column 56, the entries are spaced 3 characters apart, and there is one extra space at column 34.

So this could be parsed by:

while read line ;do
    for x in ${line:10:48};do
        printf -v x \\%o 0x$x
        printf $x
      done
  done < <( ls -l --color | hd )
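The column slice can be tried on a fixed line, without running hd at all. This sketch takes the first line of the sample output shown earlier and rebuilds the text from its hex pairs. The variable names are illustrative.

```bash
#!/usr/bin/env bash
line='00000000  48 65 6c 6c 6f 20 67 6f  6f 64 20 77 6f 72 6c 64  |Hello good world|'
text=
for x in ${line:10:48}; do     # unquoted on purpose: split on blanks
    printf -v ch "\\x$x"       # hex pair -> the byte it encodes
    text+=$ch
done
echo "$text"                   # Hello good world
```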

Old original post

Edit 2: for hexadecimal, you could use hd

echo Hello world | hd
00000000  48 65 6c 6c 6f 20 77 6f  72 6c 64 0a              |Hello world.|

or od

echo Hello world | od -t x1 -t c
0000000  48  65  6c  6c  6f  20  77  6f  72  6c  64  0a
          H   e   l   l   o       w   o   r   l   d  \n

In short:

while IFS= read -r -n1 car;do [ "$car" ] && echo -n "$car" || echo ; done

Try it:

while IFS= read -rn1 c;do [ "$c" ]&&echo -n "$c"||echo;done < <(ls -l --color)

Explanation:

while IFS= read -rn1 car  # empty IFS, so every character is read as-is
    do [ "$car" ] &&      # was a character read?
        echo -n "$car" || # then echo it
        echo              # else it was an end-of-line, so print one
  done

Edit: the question was edited and now needs hex values!?

od -An -v -t x1 | while read line;do for char in $line;do echo $char;done ;done
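As a hedged variation on the one-liner above, this sketch pairs each hex value with its byte offset. The -v option is added here so that od does not collapse repeated lines into a single *. The sample input and variable names are assumptions for the example.

```bash
#!/usr/bin/env bash
offset=0
report=
# Unquoted command substitution on purpose: split od output into hex pairs.
for hex in $(printf 'AB\n' | od -An -v -t x1); do
    report+="$offset:$hex "
    ((offset++))
done
echo "$report"    # 0:41 1:42 2:0a
```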

Demo:

od -An -v -t x1 < <(ls -l --color ) |     # Translate binary to 1-byte hex (-v: no "*" lines)
    while read line;do                    # Read a line of hex pairs
        for char in $line;do              # For each pair
            printf "\x$char"              # Translate hex back to binary and print
      done
  done

Demo 2: we have both hex and binary

od -An -v -t x1 < <(ls -l --color ) |     # Translate binary to 1-byte hex (-v: no "*" lines)
    while read line;do                    # Read a line of hex pairs
        for char in $line;do              # For each pair
            bin="$(printf "\x$char")"     # translate hex to binary
            dec=$(printf "%d" 0x$char)    # translate to decimal
            [ $dec -lt 32  ] ||           # if the character is not printable
            ( [ $dec -gt 126 ] &&         # (control characters, or 127..159),
              [ $dec -lt 160 ] ) && bin="."  # change bin to a single dot.
            str="$str$bin"
            echo -n "$char "              # Print hex value and a space
            ((i++))                       # count printed values
            if [ $i -gt 15 ] ;then
                i=0
                echo "  -  $str"
                str=""
              fi
      done
  done

New post, September 2016:

This could be useful in very specific cases (I've used it to manually copy GPT partitions between two disks, at a low level, without having /usr mounted...)

... but only one byte at a time (because `char(0)' cannot be stored in a variable, the only way to read it correctly is to watch for end-of-file: if no character was read and end of file has not been reached, then the character read is a char(0)).
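The rule in the parenthesis can be sketched as a tiny counter: a read that succeeds but leaves the variable empty is counted as a NUL byte, and a read that fails is taken as end-of-file. The name count_nul and the use of local LC_ALL=C are assumptions made for this example.

```bash
#!/usr/bin/env bash
# count_nul: hypothetical helper, counts NUL bytes on stdin.
count_nul() {
    local LC_ALL=C IFS= c nul=0 total=0
    while read -r -d '' -n 1 c; do    # fails only at end of input
        ((total++))
        [ -n "$c" ] || ((nul++))      # read succeeded but nothing stored: NUL
    done
    echo "$nul of $total bytes are NUL"
}

printf 'a\0b\0\0' | count_nul         # 3 of 5 bytes are NUL
```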

This is more a proof of concept than a really useful tool: here is a pure bash version of hd (hexdump).

This uses recent bashisms and requires bash v4.3 or higher.

#!/bin/bash

printf -v ascii \\%o {32..126}
printf -v ascii "$ascii"

printf -v cntrl %-20sE abtnvfr

values=()
todisplay=
address=0
printf -v fmt8 %8s
fmt8=${fmt8// / %02x}

while LANG=C IFS= read -r -d '' -n 1 char ;do
    if [ "$char" ] ;then
        printf -v char "%q" "$char"
        ((${#char}==1)) && todisplay+=$char || todisplay+=.
        case ${#char} in
         1|2 ) char=${ascii%$char*};values+=($((${#char}+32)));;
           7 ) char=${char#*\'\\};values+=($((8#${char%\'})));;
           5 ) char=${char#*\'\\};char=${cntrl%${char%\'}*};
                values+=($((${#char}+7)));;
           * ) echo >&2 ERROR: $char;;
        esac
      else
        values+=(0)
      fi

    if [ ${#values[@]} -gt 15 ] ;then
        printf "%08x $fmt8 $fmt8  |%s|\n" $address ${values[@]} "$todisplay"
        ((address+=16))
        values=() todisplay=
      fi
  done

if [ "$values" ] ;then
        ((${#values[@]}>8))&&fmt="$fmt8 ${fmt8:0:(${#values[@]}%8)*5}"||
            fmt="${fmt8:0:${#values[@]}*5}"
        printf "%08x $fmt%$((
                50-${#values[@]}*3-(${#values[@]}>8?1:0)
            ))s |%s|\n" $address ${values[@]} ''""'' "$todisplay"
fi
printf "%08x (%d chars read.)\n" $((address+${#values[@]})){,}
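As far as can be read from the script, its case statement branches on the length of the printf %q output. The sketch below shows the lengths for four sample bytes, chosen here for illustration: a plain character, a character needing a backslash, a named control character, and a control character quoted in octal.

```bash
#!/usr/bin/env bash
lens=()
for sample in a ' ' $'\n' $'\001'; do
    printf -v quoted %q "$sample"
    printf '%-8s length %d\n' "$quoted" "${#quoted}"
    lens+=("${#quoted}")
done
echo "${lens[*]}"    # 1 2 5 7
```

A length of 1 or 2 is looked up in the $ascii string, 5 maps a named escape through $cntrl, and 7 is decoded as an octal number.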

You could try/use this, but don't try to compare performance!

time hd < <(seq 1 10000|gzip)|wc
   1415   25480  111711
real    0m0.020s
user    0m0.008s
sys     0m0.000s

time ./hex.sh < <(seq 1 10000|gzip)|wc
   1415   25452  111669
real    0m2.636s
user    0m2.496s
sys     0m0.048s

Same job: about 20 ms for hd versus more than 2000 ms for my bash script.

... but if you want to read 4 bytes from a file header, or even a sector address on a hard drive, this can do the job...
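A hedged sketch of the file-header case: reading only the first few bytes of a file and returning them as a hex string. head_hex is a hypothetical helper name, and the temporary file with an ELF-style magic number is made up for the demonstration.

```bash
#!/usr/bin/env bash
# head_hex FILE [COUNT]: hypothetical helper, hex of the first COUNT bytes
# (default 4).
head_hex() {
    local LC_ALL=C IFS= b h out= n=${2:-4}
    while (( n-- > 0 )) && read -r -d '' -n 1 b; do
        printf -v h '%02x' "'$b"
        out+=$h
    done < "$1"
    echo "$out"
}

f=$(mktemp)
printf '\177ELF\002\001' > "$f"    # invented header: 7f 45 4c 46 02 01
magic=$(head_hex "$f")
rm -f "$f"
echo "$magic"                      # 7f454c46
```

Only four read calls are made here, so the per-byte cost measured above hardly matters for this kind of use.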
