Read a file by bytes in BASH

Question

I need to read the first byte of a file I specify, then the second byte, the third, and so on. How can I do this in BASH? P.S. I need the hex value of each byte.

Recommended answer

Full rewrite: September 2019!

A lot shorter and simpler than the previous versions! (Somewhat faster, but not by much.)

Syntax:

LC_ALL=C IFS= read -r -d '' -n 1 foo

will populate $foo with one binary byte. Since bash strings cannot hold null bytes ($'\0'), bytes have to be read one at a time: a NUL byte then shows up as an empty $foo.
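
Applied to the question (the hex value of every byte), a minimal sketch could look like the following. The function name hexbytes is hypothetical, and LC_ALL=C is used instead of LANG=C on the assumption that LC_ALL may be set in the environment, where it would override LANG:

```bash
#!/usr/bin/env bash
# hexbytes (hypothetical name): print the hex value of every byte on stdin,
# one per line.
hexbytes() {
    # C locale: read -n 1 then reads one byte, not one multibyte character.
    local LC_ALL=C IFS= byte
    # With -d '' the delimiter is NUL, so a NUL byte arrives as an empty
    # string with status 0; read only fails at end of input.
    while read -r -d '' -n 1 byte; do
        # "'X" makes printf use the numeric value of X; a lone "'" gives 0.
        printf '%02x\n' "'$byte"
    done
}

printf 'Hi\0!' | hexbytes
```

For that input it prints 48, 69, 00 and 21 on separate lines.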

But for the value of the byte read, I had missed this in man bash (have a look at the 2016 post, at the bottom of this answer):

 printf [-v var] format [arguments]
 ...
     Arguments to non-string format specifiers are treated as C constants,
     except that ..., and if  the leading character is a  single or double
     quote, the value is the ASCII value of the following character.
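
A quick check of that rule; ord is a hypothetical helper name, not a bash builtin:

```bash
#!/usr/bin/env bash
# ord (hypothetical helper): print the numeric value of the first character of $1.
ord() {
    local LC_ALL=C          # value of a byte, not of a multibyte character
    printf '%d\n' "'$1"     # leading quote: "value of the following character"
}

ord A     # 65
ord ' '   # 32
ord ''    # 0: a lone quote has no following character
```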

So:

read8() {
    local _r8_var=${1:-OUTBIN} _r8_car LC_ALL=C IFS=
    read -r -d '' -n 1 _r8_car || return    # non-zero only at end of input
    printf -v "$_r8_var" %d "'$_r8_car"     # NUL byte: empty _r8_car, value 0
}

will populate the given variable name (default $OUTBIN) with the decimal value (0 to 255) of the first byte from STDIN.

read16() {
    local _r16_var=${1:-OUTBIN} _r16_lb _r16_hb
    read8 _r16_lb &&
    read8 _r16_hb || return    # stop if input ends before 2 bytes
    printf -v "$_r16_var" %d $(( _r16_hb<<8 | _r16_lb ))
}

will populate the given variable name (default $OUTBIN) with the decimal value of the first 16-bit word from STDIN, read as little endian (low byte first)...

Of course, to switch endianness, you have to swap the two calls:

    read8 _r16_hb &&
    read8 _r16_lb
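
A sketch putting both byte orders side by side, assuming read8 also reports end of input (the || return in this sketch); read16be is a hypothetical name for the swapped variant:

```bash
#!/usr/bin/env bash
read8() {
    local _r8_var=${1:-OUTBIN} _r8_car LC_ALL=C IFS=
    read -r -d '' -n 1 _r8_car || return      # non-zero at end of input
    printf -v "$_r8_var" %d "'$_r8_car"
}
# Little endian: low byte first (as read16 above).
read16() {
    local _r16_var=${1:-OUTBIN} _r16_lb _r16_hb
    read8 _r16_lb && read8 _r16_hb || return
    printf -v "$_r16_var" %d $(( _r16_hb<<8 | _r16_lb ))
}
# Big endian (hypothetical read16be): high byte first.
read16be() {
    local _r16_var=${1:-OUTBIN} _r16_lb _r16_hb
    read8 _r16_hb && read8 _r16_lb || return
    printf -v "$_r16_var" %d $(( _r16_hb<<8 | _r16_lb ))
}

printf '\x12\x34' | { read16 v; echo "$v"; }     # 13330 (0x3412)
printf '\x12\x34' | { read16be v; echo "$v"; }   # 4660 (0x1234)
```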

And so on:

# Usage:
#       read[8|16|32|64] [varname] < binaryStdInput
# Each function returns non-zero if the input ends too early.

read8() {  local _r8_var=${1:-OUTBIN} _r8_car LC_ALL=C IFS=
    read -r -d '' -n 1 _r8_car || return
    printf -v "$_r8_var" %d "'$_r8_car" ;}
read16() { local _r16_var=${1:-OUTBIN} _r16_lb _r16_hb
    read8  _r16_lb && read8  _r16_hb || return
    printf -v "$_r16_var" %d $(( _r16_hb<<8 | _r16_lb )) ;}
read32() { local _r32_var=${1:-OUTBIN} _r32_lw _r32_hw
    read16 _r32_lw && read16 _r32_hw || return
    printf -v "$_r32_var" %d $(( _r32_hw<<16| _r32_lw )) ;}
read64() { local _r64_var=${1:-OUTBIN} _r64_ll _r64_hl
    read32 _r64_ll && read32 _r64_hl || return
    printf -v "$_r64_var" %d $(( _r64_hl<<32| _r64_ll )) ;}
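
The chain can also be written as a loop over N bytes. This readle function is a hypothetical alternative, not part of the answer; it assumes little-endian input and N of at most 8, since bash arithmetic is 64-bit signed (an 8-byte value with its top bit set comes out negative, as with read64):

```bash
#!/usr/bin/env bash
# readle N [varname] (hypothetical): read N bytes from stdin as a
# little-endian integer; returns non-zero if input ends early.
readle() {
    local _n=$1 _var=${2:-OUTBIN} _i _c _v=0
    local LC_ALL=C IFS=
    for (( _i = 0; _i < _n; _i++ )); do
        read -r -d '' -n 1 _c || return     # end of input before N bytes
        printf -v _c %d "'$_c"              # byte -> 0..255 (NUL -> 0)
        _v=$(( _v | _c << (8 * _i) ))       # byte i goes to bits 8i..8i+7
    done
    printf -v "$_var" %d "$_v"
}

printf '\x12\x34\x56' | { readle 3 v; echo "$v"; }        # 5649426 (0x563412)
printf '\x24\x08\x00\x00' | { readle 4 v; echo "$v"; }    # 2084 (0x824)
```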

So you could source this; then, if your /dev/sda is GPT-partitioned:

read totsize < <(blockdev --getsz /dev/sda)
read64 gptbackup < <(dd if=/dev/sda bs=8 skip=68 count=1 2>/dev/null)
echo $((totsize-gptbackup))
1

The answer should be 1: the primary GPT header is at sector 1 (one sector is 512 bytes), and the location of the backup header is stored at byte 32 of that header. The backup header sits in the last sector (totsize-1), hence the difference of 1. With bs=8: 512 bytes -> 64 blocks, plus 32 bytes -> 4 blocks, so 544 bytes -> 68 blocks to skip. See GUID Partition Table on Wikipedia.
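
The skip value can be checked with shell arithmetic. This sketch assumes 512-byte logical sectors, as the example does; the variable names are illustrative:

```bash
#!/usr/bin/env bash
# Where must dd skip to in order to reach the GPT "backup LBA" field?
sector_size=512     # assumption: 512-byte logical sectors
header_lba=1        # primary GPT header sits in LBA 1
field_offset=32     # backup-LBA field: 8 bytes at offset 32 of the header
bs=8                # dd block size = field size

byte_offset=$(( header_lba * sector_size + field_offset ))
skip=$(( byte_offset / bs ))
echo "byte offset $byte_offset -> dd bs=$bs skip=$skip count=1"
```

It prints "byte offset 544 -> dd bs=8 skip=68 count=1", matching the dd command above.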

write () { 
    local i=$((${2:-64}/8)) o= v r
    r=$((i-1))
    for ((;i--;)) {
        printf -vv '\%03o' $(( ($1>>8*(0${3+-1}?i:r-i))&255 ))
        o+=$v
    }
    printf "$o"
}

This function defaults to 64 bits, little endian.

Usage: write <integer> [bits:64|32|16|8] [switch to big endian]

  • With two parameters, the second parameter must be one of 8, 16, 32 or 64: the bit length of the generated output.
  • With any dummy third parameter (even an empty string), the function switches to big endian.

read64 foo < <(write -12345);echo $foo
-12345
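
To see the effect of the bits and endianness arguments, the output can be piped into od (a sketch; write is repeated unchanged so the block runs on its own):

```bash
#!/usr/bin/env bash
# write, copied from above: emit an integer as raw bytes
write () {
    local i=$((${2:-64}/8)) o= v r
    r=$((i-1))
    for ((;i--;)) {
        printf -vv '\%03o' $(( ($1>>8*(0${3+-1}?i:r-i))&255 ))
        o+=$v
    }
    printf "$o"
}

write 4660 16    | od -An -t x1    # bytes: 34 12 (little endian, default)
write 4660 16 '' | od -An -t x1    # bytes: 12 34 (any 3rd argument: big endian)
write 1 32       | od -An -t x1    # bytes: 01 00 00 00
```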
      

...

With newer versions of the printf builtin, you can do a lot without forking ($(...)), which makes your script much faster.

First, let's see (using seq and sed) how to parse hd output:

echo ;sed <(seq -f %02g 0 $(( COLUMNS-1 )) ) -ne '
    /0$/{s/^\(.*\)0$/\o0337\o033[A\1\o03380/;H;};
    /[1-9]$/{s/^.*\(.\)/\1/;H};
    ${x;s/\n//g;p}';hd < <(echo Hello good world!)
0         1         2         3         4         5         6         7
012345678901234567890123456789012345678901234567890123456789012345678901234567
00000000  48 65 6c 6c 6f 20 67 6f  6f 64 20 77 6f 72 6c 64  |Hello good world|
00000010  21 0a                                             |!.|
00000012


The hexadecimal part begins at column 10 and spans 48 columns (its last byte starts at column 56): bytes are spaced 3 columns apart, with one extra space at column 34.
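
A sketch checking that slice on the sample line above, without running hd (the line is pasted as a string):

```bash
#!/usr/bin/env bash
# One line of hd output, copied from the example above.
line='00000000  48 65 6c 6c 6f 20 67 6f  6f 64 20 77 6f 72 6c 64  |Hello good world|'
hex=${line:10:48}            # columns 10..57: the 16 hex pairs
read -ra bytes <<< "$hex"    # default IFS splits on the spaces
echo "${#bytes[@]} bytes, first ${bytes[0]}, last ${bytes[15]}"
```

It prints "16 bytes, first 48, last 64".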

So parsing it can be done like this, which turns the dump back into the original binary:

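while read -r line ;do
    for x in ${line:10:48};do
        printf -v x \\%o 0x$x    # hex pair -> octal escape, e.g. \110
        printf "$x"              # print the raw byte
    done
done < <( ls -l --color | hd -v )    # -v: don't collapse repeated lines into *
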

Old original post

Edit 2: for hexadecimal output, you could use hd:

echo Hello world | hd
00000000  48 65 6c 6c 6f 20 77 6f  72 6c 64 0a              |Hello world.|


Or od:

echo Hello world | od -t x1 -t c
0000000  48  65  6c  6c  6f  20  77  6f  72  6c  64  0a
          H   e   l   l   o       w   o   r   l   d  \n


In short:

while IFS= read -r -n1 car;do [ "$car" ] && echo -n "$car" || echo ; done


Try:

while IFS= read -rn1 c;do [ "$c" ]&&echo -n "$c"||echo;done < <(ls -l --color)


Explanation:

while IFS= read -rn1 car  # empty IFS: spaces and tabs are kept, one char per read
    do [ "$car" ] &&      # Was a character read?
        echo -n "$car" || # then echo it
        echo              # else it was an end of line (read gives it as empty): print one
done


Edit: the question was edited, hex values are needed!?

od -An -v -t x1 | while read line;do for char in $line;do echo $char;done ;done   # -v: keep repeated lines


Demo:

od -An -v -t x1 < <(ls -l --color ) |     # Translate binary to 1-byte hex (-v: keep repeated lines)
    while read line;do                    # Read a line of HEX pairs
        for char in $line;do              # For each pair
            printf "\x$char"              # Translate HEX back to binary and print it
        done
    done
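
A round-trip sketch of that demo, writing to a temporary file instead of the terminal (the test data is made up). It passes -v to od because, by default, od replaces identical consecutive lines with a single *, and the bytes of those lines would be lost:

```bash
#!/usr/bin/env bash
src=$(mktemp) dst=$(mktemp)
trap 'rm -f "$src" "$dst"' EXIT
# 'A', 48 NUL bytes, 'B': gives two identical all-zero 16-byte lines.
{ printf 'A'; head -c 48 /dev/zero; printf 'B'; } > "$src"

od -An -v -t x1 < "$src" |          # -v: keep duplicate lines
    while read -r line; do
        for char in $line; do
            printf "\x$char"        # hex pair back to one raw byte
        done
    done > "$dst"

if [[ $(od -An -v -t x1 < "$src") == "$(od -An -v -t x1 < "$dst")" ]]; then
    echo "round trip OK"
fi
```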
      

Demo 2: showing both hex and binary

od -An -v -t x1 < <(ls -l --color ) |     # Translate binary to 1-byte hex
    while read line;do                    # Read a line of HEX pairs
        for char in $line;do              # For each pair
            dec=$((16#$char))             # translate to decimal
            if (( dec < 32 || (dec > 126 && dec < 160) )); then
                bin="."                   # non-printable: show a single dot
            else
                bin="$(printf "\x$char")" # translate HEX to binary
            fi
            str="$str$bin"
            printf '%s ' "$char"          # Print HEX value and a space
            ((i++))                       # count printed values
            if [ $i -gt 15 ] ;then
                i=0
                echo "  -  $str"
                str=""
            fi
        done
    done


New post, September 2016:

This can be useful in some very specific cases (I have used these to manually copy GPT partitions between two disks at a low level, without /usr mounted...).

...but only one byte at a time (because `char(0)' can't be read directly, the only way to read it correctly is to watch for end of file: if no character was read and end of file was not reached, then the character read is a char(0)).
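
A sketch of that rule, counting NUL bytes; count_nul is a hypothetical name. With read -d '' -n 1, a NUL byte yields an empty string with status 0, and only end of file yields a non-zero status:

```bash
#!/usr/bin/env bash
count_nul() {                         # hypothetical name
    local LC_ALL=C IFS= c nul=0 total=0
    while read -r -d '' -n 1 c; do    # loop ends only at end of file
        total=$(( total + 1 ))
        [[ -z $c ]] && nul=$(( nul + 1 ))   # empty string: it was a NUL
    done
    echo "$total bytes, $nul NUL"
}

printf 'a\0b\0\0' | count_nul    # 5 bytes, 3 NUL
```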

This is more a proof of concept than a really useful tool: here is a pure bash version of hd (hexdump).

It uses recent bashisms and needs bash v4.3 or higher.

#!/bin/bash

printf -v ascii \\%o {32..126}
printf -v ascii "$ascii"

printf -v cntrl %-20sE abtnvfr

values=()
todisplay=
address=0
printf -v fmt8 %8s
fmt8=${fmt8// / %02x}

while LANG=C IFS= read -r -d '' -n 1 char ;do
    if [ "$char" ] ;then
        printf -v char "%q" "$char"
        ((${#char}==1)) && todisplay+=$char || todisplay+=.
        case ${#char} in
            1|2 ) char=${ascii%$char*};values+=($((${#char}+32)));;
              7 ) char=${char#*\'\\};values+=($((8#${char%\'})));;
              5 ) char=${char#*\'\\};char=${cntrl%${char%\'}*};
                  values+=($((${#char}+7)));;
              * ) echo >&2 ERROR: $char;;
        esac
    else
        values+=(0)
    fi

    if [ ${#values[@]} -gt 15 ] ;then
        printf "%08x $fmt8 $fmt8  |%s|\n" $address ${values[@]} "$todisplay"
        ((address+=16))
        values=() todisplay=
    fi
done

if [ "$values" ] ;then
    ((${#values[@]}>8))&&fmt="$fmt8 ${fmt8:0:(${#values[@]}%8)*5}"||
        fmt="${fmt8:0:${#values[@]}*5}"
    printf "%08x $fmt%$((
            50-${#values[@]}*3-(${#values[@]}>8?1:0)
        ))s |%s|\n" $address ${values[@]} ''""'' "$todisplay"
fi
printf "%08x (%d chars read.)\n" $((address+${#values[@]})){,}


You can try or use this, but don't try to compare performance!

time hd < <(seq 1 10000|gzip)|wc
   1415   25480  111711
real    0m0.020s
user    0m0.008s
sys     0m0.000s

time ./hex.sh < <(seq 1 10000|gzip)|wc
   1415   25452  111669
real    0m2.636s
user    0m2.496s
sys     0m0.048s


Same job: 20 ms for hd vs. about 2.6 s for my bash script.

...but if you want to read 4 bytes from a file header, or even a sector address on a hard drive, this can do the job...
