Read in bash on a whitespace-delimited file without empty fields collapsing


Question

I'm trying to read a multi-line tab-separated file in bash. The format is such that empty fields are expected. Unfortunately, the shell is collapsing together field separators which are next to each other, as so:

# IFS=$'\t'
# read one two three <<<$'one\t\tthree'
# printf '<%s> ' "$one" "$two" "$three"; printf '\n'
<one> <three> <>

...as opposed to the desired output of <one> <> <three>.
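
For context, the collapsing happens because tab is an IFS whitespace character, and the shell treats any run of IFS whitespace as a single separator. A non-whitespace delimiter does not collapse. The minimal sketch below contrasts the two; the comma-separated input and the variable names comma_result and tab_result are assumptions made for illustration, not part of the original data.

```shell
#!/bin/bash
# Comma is not IFS whitespace, so two adjacent commas delimit an empty field.
IFS=, read -r one two three <<<'one,,three'
comma_result=$(printf '<%s> ' "$one" "$two" "$three")
printf '%s\n' "$comma_result"

# Tab is IFS whitespace, so a run of tabs counts as a single separator
# and the empty middle field disappears.
IFS=$'\t' read -r one two three <<<$'one\t\tthree'
tab_result=$(printf '<%s> ' "$one" "$two" "$three")
printf '%s\n' "$tab_result"
```

The first line printed keeps the empty field in the middle; the second reproduces the collapsed result shown above.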

Can this be resolved without resorting to a separate language (such as awk)?

Recommended answer

Here is an approach with a few niceties:

  • Input data from any source becomes a pseudo-2D array in the main code (avoiding the common problem where data is only available within one stage of a pipeline).
  • No use of awk, tr, or other external programs.
  • A get/put accessor pair hides the hairier syntax.
  • Works on tab-delimited lines by using parameter matching instead of IFS=.
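
To illustrate the last point in isolation, here is a minimal sketch of splitting a single tab-delimited line with parameter expansion rather than read and IFS. The function name split_tabs and the array name fields are hypothetical, chosen only for this example.

```shell
#!/bin/bash
# Hypothetical helper: split "$1" on tabs into the global array "fields",
# keeping empty fields. Uses only parameter expansion; IFS is not touched.
split_tabs () {
    local tab=$'\t'
    local rest="$1$tab"                  # trailing delimiter so the last field is handled like the others
    fields=()
    while [ -n "$rest" ] ; do
        fields+=( "${rest%%"$tab"*}" )   # text before the first tab (may be empty)
        rest="${rest#*"$tab"}"           # drop that field and its tab
    done
}

split_tabs $'one\t\tthree'
printf '<%s> ' "${fields[@]}" ; printf '\n'
```

Because each tab is consumed one at a time, two adjacent tabs yield an empty string between them instead of being merged.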

The code follows. file_data and file_input exist only to generate input, as though it came from an external command called from the script. data and cols could be passed as parameters to the get and put calls, but this script doesn't go that far.
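
One way that parameterization might look is sketched below. This is not part of the original answer: the names get_in, put_in, and grid are hypothetical, and the sketch assumes bash 4.3 or later, since it relies on namerefs (local -n).

```shell
#!/bin/bash
# Hypothetical variants of get/put that take the array name and the column
# count as arguments. Assumes bash 4.3+ for namerefs (local -n).
# Caveat: the caller's array must not itself be named _arr, cols, r, or c.
get_in () { local -n _arr=$1 ; local cols=$2 r=$3 c=$4 ; printf '%s\n' "${_arr[cols * r + c]}" ; }
put_in () { local -n _arr=$1 ; local cols=$2 r=$3 c=$4 ; _arr[cols * r + c]="$5" ; }

grid=()                  # a table with 3 columns, stored flat
put_in grid 3 0 0 'a'    # row 0, column 0 -> index 0
put_in grid 3 1 2 'f'    # row 1, column 2 -> index 5
printf '<%s> <%s> <%s>\n' "$(get_in grid 3 0 0)" "$(get_in grid 3 0 1)" "$(get_in grid 3 1 2)"
```

The index arithmetic is the same stride calculation the script uses (cols * r + c); only the array and column count are supplied by the caller.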

#!/bin/bash

file_data=( $'\t\t'       $'\t\tbC'     $'\tcB\t'     $'\tdB\tdC'   \
            $'eA\t\t'     $'fA\t\tfC'   $'gA\tgB\t'   $'hA\thB\thC' )
file_input () { printf '%s\n' "${file_data[@]}" ; }  # simulated input file
delim=$'\t'

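# IFS=$'\n' splits on newlines only; as a side effect, blank lines are skipped (acceptable here).
# set -f (noglob) keeps characters such as * in the data from being expanded as filenames.
OIFS="$IFS" ; IFS=$'\n' ; oset="$-" ; set -f
lines=($(file_input))                    # read the "file"
# restore IFS, and turn noglob back off only if it was off before:
IFS="$OIFS" ; case "$oset" in *f*) ;; *) set +f ;; esac ; unset oset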

# the read-in data has (rows * cols) fields, with cols as the stride:
data=()
cols=0
get () { local r=$1 c=$2 i ; (( i = cols * r + c )) ; echo "${data[$i]}" ; }
put () { local r=$1 c=$2 i ; (( i = cols * r + c )) ; data[$i]="$3" ; }

# convert the lines from input into the pseudo-2D data array:
i=0 ; row=0 ; col=0
for line in "${lines[@]}" ; do
    line="$line$delim"
    while [ -n "$line" ] ; do
        case "$line" in
            *${delim}*) data[$i]="${line%%${delim}*}" ; line="${line#*${delim}}" ;;
            *)          data[$i]="${line}"            ; line=                     ;;
        esac
        (( ++i ))
    done
    [ 0 = "$cols" ] && (( cols = i )) 
done
rows=${#lines[@]}

# output the data array as a matrix, using the get accessor
for    (( row=0 ; row < rows ; ++row )) ; do
   printf 'row %2d: ' $row
   for (( col=0 ; col < cols ; ++col )) ; do
       printf '%5s ' "$(get $row $col)"
   done
   printf '\n'
done

Output:

$ ./tabtest 
row  0:                   
row  1:                bC 
row  2:          cB       
row  3:          dB    dC 
row  4:    eA             
row  5:    fA          fC 
row  6:    gA    gB       
row  7:    hA    hB    hC 
