read in bash on tab-delimited file without empty fields collapsing

Question

I'm trying to read a multi-line tab-separated file in bash. The format is such that empty fields are expected. Unfortunately, the shell collapses adjacent field separators into one, like so:

# IFS=$'\t'
# read one two three <<<$'one\t\tthree'
# printf '<%s> ' "$one" "$two" "$three"; printf '\n'
<one> <three> <>

...rather than the desired output of <one> <> <three>.

Can this be resolved without resorting to a separate language (such as awk)?

Recommended answer

Here's an approach with some niceties:


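  • Input data from wherever becomes a pseudo-2D array in the main code (avoiding a common problem where the data is only available within one stage of a pipeline).

  • No use of awk, tr, or other external programs.

  • A get/put accessor pair to hide the hairier syntax.

  • Works on tab-delimited lines by using parameter matching instead of IFS=.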
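
The last point matters because tab is one of the IFS whitespace characters, and read treats a run of IFS whitespace as a single separator, which is why the empty middle field disappears in the question. A minimal sketch of the parameter-matching idea on a single line follows; the split_tabs function and the fields array are hypothetical names used only for illustration and are not part of the answer's script.

```bash
#!/bin/bash
# Split one tab-delimited line into the global array "fields",
# keeping empty fields. split_tabs and fields are illustrative names.
split_tabs () {
    local line=$1 tab=$'\t'
    fields=()
    line+=$tab                          # terminate the last field with a delimiter
    while [ -n "$line" ] ; do
        fields+=( "${line%%"$tab"*}" )  # text before the first tab (may be empty)
        line=${line#*"$tab"}            # drop that field and its tab
    done
}

split_tabs $'one\t\tthree'
printf '<%s> ' "${fields[@]}" ; printf '\n'
```

With the question's input this prints <one> <> <three>, since each tab ends exactly one field regardless of what precedes it.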

The code. file_data and file_input are just for generating input as though from an external command called from the script. data and cols could be parameterized for the get and put calls, etc., but this script doesn't go that far.

#!/bin/bash

file_data=( $'\t\t'       $'\t\tbC'     $'\tcB\t'     $'\tdB\tdC'   \
            $'eA\t\t'     $'fA\t\tfC'   $'gA\tgB\t'   $'hA\thB\thC' )
file_input () { printf '%s\n' "${file_data[@]}" ; }  # simulated input file
delim=$'\t'

# the IFS=$'\n' has a side-effect of skipping blank lines; acceptable:
OIFS="$IFS" ; IFS=$'\n' ; oset="$-" ; set -f
lines=($(file_input))                    # read the "file"
# restore the environment; "set -$oset" alone cannot clear the noglob flag set by "set -f":
case "$oset" in *f*) ;; *) set +f ;; esac ; IFS="$OIFS" ; unset oset

# the read-in data has (rows * cols) fields, with cols as the stride:
data=()
cols=0
get () { local r=$1 c=$2 i ; (( i = cols * r + c )) ; printf '%s\n' "${data[$i]}" ; }
put () { local r=$1 c=$2 i ; (( i = cols * r + c )) ; data[$i]="$3" ; }

# convert the lines from input into the pseudo-2D data array:
i=0 ; row=0 ; col=0
for line in "${lines[@]}" ; do
    line="$line$delim"
    while [ -n "$line" ] ; do
        case "$line" in
            *${delim}*) data[$i]="${line%%${delim}*}" ; line="${line#*${delim}}" ;;
            *)          data[$i]="${line}"            ; line=                     ;;
        esac
        (( ++i ))
    done
    [ 0 = "$cols" ] && (( cols = i )) 
done
rows=${#lines[@]}

# output the data array as a matrix, using the get accessor
for    (( row=0 ; row < rows ; ++row )) ; do
   printf 'row %2d: ' $row
   for (( col=0 ; col < cols ; ++col )) ; do
       printf '%5s ' "$(get $row $col)"
   done
   printf '\n'
done

Output:

$ ./tabtest 
row  0:                   
row  1:                bC 
row  2:          cB       
row  3:          dB    dC 
row  4:    eA             
row  5:    fA          fC 
row  6:    gA    gB       
row  7:    hA    hB    hC 
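
The script defines put but never calls it. As a rough illustration of how the accessor pair hides the stride arithmetic, the sketch below restates both functions so that it runs on its own and applies them to a small hypothetical 2x3 grid. Using printf rather than echo inside get is an assumption made here so that a value such as -n prints literally.

```bash
#!/bin/bash
# Pseudo-2D array: element (r, c) lives at index cols * r + c.
# The 2x3 grid below is made-up sample data.
data=( a b c
       d e f )
cols=3
get () { local r=$1 c=$2 i ; (( i = cols * r + c )) ; printf '%s\n' "${data[$i]}" ; }
put () { local r=$1 c=$2 i ; (( i = cols * r + c )) ; data[$i]="$3" ; }

put 1 1 X                 # overwrite row 1, column 1 (flat index 4)
middle=$(get 1 1)         # read the same cell back
corner=$(get 1 2)         # the neighbouring cell is untouched
printf '%s %s\n' "$middle" "$corner"
```

Because put modifies the global data array, it has to be called in the current shell; inside $(...) or a pipeline stage the change would be lost, which is the same pitfall the first bullet of the answer refers to.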
