read in bash on tab-delimited file without empty fields collapsing
Question
I'm trying to read a multi-line tab-separated file in bash. The format is such that empty fields are expected. Unfortunately, the shell is collapsing together field separators which are next to each other, as so:
# IFS=$'\t'
# read one two three <<<$'one\t\tthree'
# printf '<%s> ' "$one" "$two" "$three"; printf '\n'
<one> <three> <>
...rather than the desired output of <one> <> <three>.
Can this be resolved without resorting to a separate language (such as awk)?
Recommended answer
Here's an approach with some niceties:
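- input data from wherever becomes a pseudo-2D array in the main code (avoiding a common problem where the data is only available within one stage of a pipeline)
- no use of awk, tr, or other external programs
- a get/put accessor pair to hide the hairier syntax
- works on tab-delimited lines by using parameter matching instead of IFS=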
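The last point can be sketched in isolation. The following is a minimal illustration rather than part of the original answer: split_tabs and the global array fields are hypothetical names, and the helper applies the same `%%` / `#` parameter expansions to a single line so that adjacent tabs yield empty fields instead of collapsing.

```shell
#!/bin/bash
# split_tabs is a hypothetical helper: it splits $1 on tabs into the
# global array "fields", keeping empty fields.
split_tabs () {
    local line=$1 delim=$'\t'
    fields=()
    line="$line$delim"                     # terminate the last field
    while [ -n "$line" ] ; do
        fields+=("${line%%${delim}*}")     # text before the first tab
        line="${line#*${delim}}"           # drop that field and its tab
    done
}

split_tabs $'one\t\tthree'
printf '<%s> ' "${fields[@]}" ; printf '\n'
```

Run on the line from the question, this prints `<one> <> <three>`, with the empty middle field preserved.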
The code. file_data and file_input are just for generating input as though from an external command called from the script. data and cols could be parameterized for the get and put calls, etc., but this script doesn't go that far.
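For reference, one way such parameterization might look is sketched here. It assumes bash 4.3 or later for namerefs (`local -n`). The names get_from, put_into and grid are hypothetical and do not appear in the answer's script. Each accessor takes the array name and the column count as explicit arguments.

```shell
#!/bin/bash
# Assumes bash 4.3+ for "local -n". get_from and put_into are hypothetical
# names; usage: get_from ARRAY COLS ROW COL / put_into ARRAY COLS ROW COL VALUE
get_from () {
    local -n _arr=$1
    local n=$2 r=$3 c=$4 i
    (( i = n * r + c ))
    echo "${_arr[$i]}"
}
put_into () {
    local -n _arr=$1
    local n=$2 r=$3 c=$4 i
    (( i = n * r + c ))
    _arr[$i]="$5"
}

grid=()
put_into grid 3 0 0 'a'
put_into grid 3 1 1 'x'
put_into grid 3 1 2 ''
printf '<%s>\n' "$(get_from grid 3 1 1)"
```

One caveat of this sketch: the caller's array must not itself be named _arr, since the nameref would then refer to itself.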
#!/bin/bash
file_data=( $'\t\t'   $'\t\tbC'   $'\tcB\t'   $'\tdB\tdC' \
            $'eA\t\t' $'fA\t\tfC' $'gA\tgB\t' $'hA\thB\thC' )
file_input () { printf '%s\n' "${file_data[@]}" ; } # simulated input file
delim=$'\t'

# the IFS=$'\n' has a side-effect of skipping blank lines; acceptable:
OIFS="$IFS" ; IFS=$'\n' ; oset="$-" ; set -f
lines=($(file_input)) # read the "file"
# cleanup the environment mods; turn noglob back off unless it was already on:
[[ $oset == *f* ]] || set +f
IFS="$OIFS" ; unset oset

# the read-in data has (rows * cols) fields, with cols as the stride:
data=()
cols=0
get () { local r=$1 c=$2 i ; (( i = cols * r + c )) ; echo "${data[$i]}" ; }
put () { local r=$1 c=$2 i ; (( i = cols * r + c )) ; data[$i]="$3" ; }

# convert the lines from input into the pseudo-2D data array:
i=0 ; row=0 ; col=0
for line in "${lines[@]}" ; do
    line="$line$delim" # terminate the last field so every field ends in a tab
    while [ -n "$line" ] ; do
        case "$line" in
            *${delim}*) data[$i]="${line%%${delim}*}" ; line="${line#*${delim}}" ;;
            *)          data[$i]="${line}" ; line= ;;
        esac
        (( ++i ))
    done
    # the first line fixes the column count (the stride):
    [ 0 = "$cols" ] && (( cols = i ))
done
rows=${#lines[@]}

# output the data array as a matrix, using the get accessor
for (( row=0 ; row < rows ; ++row )) ; do
    printf 'row %2d: ' "$row"
    for (( col=0 ; col < cols ; ++col )) ; do
        printf '%5s ' "$(get "$row" "$col")"
    done
    printf '\n'
done
Output:
$ ./tabtest
row  0:
row  1:                bC
row  2:          cB
row  3:          dB    dC
row  4:    eA
row  5:    fA          fC
row  6:    gA    gB
row  7:    hA    hB    hC
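As a possible variation on the line-reading step, the `mapfile` builtin (bash 4 and later) can fill the array without changing IFS or noglob. This is only a sketch, not the answer's method. The file_input function here is a hypothetical stand-in with its own sample data. Unlike the IFS=$'\n' approach, `mapfile` keeps blank lines, so they are filtered out explicitly to match the script's behavior.

```shell
#!/bin/bash
# Hypothetical stand-in for the input source, including one blank line.
file_input () { printf '%s\n' $'a\t\tc' '' $'\te\t' ; }

# mapfile reads one array element per input line, with no word splitting
# or pathname expansion, so IFS and "set -f" do not need to be touched.
mapfile -t raw < <(file_input)

# mapfile preserves blank lines; drop them, as the IFS-based read does.
lines=()
for line in "${raw[@]}" ; do
    if [ -n "$line" ] ; then
        lines+=("$line")
    fi
done

printf '%d lines kept\n' "${#lines[@]}"
```

Lines consisting only of tabs are not blank and are kept, which matters here because they represent rows of empty fields.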