read in bash on whitespace-delimited file without empty fields collapsing
Question
I'm trying to read a multi-line tab-separated file in bash. The format is such that empty fields are expected. Unfortunately, the shell is collapsing together field separators which are next to each other, as so:
# IFS=$'\t'
# read one two three <<<$'one\t\tthree'
# printf '<%s> ' "$one" "$two" "$three"; printf '\n'
<one> <three> <>
...as opposed to the desired output of <one> <> <three>.
Can this be resolved without resorting to a separate language (such as awk)?
Recommended answer
Here's an approach with a few niceties:
- input data from wherever becomes a pseudo-2D array in the main code (avoiding a common problem where the data is only available within one stage of a pipeline).
- no use of awk, tr, or other external progs
- a get/put accessor pair to hide the hairier syntax
- works on tab-delimited lines by using param matching instead of IFS=
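The last point can be sketched in isolation. This is a minimal illustration, not the answer's own code: split_tabs and the global array fields are hypothetical names introduced here. It peels one field at a time off the front of the line with parameter expansion, so two adjacent tabs yield an empty field instead of being merged the way IFS whitespace is.

```shell
#!/bin/bash
# split_tabs is a hypothetical helper name (not from the original answer).
# It splits $1 on tabs into the global array "fields", keeping empty fields.
split_tabs () {
    local line=$1 delim=$'\t'
    fields=()
    line+=$delim                            # sentinel tab so the last field is terminated
    while [ -n "$line" ] ; do
        fields+=( "${line%%"$delim"*}" )    # text before the first tab (may be empty)
        line=${line#*"$delim"}              # drop that field and the tab after it
    done
}

split_tabs $'one\t\tthree'
printf '<%s> ' "${fields[@]}" ; printf '\n'
```

With the input from the question this should print <one> <> <three>, which is the output the asker wanted.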
The full script is below. file_data and file_input are just for generating input, as though from an external command called from the script. data and cols could be parameterized for the get and put calls, etc., but this script doesn't go that far.
#!/bin/bash
file_data=( $'\t\t' $'\t\tbC' $'\tcB\t' $'\tdB\tdC' \
$'eA\t\t' $'fA\t\tfC' $'gA\tgB\t' $'hA\thB\thC' )
file_input () { printf '%s\n' "${file_data[@]}" ; } # simulated input file
delim=$'\t'
# the IFS=$'\n' has a side-effect of skipping blank lines; acceptable:
OIFS="$IFS" ; IFS=$'\n' ; oset="$-" ; set -f
lines=($(file_input)) # read the "file"
case "$oset" in *f*) ;; *) set +f ;; esac ; IFS="$OIFS" ; unset oset # cleanup the environment mods (set -"$oset" would leave noglob on).
# the read-in data has (rows * cols) fields, with cols as the stride:
data=()
cols=0
get () { local r=$1 c=$2 i ; (( i = cols * r + c )) ; echo "${data[$i]}" ; }
put () { local r=$1 c=$2 i ; (( i = cols * r + c )) ; data[$i]="$3" ; }
# convert the lines from input into the pseudo-2D data array:
i=0 ; row=0 ; col=0
for line in "${lines[@]}" ; do
    line="$line$delim"
    while [ -n "$line" ] ; do
        case "$line" in
            *${delim}*) data[$i]="${line%%${delim}*}" ; line="${line#*${delim}}" ;;
            *) data[$i]="${line}" ; line= ;;
        esac
        (( ++i ))
    done
    [ 0 = "$cols" ] && (( cols = i ))
done
rows=${#lines[@]}
# output the data array as a matrix, using the get accessor
for (( row=0 ; row < rows ; ++row )) ; do
    printf 'row %2d: ' $row
    for (( col=0 ; col < cols ; ++col )) ; do
        printf '%5s ' "$(get $row $col)"
    done
    printf '\n'
done
Output:
$ ./tabtest
row  0:
row  1:                bC
row  2:          cB
row  3:          dB    dC
row  4:    eA
row  5:    fA          fC
row  6:    gA    gB
row  7:    hA    hB    hC
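The answer notes that data and cols could be parameterized for the get and put calls. One way that might look is sketched here, assuming bash 4.3 or newer for namerefs (local -n). The names get_in, put_in, grid and grid_cols are hypothetical and do not appear in the script above.

```shell
#!/bin/bash
# Sketch only: assumes bash >= 4.3 for "local -n" (namerefs).
# get_in / put_in are hypothetical names; the original get/put use the globals data and cols.
# Arguments: array-name, column count (the stride), row, column [, value].
get_in () {
    local -n _arr=$1                 # _arr now refers to the caller's array
    local i=$(( $2 * $3 + $4 ))      # same row-major index as the original get
    printf '%s\n' "${_arr[i]}"
}
put_in () {
    local -n _arr=$1
    local i=$(( $2 * $3 + $4 ))
    _arr[i]=$5
}

grid=() ; grid_cols=3
put_in grid "$grid_cols" 0 0 'aA'
put_in grid "$grid_cols" 1 2 'bC'
get_in grid "$grid_cols" 1 2         # reads back the cell at row 1, column 2
```

The caller must not name its own array _arr, since the nameref would then refer to itself. On older bash versions a similar effect is possible with indirect expansion, at the cost of hairier syntax.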