Split string into array in bash
Question
I am looking for a way to split a string in bash on a delimiter string, and place the parts in an array.
Simple case:
#!/bin/bash
b="aaaaa/bbbbb/ddd/ffffff"
echo "simple string: $b"
IFS='/' b_split=($b)
echo ;
echo "split"
for i in ${b_split[@]}
do
echo "------ new part ------"
echo "$i"
done
which gives the output:
simple string: aaaaa/bbbbb/ddd/ffffff
split
------ new part ------
aaaaa
------ new part ------
bbbbb
------ new part ------
ddd
------ new part ------
ffffff
More complex case:
#!/bin/bash
c=$(echo "AA=A"; echo "B=BB"; echo "======="; echo "C==CC"; echo "DD=D"; echo "======="; echo "EEE"; echo "FF";)
echo "more complex string"
echo "$c";
echo ;
echo "split";
IFS='=======' c_split=($c) ;# <---- LINE TO BE CHANGED
for i in ${c_split[@]}
do
echo "------ new part ------"
echo "$i"
done
which gives the output:
more complex string
AA=A
B=BB
=======
C==CC
DD=D
=======
EEE
FF
split
------ new part ------
AA
------ new part ------
A
B
------ new part ------
BB
------ new part ------
------ new part ------
------ new part ------
------ new part ------
------ new part ------
------ new part ------
------ new part ------
C
------ new part ------
------ new part ------
CC
DD
------ new part ------
D
------ new part ------
------ new part ------
------ new part ------
------ new part ------
------ new part ------
------ new part ------
------ new part ------
EEE
FF
I would like the second output to be:
------ new part ------
AA=A
B=BB
------ new part ------
C==CC
DD=D
------ new part ------
EEE
FF
That is, to split the string on a sequence of characters instead of a single one. How can I do this?
I am looking for an answer that would only modify this line in the second script:
IFS='=======' c_split=($c) ;# <---- LINE TO BE CHANGED
Answer
IFS disambiguation
IFS means Input Field Separators: a list of characters that can be used as separators.
By default, this is set to $' \t\n' (space, tab and newline), meaning that any number (greater than zero) of spaces, tabs and/or newlines counts as one separator.
So, in the string:
" blah foo=bar
baz "
leading and trailing separators are ignored, and the string contains only 3 parts: blah, foo=bar and baz.
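A minimal sketch to observe this with the default IFS (the variable names s and parts are only illustrative). set -f is added so that a * or ? in the text would not be expanded as a filename during the unquoted expansion:

```shell
#!/bin/bash
IFS=$' \t\n'      # make sure IFS has its default value
s=" blah foo=bar
baz "
set -f            # disable pathname expansion while splitting unquoted text
parts=($s)        # leading/trailing blanks ignored, runs of blanks = one separator
set +f
printf '<%s>\n' "${parts[@]}"
```

This prints <blah>, <foo=bar> and <baz>, one per line.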
Splitting a string using IFS is possible if you know a valid field separator that does not occur in your string.
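OIFS="$IFS"                          # save the current IFS
IFS='§'                              # a character known not to occur in $c
c=$'AA=A\nB=BB\n=======\nC==CC\nDD=D\n=======\nEEE\nFF'
c_split=(${c//=======/§})            # replace each ======= by §, then split on §
IFS="$OIFS"                          # restore the previous IFS
printf -- "------ new part ------\n%s\n" "${c_split[@]}"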
------ new part ------
AA=A
B=BB
------ new part ------
C==CC
DD=D
------ new part ------
EEE
FF
But this only works as long as the string does not contain §.
You could use another character, like IFS=$'\026'; c_split=(${c//=======/$'\026'}), but this may still lead to further bugs.
You could browse the character map to find one that does not occur in your string:
myIfs=""
for i in {1..255}; do
    printf -v char "\\$(printf '%03o' "$i")"       # char = the byte whose value is i
    # "$char" is quoted so that *, ? or [ are matched literally
    [ "$c" == "${c#*"$char"}" ] && myIfs="$char" && break
done
if [ -z "$myIfs" ]; then
    echo "no split char found, could not do the job, sorry." >&2
    exit 1
fi
But I find this solution a little overkill.
Under bash, we could use this bashism:
b="aaaaa/bbbbb/ddd/ffffff"
b_split=(${b//// })
In fact, the syntax ${varname//pattern/replacement} performs a substitution. In ${b//// }, the first // means "replace all occurrences", the next / is the pattern, and the last / separates it from the replacement, a space. So every / is replaced by a space before the result is assigned to the array b_split.
Of course, this still uses IFS and splits the string on spaces (so parts that contain spaces are split further).
This is not the best way, but it can work in specific cases.
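For a single-character delimiter, a commonly used alternative (a different technique from the substitution trick) is the read builtin with -a, with IFS changed only for that one command. The shell's IFS is left untouched and no pathname expansion takes place. A minimal sketch, assuming the string fits on a single line, since read stops at the first newline:

```shell
#!/bin/bash
b="aaaaa/bbbbb/ddd/ffffff"
# IFS=/ applies only to this read command.
# -r keeps backslashes literal; -a stores the fields in the array b_split.
IFS=/ read -r -a b_split <<< "$b"
printf '<%s>\n' "${b_split[@]}"
```

Because only / is in IFS here, spaces inside the fields are kept: with b='12 34 / 1 3 5 7 / ab' the fields would be "12 34 ", " 1 3 5 7 " and " ab".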
You could even drop unwanted spaces before splitting:
b='12 34 / 1 3 5 7 / ab'
b1=${b// }
b_split=(${b1//// })
printf "<%s>, " "${b_split[@]}" ;echo
<12>, <34>, <1>, <3>, <5>, <7>, <ab>,
Or swap them for a placeholder character, and swap them back afterwards:
b1=${b// /§}
b_split=(${b1//// })
printf "<%s>, " "${b_split[@]//§/ }" ;echo
<12 34 >, < 1 3 5 7 >, < ab>,
Splitting on strings:
So, for a multi-character separator, you should not use IFS, but bash does have nice features:
#!/bin/bash
c=$'AA=A\nB=BB\n=======\nC==CC\nDD=D\n=======\nEEE\nFF'
echo "more complex string"
echo "$c";
echo ;
echo "split";
mySep='======='
while [ "$c" != "${c#*$mySep}" ];do
echo "------ new part ------"
echo "${c%%$mySep*}"
c="${c#*$mySep}"
done
echo "------ last part ------"
echo "$c"
Let's see:
more complex string
AA=A
B=BB
=======
C==CC
DD=D
=======
EEE
FF
split
------ new part ------
AA=A
B=BB
------ new part ------
C==CC
DD=D
------ last part ------
EEE
FF
Note: leading and trailing newlines are not removed. If this is needed, you could use
mySep=$'\n=======\n'
instead of simply =======.
Or you could rewrite the split loop to strip them explicitly:
mySep=$'======='
while [ "$c" != "${c#*$mySep}" ];do
echo "------ new part ------"
part="${c%%$mySep*}"
part="${part##$'\n'}"
echo "${part%%$'\n'}"
c="${c#*$mySep}"
done
echo "------ last part ------"
c=${c##$'\n'}
echo "${c%%$'\n'}"
In any case, this matches what the SO question asked for (and its sample):
------ new part ------
AA=A
B=BB
------ new part ------
C==CC
DD=D
------ last part ------
EEE
FF
Finally, to create an array:
#!/bin/bash
c=$'AA=A\nB=BB\n=======\nC==CC\nDD=D\n=======\nEEE\nFF'
echo "more complex string"
echo "$c";
echo ;
echo "split";
mySep=$'======='
export -a c_split
while [ "$c" != "${c#*$mySep}" ];do
part="${c%%$mySep*}"
part="${part##$'\n'}"
c_split+=("${part%%$'\n'}")
c="${c#*$mySep}"
done
c=${c##$'\n'}
c_split+=("${c%%$'\n'}")
for i in "${c_split[@]}"
do
echo "------ new part ------"
echo "$i"
done
This does it nicely:
more complex string
AA=A
B=BB
=======
C==CC
DD=D
=======
EEE
FF
split
------ new part ------
AA=A
B=BB
------ new part ------
C==CC
DD=D
------ new part ------
EEE
FF
Some explanations:
- export -a var declares var as an array. Bash cannot actually pass arrays to child processes, so the export part has no real effect here; declare -a var (or var=()) would be enough.
- ${variablename%string*} and ${variablename%%string*} return the left part of variablename, up to but without string. One % removes the shortest matching suffix, so it cuts at the last occurrence of string; %% removes the longest matching suffix, so it cuts at the first occurrence. The full variablename is returned if string is not found.
- ${variablename#*string} and ${variablename##*string} work the other way round: they return the right part of variablename, after but without string. One # cuts at the first occurrence and ## at the last.
Note: in these patterns, the character * is a wildcard that matches any number of any characters.
The command echo "${c%%$'\n'}" echoes variable c without its trailing newline. Since the pattern $'\n' contains no wildcard, % and %% behave the same here and remove at most one newline.
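If every leading and trailing newline should be removed, not just one, an extended glob can express "one or more newlines". A minimal sketch, assuming it is acceptable to enable bash's extglob option in the script:

```shell
#!/bin/bash
shopt -s extglob                 # enables the +(pattern) operator used below
c=$'\n\nEEE\nFF\n\n'
c=${c##+($'\n')}                 # remove the longest run of leading newlines
c=${c%%+($'\n')}                 # remove the longest run of trailing newlines
printf '[%s]\n' "$c"
```

This prints [EEE on one line and FF] on the next.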
For example, if the variable contains Hello WorldZorGluBHello youZorGluBI'm happy:
$ variable="Hello WorldZorGluBHello youZorGluBI'm happy"
$ echo ${variable#*ZorGluB}
Hello youZorGluBI'm happy
$ echo ${variable##*ZorGluB}
I'm happy
$ echo ${variable%ZorGluB*}
Hello WorldZorGluBHello you
$ echo ${variable%%ZorGluB*}
Hello World
$ echo ${variable%%ZorGluB}
Hello WorldZorGluBHello youZorGluBI'm happy
$ echo ${variable%happy}
Hello WorldZorGluBHello youZorGluBI'm
$ echo ${variable##* }
happy
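This matters when the separator comes from a variable: an unquoted $sep inside the pattern is itself treated as a pattern, so a separator containing *, ? or [ is not matched literally, while quoting it inside the expansion forces a literal match. A small sketch (the variable names are only illustrative):

```shell
#!/bin/bash
s='a*b*c'
sep='*'
unquoted=${s#*$sep}   # $sep acts as a pattern: "**" matches the empty prefix, nothing is removed
quoted=${s#*"$sep"}   # "$sep" is literal: removes the shortest prefix ending in "*"
printf '%s\n' "$unquoted" "$quoted"
```

This prints a*b*c, then b*c.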
All this is explained in the manpage:
$ man -Len -Pless\ +/##word bash
$ man -Len -Pless\ +/%%word bash
$ man -Len -Pless\ +/^\\\ *export\\\ .*word bash
Step by step, the splitting loop:
The separator:
mySep=$'======='
Declaring c_split as an array (despite export, bash cannot share arrays with child processes):
export -a c_split
While variable c contains at least one occurrence of mySep:
while [ "$c" != "${c#*$mySep}" ];do
Truncate c from the first mySep to the end of the string, and assign the result to part:
part="${c%%$mySep*}"
Remove a leading newline:
part="${part##$'\n'}"
Remove a trailing newline, and add the result as a new array element to c_split:
c_split+=("${part%%$'\n'}")
Reassign c with the rest of the string, after removing everything up to and including the first mySep:
c="${c#*$mySep}"
Done ;-)
done
Remove a leading newline:
c=${c##$'\n'}
Remove a trailing newline, and add the result as the last array element of c_split:
c_split+=("${c%%$'\n'}")
As a function:
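ssplit() {
    # split "$1" on the delimiter string "$3" (default: one space)
    # into the array named "$2" (default: ssplited_array)
    local string="$1" array=${2:-ssplited_array} delim="${3:- }" pos=0
    # "$delim" is quoted inside the patterns so it is matched literally
    while [ "$string" != "${string#*"$delim"}" ]; do
        printf -v "$array[pos]" "%s" "${string%%"$delim"*}"   # part before the first delimiter
        pos=$((pos + 1))
        string="${string#*"$delim"}"                           # drop that part and the delimiter
    done
    printf -v "$array[pos]" "%s" "$string"                     # what remains is the last part
}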
Usage:
ssplit "<quoted string>" [array name] [delimiter string]
where the array name defaults to ssplited_array and the delimiter to a single space.
You could use:
c=$'AA=A\nB=BB\n=======\nC==CC\nDD=D\n=======\nEEE\nFF'
ssplit "$c" c_split $'\n=======\n'
printf -- "--- part ----\n%s\n" "${c_split[@]}"
--- part ----
AA=A
B=BB
--- part ----
C==CC
DD=D
--- part ----
EEE
FF