Shell read *sometimes* strips trailing delimiter
Question
To parse colon-delimited fields I can use read with a custom IFS:
$ echo 'foo.c:41:switch (color) {' | { IFS=: read file line text && echo "$file | $line | $text"; }
foo.c | 41 | switch (color) {
If the last field contains colons, no problem, the colons are retained.
$ echo 'foo.c:42:case RED: //alert' | { IFS=: read file line text && echo "$file | $line | $text"; }
foo.c | 42 | case RED: //alert
A trailing delimiter is also retained...
$ echo 'foo.c:42:case RED: //alert:' | { IFS=: read file line text && echo "$file | $line | $text"; }
foo.c | 42 | case RED: //alert:
...Unless it's the only extra delimiter. Then it's stripped. Wait, what?
$ echo 'foo.c:42:case RED:' | { IFS=: read file line text && echo "$file | $line | $text"; }
foo.c | 42 | case RED
Bash, ksh93, and dash all do this, so I'm guessing it is POSIX standard behavior.
- Why does this happen?
- What's the best alternative?
I want to parse the strings above into three variables and I don't want to mangle any text in the third field. I had thought read was the way to go but now I'm reconsidering.
Answer
Yes, that's standard behaviour (see the read specification and Field Splitting). A few shells (at least ash-based ones including dash, pdksh-based ones, zsh and yash) used not to do it, but except for zsh (when not in POSIX mode) and busybox sh, most of them have been updated for POSIX compliance.
The same goes for:
$ var='a:b:c:' IFS=:
$ set -- $var; echo "$#"
3
(See how the POSIX specification for read actually defers to the Field Splitting mechanism, where a:b:c: is split into 3 fields, and so with IFS=: read -r a b c there are as many fields as variables.)
The rationale is that in ksh (on which the POSIX spec is based), $IFS (initially, in the Bourne shell, the internal field separator) became a field delimiter, I think so that any list of elements (not containing the delimiter) could be represented.
When $IFS is a separator, one can't represent a list of one empty element ("" is split into a list of 0 elements, ":" into a list of two empty elements¹). When it's a delimiter, you can express a list of zero elements with "", one empty element with ":", or two empty elements with "::".
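A minimal sketch to see that delimiter behaviour in action, assuming a POSIX-compliant sh such as bash, dash or ksh93 (zsh outside POSIX mode behaves differently). count_fields is a hypothetical helper name; it just reports how many fields an unquoted expansion produces with IFS=:.

```shell
# count_fields (hypothetical helper): print how many fields the unquoted
# expansion of $1 is split into when IFS is ":".
count_fields() {
  (
    IFS=:
    set -o noglob   # only field splitting should happen, no globbing
    set -- $1       # deliberately unquoted so that field splitting applies
    echo "$#"
  )
}

count_fields ''         # 0: zero elements
count_fields ':'        # 1: one empty element
count_fields '::'       # 2: two empty elements
count_fields 'a:b:c:'   # 3: the trailing ":" only terminates "c"
count_fields 'a:b:c'    # 3: same list, final delimiter omitted
```

Note how 'a:b:c:' and 'a:b:c' give the same list, which is exactly what read sees in the question's last example.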
It's a bit unfortunate, as one of the most common usages of $IFS is to split $PATH, and a $PATH like /bin:/usr/bin: is meant to be split into "/bin", "/usr/bin" and "", not just "/bin" and "/usr/bin".
Now, with POSIX shells (but not all shells are compliant in that regard), for word splitting upon parameter expansion, that can be worked around with:
IFS=:; set -o noglob
for dir in $PATH""; do
  # do something with "${dir:-.}" (an empty element means the current directory)
  printf '%s\n' "${dir:-.}"
done
That trailing "" makes sure that if $PATH ends in a trailing :, an extra empty element is added, and also that an empty $PATH is treated as one empty element, as it should be.
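To see the difference the trailing "" makes, here's a small sketch, assuming a shell that is compliant in that regard (bash is). show_split is a hypothetical helper, not part of any library; it splits its argument twice, without and with the "", and prints each element in brackets so that empty ones are visible.

```shell
# show_split (hypothetical helper): split $1 on ":" without and with the
# trailing "" trick, printing each resulting element in brackets.
show_split() {
  (
    IFS=:
    set -o noglob
    printf 'without:'
    for dir in $1; do printf ' [%s]' "$dir"; done
    echo
    printf 'with:   '
    for dir in $1""; do printf ' [%s]' "$dir"; done
    echo
  )
}

show_split '/bin:/usr/bin:'
# without: [/bin] [/usr/bin]
# with:    [/bin] [/usr/bin] []
show_split ''
# without:
# with:    []
```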
That approach can't be used for read, though.
Short of switching to zsh, there's no easy workaround other than inserting an extra : and removing it afterwards, like:
echo a:b:c: | sed 's/:/::/2' | { IFS=: read -r x y z; z=${z#:}; echo "$z"; }
Or (less portable):
echo a:b:c: | paste -d: - /dev/null | { IFS=: read -r x y z; z=${z%:}; echo "$z"; }
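Applied to the question's own input, the sed variant could be wrapped in a function along these lines. This is only a sketch: parse_line and its output format are hypothetical, chosen to mirror the question's examples. sed doubles the second colon, so read always sees more fields than variables and keeps the rest of the line intact; the extra colon is then stripped from the third variable.

```shell
# parse_line (hypothetical helper): split "$1" into file, line and text
# using the sed workaround, then print them in the question's format.
# sed doubles the 2nd ":" so that the text field is never a lone final
# field whose trailing ":" read would strip.
parse_line() {
  printf '%s\n' "$1" | sed 's/:/::/2' | {
    IFS=: read -r file line text
    text=${text#:}   # remove the ":" that sed added
    printf '%s | %s | %s\n' "$file" "$line" "$text"
  }
}

parse_line 'foo.c:41:switch (color) {'   # foo.c | 41 | switch (color) {
parse_line 'foo.c:42:case RED:'          # foo.c | 42 | case RED:
parse_line 'foo.c:42:case RED: //alert:' # foo.c | 42 | case RED: //alert:
```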
I've also added the -r, which you generally want when using read.
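To illustrate why -r matters for this kind of parsing, a small sketch with a made-up input line whose third field contains a backslash: without -r, read treats \ as an escape character and removes it.

```shell
input='foo.c:43:printf("%d\n", n);'

# Without -r, the backslash is treated as an escape and dropped.
printf '%s\n' "$input" | { IFS=: read file line text; printf '%s\n' "$text"; }
# printf("%dn", n);

# With -r, the text is kept as is.
printf '%s\n' "$input" | { IFS=: read -r file line text; printf '%s\n' "$text"; }
# printf("%d\n", n);
```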
Most likely, here you'd want to use a proper text-processing utility like sed/awk/perl instead of writing convoluted and probably inefficient code around read, which has not been designed for that.
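For instance, a sketch of the sed route, assuming you only need each line split on its first two colons and printed back in the question's "file | line | text" format (split_fields is a hypothetical name). Because [^:]* can't match a colon, the first two groups stop at the first and second colons, and .* then takes the rest of the line verbatim, trailing : included.

```shell
# split_fields (hypothetical helper): rewrite "file:line:text" lines as
# "file | line | text", splitting only on the first two colons.
split_fields() {
  sed 's/^\([^:]*\):\([^:]*\):\(.*\)$/\1 | \2 | \3/'
}

printf '%s\n' 'foo.c:42:case RED:' 'foo.c:42:case RED: //alert:' | split_fields
# foo.c | 42 | case RED:
# foo.c | 42 | case RED: //alert:
```

Lines with fewer than two colons don't match the pattern and are passed through unchanged.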
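¹ Though in the Bourne shell, ":" was still split into zero elements, as there was no distinction between IFS whitespace and IFS non-whitespace characters, a distinction that ksh also added.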