Shell read *sometimes* strips trailing delimiter

Question

To parse colon-delimited fields I can use read with a custom IFS:

$ echo 'foo.c:41:switch (color) {' | { IFS=: read file line text && echo "$file | $line | $text"; }
foo.c | 41 | switch (color) {

If the last field contains colons, no problem, the colons are retained.

$ echo 'foo.c:42:case RED: //alert' | { IFS=: read file line text && echo "$file | $line | $text"; }
foo.c | 42 | case RED: //alert

A trailing delimiter is also retained...

$ echo 'foo.c:42:case RED: //alert:' | { IFS=: read file line text && echo "$file | $line | $text"; }
foo.c | 42 | case RED: //alert:

...Unless it's the only extra delimiter. Then it's stripped. Wait, what?

$ echo 'foo.c:42:case RED:' | { IFS=: read file line text && echo "$file | $line | $text"; }
foo.c | 42 | case RED

Bash, ksh93, and dash all do this, so I'm guessing it is POSIX standard behavior.

  1. Why does this happen?
  2. What is the best alternative?

I want to parse the strings above into three variables and I don't want to mangle any text in the third field. I had thought read was the way to go but now I'm reconsidering.

Recommended answer

Yes, that's standard behaviour (see the read specification and Field Splitting). A few shells (at least ash-based ones including dash, pdksh-based ones, zsh, and yash) used not to do it, but most of them have since been updated for POSIX compliance. The exceptions are zsh (when not in POSIX mode) and busybox sh.

That's the same for:

$ var='a:b:c:' IFS=:
$ set -- $var; echo "$#"
3

(see how the POSIX specification for read actually defers to the Field Splitting mechanism where a:b:c: is split into 3 fields, and so with IFS=: read -r a b c, there are as many fields as variables).

The rationale is that in ksh (on which the POSIX spec is based), $IFS, which in the Bourne shell was the internal field separator, became a field delimiter. I think the aim was that any list of elements (not containing the delimiter) could then be represented.

When $IFS is a separator, one can't represent a list of one empty element ("" is split into a list of 0 elements, ":" into a list of two empty elements¹). When it's a delimiter, you can express a list of zero elements with "", or one empty element with ":", or two empty elements with "::".
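The delimiter semantics described above can be checked with a small sketch. The helper name count_fields is made up for this illustration; it simply counts the fields that an unquoted expansion yields with IFS set to a colon. The counts in the comments are what a POSIX-compliant shell such as bash should print; as noted earlier, zsh (outside POSIX mode) and busybox sh may differ.

```shell
# count_fields STRING (hypothetical helper): print the number of fields
# produced by field splitting STRING with IFS=:
# The function body is a subshell, so IFS and noglob do not leak out.
count_fields() (
  IFS=:
  set -f          # disable globbing so only splitting happens
  set -- $1       # deliberately unquoted: this is where splitting occurs
  echo "$#"
)

count_fields ''         # 0: empty list
count_fields ':'        # 1: one empty element
count_fields '::'       # 2: two empty elements
count_fields 'a:b:c:'   # 3: the trailing colon terminates the last field
```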

It's a bit unfortunate as one of the most common usages of $IFS is to split $PATH. And a $PATH like /bin:/usr/bin: is meant to be split into "/bin", "/usr/bin", "", not just "/bin" and "/usr/bin".

Now, with POSIX shells (but not all shells are compliant in that regard), for word splitting upon parameter expansion, that can be worked around with:

IFS=:; set -o noglob
for dir in $PATH""; do
  something with "${dir:-.}"
done

That trailing "" makes sure that if $PATH ends in a trailing :, an extra empty element is added. And also that an empty $PATH is treated as one empty element as it should be.
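As a runnable variant of the loop above, here is a minimal sketch wrapped in a function. The name split_path is an assumption made for this example, and printing each element stands in for the "something with" placeholder.

```shell
# split_path VALUE (hypothetical helper): print each element of a
# PATH-like VALUE on its own line, showing empty elements as "."
# The function body is a subshell, so IFS and noglob stay local.
split_path() (
  IFS=:
  set -f                       # same effect as set -o noglob
  for dir in $1""; do          # the trailing "" is the trick discussed above
    printf '%s\n' "${dir:-.}"
  done
)

split_path '/bin:/usr/bin'     # two elements
split_path ''                  # one empty element, printed as .
split_path '/bin:/usr/bin:'    # on a compliant shell, a third line with . is expected
```

The last call is the case the trailing "" is meant to handle. Since not all shells are compliant in that regard, it is worth checking on the shell you actually target.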

That approach can't be used for read though.

Short of switching to zsh, there's no easy workaround other than inserting an extra : and removing it afterwards, like:

echo a:b:c: | sed 's/:/::/2' | { IFS=: read -r x y z; z=${z#:}; echo "$z"; }

Or (less portable):

echo a:b:c: | paste -d: - /dev/null | { IFS=: read -r x y z; z=${z%:}; echo "$z"; }

I've also added the -r which you generally want when using read.

Most likely here you'd want to use a proper text processing utility like sed/awk/perl instead of writing convoluted and probably inefficient code around read which has not been designed for that.
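If staying in the shell is still preferred for a single string, one alternative that is not part of the answer above is to skip read's field splitting entirely and cut on the first two colons with parameter expansion. This is only a sketch: the function name parse_line and the helper variable rest are made up here, and it assumes the input contains at least two colons.

```shell
# parse_line STRING (hypothetical helper): split "file:line:text" on the
# first two colons only. Sets file, line and text in the caller's scope.
# Assumes STRING contains at least two colons.
parse_line() {
  file=${1%%:*}      # everything before the first colon
  rest=${1#*:}       # everything after the first colon
  line=${rest%%:*}   # everything before the next colon
  text=${rest#*:}    # the remainder, trailing colons included
}

parse_line 'foo.c:42:case RED:'
echo "$file | $line | $text"   # prints: foo.c | 42 | case RED:
```

Because no field splitting is involved, a lone trailing colon in the third field is kept as is.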

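¹ Though in the Bourne shell, that was still split into zero elements, as there was no distinction there between IFS whitespace and IFS non-whitespace characters; that distinction was also added by ksh.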
