Is skipping/ignoring NUL bytes on process substitution standardized?


Question

Executive summary

Is it standard behavior that shells skip over NUL bytes when doing process substitution?

For example, executing

printf '\0abc' | read value && echo $value

will yield abc. The NUL byte is skipped, even though a hexdump of the printf output shows that it is clearly being written.

My first thought was "word splitting". However, when using an actual process substitution (strictly speaking, $(...) is command substitution)

value=$(printf '\0abc')

the results are similar, and = does not perform word splitting.
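The observation above can be checked by comparing how many bytes the command writes with how many characters end up in the variable. This is a minimal sketch; the variable names produced and stored are introduced here for illustration only. Because the behavior is unspecified, the result depends on the shell: the shells named in this question are expected to report 4 bytes produced and 3 characters stored.

```shell
# Count the bytes the command actually writes: one NUL plus "abc" is 4 bytes.
produced=$(printf '\0abc' | wc -c)
produced=$((produced + 0))   # normalize, since some wc implementations pad with spaces

# Capture the same output with command substitution.
# Newer bash releases may print a warning about the ignored null byte on stderr.
value=$(printf '\0abc')
stored=${#value}             # number of characters that survived

echo "produced=$produced stored=$stored value=$value"
```

No field splitting is involved in either assignment, so any difference between the two counts comes from the substitution itself.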

Long story

While searching for the proper answer for this question, I realized that at least three of the shell implementations I am reasonably familiar with (ash, zsh, and bash) will ignore a NUL character when reading the value from process substitution into a variable.

The exact point in the pipeline at which this happens seems to differ, but the result is consistently that a NUL byte gets dropped as if it had never been there in the first place.

I have checked some of the implementations, and this seems to be normal behavior.

ash

ash will skip over '\0' on input, but it is not clear from the code if this is pure coincidence or intended behavior:

if (lastc != '\0') {
    [...]
}

bash

The bash source code contains an explicit warning, albeit disabled with #if 0, telling us that it skipped a NUL value on process substitution:

#if 0
      internal_warning ("read_comsub: ignored null byte in input");
#endif

zsh

I'm not so sure about zsh's behaviour. It recognizes '\0' as a meta character (as defined by the internal imeta() function), prepends a special Meta surrogate character, and flips bit #5 of the input character, essentially unmetaing it (which also turns '\0' into a space ' '):

if (imeta(c)) {
    *ptr++ = Meta;
    c ^= 32;
    cnt++;
}

This seems to get stripped later, because there is no evidence that value in the above printf command contains a meta character. Take this with a large helping of salt, since I'm not too familiar with zsh's internals. Also note the side-effect-free statements.

Note that zsh also allows you to include NUL (meta-escaped) in IFS (making it possible to, e.g., word-split the output of find -print0 without xargs -0). Thus printf '\0abc' | read value and value=$(printf '\0abc') should yield different results depending on the value of IFS (read does field splitting).

Recommended answer

All extant POSIX shells use C strings (NUL-terminated), not Pascal strings (which carry their length as separate metadata and can therefore contain NULs). Thus, they can't possibly contain NULs in string contents. This was notably true of the Bourne Shell and ksh, both major influences on the POSIX sh standard.

The specification leaves the behavior unspecified here; without knowing the specific shell and release being targeted, I would not rely on either of the two common behaviors: terminating the returned stream at the first NUL, or simply discarding NULs altogether. Quoting:

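The shell shall expand the command substitution by executing command in a subshell environment (see Shell Execution Environment) and replacing the command substitution (the text of command plus the enclosing "$()" or backquotes) with the standard output of the command, removing sequences of one or more <newline> characters at the end of the substitution. Embedded <newline> characters before the end of the output shall not be removed; however, they may be treated as field delimiters and eliminated during field splitting, depending on the value of IFS and quoting that is in effect. If the output contains any null bytes, the behavior is unspecified.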
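The newline rules in the quoted passage, unlike the null-byte case, are fully specified and can be checked in any POSIX shell. A minimal sketch follows; the variable names out and expected are chosen here for illustration.

```shell
# The command writes "a", a newline, "b", then three trailing newlines.
out=$(printf 'a\nb\n\n\n')

# All trailing newlines are removed; the embedded one is kept.
expected='a
b'

if [ "$out" = "$expected" ]; then
  echo "trailing newlines stripped, embedded newline preserved"
fi
```

The assignment is not subject to field splitting, so the embedded newline survives regardless of the value of IFS.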


This isn't to say you can't read and produce streams containing NULs in widely available shells! See below, using process substitution (written for bash, but should work with ksh or zsh with minor changes, if any):

# read content from stdin into array variable and a scalar variable "suffix"
array=( )
while IFS= read -r -d '' line; do
  array+=( "$line" )
done < <(process that generates NUL stream here)
suffix=$line # content after last NUL, if any

# emit recorded content
printf '%s\0' "${array[@]}"; printf '%s' "$suffix"
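As a rough check of the approach above, the following bash sketch runs it against a small stand-in producer and compares hex dumps of the original and re-emitted streams. The produce function and its sample data are hypothetical and exist only for this illustration. The guard on the array length is an addition: with an empty array, printf '%s\0' would still emit a single NUL.

```shell
#!/usr/bin/env bash
# Hypothetical producer: two NUL-terminated fields plus an unterminated tail.
produce() { printf 'one\0two\0tail'; }

# Read NUL-delimited records; read fails at EOF but still leaves the
# unterminated remainder in "line".
array=( )
while IFS= read -r -d '' line; do
  array+=( "$line" )
done < <(produce)
suffix=$line

# Re-emit the recorded content, skipping the array when it is empty.
emit() {
  if (( ${#array[@]} )); then printf '%s\0' "${array[@]}"; fi
  printf '%s' "$suffix"
}

# Compare hex dumps, since the raw bytes cannot be held in a variable.
original=$(produce | od -An -tx1)
roundtrip=$(emit | od -An -tx1)
[ "$original" = "$roundtrip" ] && echo "round trip preserved all bytes"
```

The hex dumps are plain text, so storing them with command substitution does not run into the null-byte problem discussed in the question.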
