How does bash parse control character escape codes in ANSI-C quoted strings?
Problem description
I'm re-implementing bash's ANSI-C quoted strings in JavaScript, but I'm having trouble understanding how control characters in them are parsed. The code in lib/sh/strtrans.c does this:
case 'c':
  if (sawc)
    {
      *sawc = 1;
      *r = '\0';
      if (rlen)
        *rlen = r - ret;
      return ret;
    }
  else if ((flags & 1) == 0 && *s == 0)
    ; /* pass \c through */
  else if ((flags & 1) == 0 && (c = *s))
    {
      s++;
      if ((flags & 2) && c == '\\' && c == *s)
        s++; /* Posix requires $'\c\\' do backslash escaping */
      c = TOCTRL(c);
      break;
    }
and TOCTRL is defined in include/chartypes.h as
# define TOCTRL(x) ((x) == '?' ? 0x7f : (TOUPPER(x) & 0x1f))
where TOUPPER is effectively C's toupper function.
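The macro above can be sketched in JavaScript as follows. This is only an illustration: toUpperByte and toCtrl are hypothetical names, the input is assumed to be a single byte value (0-255), and toupper is assumed to behave as in the C locale (only ASCII a-z are changed).

```javascript
// Hypothetical JavaScript port of TOCTRL from include/chartypes.h.
// Assumes a single byte value (0-255) and C-locale toupper behaviour.
function toUpperByte(b) {
  // ASCII 'a'..'z' (0x61..0x7a) map to 'A'..'Z'; everything else is unchanged.
  return b >= 0x61 && b <= 0x7a ? b - 0x20 : b;
}

function toCtrl(b) {
  // '?' (0x3f) is special-cased to DEL (0x7f); otherwise keep the low five bits.
  return b === 0x3f ? 0x7f : toUpperByte(b) & 0x1f;
}

console.log(toCtrl("a".charCodeAt(0))); // 1
console.log(toCtrl("A".charCodeAt(0))); // 1
console.log(toCtrl("?".charCodeAt(0))); // 127
console.log(toCtrl(0x01)); // 1
console.log(toCtrl(0x7f)); // 31
```

Taken on its own, the macro maps the byte 0x01 to 0x01 and the byte 0x7f to 0x1f.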
So I would expect it to take the first byte of the character after "\c", uppercase it if it's a letter, and zero out the top three bits of the result.
Exhaustively testing this with a NodeJS script, I found that this rule doesn't hold in two cases:
$ bash -c $'echo -n "\x01" | xxd -b'
00000000: 00000001 .
$ bash -c $'echo -n $\'\\c\x01\' | xxd -b'
00000000: 00000001 00000001 ..
$ bash -c $'echo -n "\x7F" | xxd -b'
00000000: 01111111 .
$ bash -c $'echo -n $\'\\c\x7F\' | xxd -b'
00000000: 00000001 01111111 ..
(Apologies if that's confusing: I'm using an ANSI-C quoted string to generate a bash command that contains another ANSI-C quoted string, so that I can insert arbitrary characters after the \c.)
Also, if zeroing out the top three bits produces a 00000000 character (e.g. \c followed by a space (00100000) or \c@ (01000000)), that's NUL, which terminates the string and causes xxd to print nothing, but that's not too surprising.
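The NUL case can be checked with a short scan. This is a sketch under assumptions: toCtrlByte is a hypothetical JavaScript port of TOCTRL (C-locale toupper assumed), and the scan is limited to ASCII bytes 1 through 0x7f, since a NUL byte after \c never reaches TOCTRL in the C code.

```javascript
// Hypothetical one-line port of TOCTRL for a single byte value (C locale assumed).
const toCtrlByte = (b) =>
  b === 0x3f ? 0x7f : (b >= 0x61 && b <= 0x7a ? b - 0x20 : b) & 0x1f;

// Collect every ASCII byte (as hex) that the rule maps to NUL.
const nulProducers = [];
for (let b = 1; b <= 0x7f; b++) {
  if (toCtrlByte(b) === 0) nulProducers.push(b.toString(16));
}

console.log(nulProducers); // [ '20', '40', '60' ]
```

Under this rule, the space (0x20), @ (0x40), and the backtick (0x60) are the ASCII bytes that produce NUL.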
I'm wondering why that happens.
Recommended answer
We also need to look at the code further down in strtrans.c:
case 'c':
  if (sawc)
    {
      *sawc = 1;
      *r = '\0';
      if (rlen)
        *rlen = r - ret;
      return ret;
    }
  else if ((flags & 1) == 0 && *s == 0)
    ; /* pass \c through */
  else if ((flags & 1) == 0 && (c = *s))
    {
      s++;
      if ((flags & 2) && c == '\\' && c == *s)
        s++; /* Posix requires $'\c\\' do backslash escaping */
      c = TOCTRL(c);
      break;
    }
  /*FALLTHROUGH*/
default:
  if ((flags & 4) == 0)
    *r++ = '\\';
  break;
}
/* c is 0x01 or 0x1f */
if ((flags & 2) && (c == CTLESC || c == CTLNUL))
  *r++ = CTLESC; /* adds 0x01 */
*r++ = c; /* adds 0x01 or 0x1f */
}
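For a JavaScript port, the last two statements of the excerpt could be modelled roughly as below. This is a sketch, not bash's actual behaviour end to end: emitByte is a hypothetical helper, and the values of CTLESC (0x01) and CTLNUL (0x7f) are assumed to match bash's internal definitions. It does not model how bash pre-processes the input before this function runs or how it later strips the CTLESC markers.

```javascript
// Assumed values of bash's internal quoting markers.
const CTLESC = 0x01;
const CTLNUL = 0x7f;

// Hypothetical helper modelling only the last two statements of the C excerpt:
// returns the bytes appended to the result buffer for one translated byte c.
function emitByte(c, flags) {
  const out = [];
  if ((flags & 2) !== 0 && (c === CTLESC || c === CTLNUL)) {
    out.push(CTLESC); // extra 0x01 marker in front of a CTLESC/CTLNUL byte
  }
  out.push(c);
  return out;
}

console.log(emitByte(0x01, 2)); // [ 1, 1 ]
console.log(emitByte(0x1f, 2)); // [ 31 ]
console.log(emitByte(0x01, 0)); // [ 1 ]
```

With flag 2 set, a translated byte of 0x01 is written as two bytes, while 0x1f is written unchanged.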
I don't know where the \c escape sequence comes from. It's not in C and, to be honest, I haven't seen it used anywhere as far as I can tell. I wanted to say that using \c$'\x01' and \c$'\x1f' would count as "undefined behavior", but I have no idea which characters are actually allowed anyway.
and xxd -b :p