How does bash parse control character escape codes in ANSI-C quoted strings?

Question

I'm re-implementing bash's ANSI-C quoted strings in JavaScript but I am having trouble understanding how control characters in them are parsed. I see the code in lib/sh/strtrans.c does this:

            case 'c':
              if (sawc)
                {
                  *sawc = 1;
                  *r = '\0';
                  if (rlen)
                    *rlen = r - ret;
                  return ret;
                }
              else if ((flags & 1) == 0 && *s == 0)
                ;               /* pass \c through */
              else if ((flags & 1) == 0 && (c = *s))
                {
                  s++;
                  if ((flags & 2) && c == '\\' && c == *s)
                    s++;        /* Posix requires $'\c\\' do backslash escaping */
                  c = TOCTRL(c);
                  break;
                }

and TOCTRL is defined in include/chartypes.h as

#  define TOCTRL(x) ((x) == '?' ? 0x7f : (TOUPPER(x) & 0x1f))

where TOUPPER is effectively C's toupper function.

So what I would expect is that it takes the first byte after "\c", uppercases it if it's a letter, and zeroes out the top three bits of the result.
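The expected rule can be sketched in JavaScript as follows. This is only a minimal sketch: `toCtrl` and `bits` are hypothetical names, and the uppercase step assumes the C locale, where `toupper` only changes ASCII a-z.

```javascript
// Hypothetical helper mirroring the TOCTRL macro for a single byte (0-255).
// Assumption: C locale, so toupper only affects ASCII a-z.
function toCtrl(byte) {
  if (byte === 0x3f) return 0x7f; // '?' maps to DEL
  const upper = byte >= 0x61 && byte <= 0x7a ? byte - 0x20 : byte; // toupper
  return upper & 0x1f; // keep only the low five bits
}

// Format a byte the way `xxd -b` displays it.
const bits = (b) => b.toString(2).padStart(8, "0");

console.log(bits(toCtrl("a".charCodeAt(0)))); // \ca -> 00000001
console.log(bits(toCtrl("A".charCodeAt(0)))); // \cA -> 00000001
console.log(bits(toCtrl("?".charCodeAt(0)))); // \c? -> 01111111
```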

Exhaustively testing this with a NodeJS script, I found that this rule doesn't work for two cases:

$ bash -c $'echo -n "\x01" | xxd -b'
00000000: 00000001                                               .
$ bash -c $'echo -n $\'\\c\x01\' | xxd -b'
00000000: 00000001 00000001                                      ..

$ bash -c $'echo -n "\x7F" | xxd -b'
00000000: 01111111                                               .
$ bash -c $'echo -n $\'\\c\x7F\' | xxd -b'
00000000: 00000001 01111111                                      ..

(apologies if that's confusing, I am using an ANSI-C quoted string to generate a bash command with another ANSI-C quoted string inside so that I can insert arbitrary characters after the \c)

And if zeroing out the top 3 bits produces a 00000000 byte (e.g. \c followed by a space (00100000) or \c@ (01000000)), that's NUL, which terminates the string and causes xxd to print nothing, but that's not too surprising.
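To see which inputs fall into this NUL case, one could enumerate the 7-bit bytes that the naive rule maps to zero. This is a sketch under the same assumptions as the macro quoted earlier (`toCtrl` and `nulInputs` are hypothetical names, and `toupper` is taken to be ASCII-only); it describes the rule, not necessarily what bash does end to end.

```javascript
// Hypothetical restatement of the naive rule: '?' -> 0x7f, otherwise
// uppercase ASCII letters and keep the low five bits.
function toCtrl(byte) {
  if (byte === 0x3f) return 0x7f;
  const upper = byte >= 0x61 && byte <= 0x7a ? byte - 0x20 : byte;
  return upper & 0x1f;
}

// Collect every non-zero 7-bit input byte that the rule maps to 0x00.
const nulInputs = [];
for (let b = 1; b < 0x80; b++) {
  if (toCtrl(b) === 0) nulInputs.push(b);
}

console.log(nulInputs.map((b) => "0x" + b.toString(16)));
// [ '0x20', '0x40', '0x60' ]  (space, '@', and backtick)
```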

I'm wondering why that happens.

Recommended answer

We also need the code a little further down in strtrans.c:

            case 'c':
              if (sawc)
                {
                  *sawc = 1;
                  *r = '\0';
                  if (rlen)
                    *rlen = r - ret;
                  return ret;
                }
              else if ((flags & 1) == 0 && *s == 0)
                ;               /* pass \c through */
              else if ((flags & 1) == 0 && (c = *s))
                {
                  s++;
                  if ((flags & 2) && c == '\\' && c == *s)
                    s++;        /* Posix requires $'\c\\' do backslash escaping */
                  c = TOCTRL(c);
                  break;
                }
              /*FALLTHROUGH*/
            default:
              if ((flags & 4) == 0)
                *r++ = '\\';
              break;
            }
          /* c is 0x01 or 0x1f */
          if ((flags & 2) && (c == CTLESC || c == CTLNUL))
            *r++ = CTLESC;      /* adds 0x01 */
          *r++ = c;             /* adds 0x01 or 0x1f */
        }
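The tail of that excerpt could be mirrored in JavaScript roughly as follows. This is a sketch, not bash's actual behavior end to end. `emitControl` and `toCtrl` are hypothetical names, and the values `CTLESC = 0x01` and `CTLNUL = 0x7f` are assumptions about bash's internal marker bytes, since the excerpt does not show their definitions. The POSIX `\c\\` special case is left out.

```javascript
// Assumed values of bash's internal quoting markers (not shown in the excerpt).
const CTLESC = 0x01;
const CTLNUL = 0x7f;

// Hypothetical mirror of the TOCTRL macro, assuming an ASCII-only toupper.
function toCtrl(byte) {
  if (byte === 0x3f) return 0x7f;
  const upper = byte >= 0x61 && byte <= 0x7a ? byte - 0x20 : byte;
  return upper & 0x1f;
}

// Translate the byte after \c, then prefix CTLESC when the flag with value 2
// is set and the result collides with one of the marker bytes.
function emitControl(byte, flags) {
  const c = toCtrl(byte);
  const out = [];
  if ((flags & 2) !== 0 && (c === CTLESC || c === CTLNUL)) out.push(CTLESC);
  out.push(c);
  return out;
}

console.log(emitControl(0x01, 2)); // [ 1, 1 ]
console.log(emitControl(0x7f, 2)); // [ 31 ]
```

Under these assumptions the sketch reproduces the doubled 0x01 from the question's first case. For an input of 0x7F it yields a single 0x1f, so this excerpt on its own does not seem to account for the observed `00000001 01111111`.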

I do not know where the \c escape sequence comes from. It's not in C and, tbh, I have not seen it used, as far as I can tell. Where does it come from? I wanted to say that using \c$'\x01' and \c$'\x1f' would count as "undefined behavior", but I have no idea which characters are actually allowed anyway.

xxd -b :p
