C++: Is there a standard definition for end-of-line in a multi-line string constant?

Question

If I have a multi-line C++11 string constant such as

R"(line 1
line 2
line3)"

Is it defined what character(s) the line terminator/separator consists of?

Recommended answer

The intent is that a newline in a raw string literal maps to a single '\n' character. This intent is not expressed as clearly as it should be, which has led to some confusion.

Citations are to the 2011 ISO C++ standard.

First, here's the evidence that it maps to a single '\n' character.

A note in section 2.14.5 [lex.string] paragraph 4 says:


[ Note: A source-file new-line in a raw string literal results in a new-line in the resulting execution string-literal. Assuming no whitespace at the beginning of lines in the following example, the assert will succeed:



    const char *p = R"(a\
    b
    c)";
    assert(std::strcmp(p, "a\\\nb\nc") == 0);




-- end note ]

This clearly states that a newline is mapped to a single '\n' character. It also matches the observed behavior of g++ 6.2.0 and clang++ 3.8.1 (tests done on a Linux system using source files with Unix-style and Windows-style line endings).
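The observation above can be checked with a small sketch like the following. The names `kText`, `count_newlines`, `has_carriage_return`, and `check_raw_newlines` are made up for illustration, and the expected result assumes a compiler that behaves like the g++ and clang++ versions mentioned above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// A raw string literal that spans three source lines.
static const std::string kText = R"(line 1
line 2
line 3)";

// Number of '\n' characters in s.
inline std::size_t count_newlines(const std::string& s) {
    return static_cast<std::size_t>(std::count(s.begin(), s.end(), '\n'));
}

// True if s contains any '\r' character.
inline bool has_carriage_return(const std::string& s) {
    return s.find('\r') != std::string::npos;
}

// Assumption: each source line break became exactly one '\n',
// whether the file was saved with LF or CRLF line endings.
inline void check_raw_newlines() {
    assert(count_newlines(kText) == 2);
    assert(!has_carriage_return(kText));
}
```

Saving the same file once with Unix-style and once with Windows-style line endings and calling `check_raw_newlines()` in both builds is one way to repeat the test on another compiler.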

Given the clearly stated intent in the note and the behavior of two popular compilers, I'd say it's safe to rely on this -- though it would be interesting to see how other compilers actually handle this.

However, a literal reading of the normative wording of the standard could easily lead to a different conclusion, or at least to some uncertainty.

Section 2.5 [lex.pptoken] paragraph 3 says (emphasis added):



Between the initial and final double quote characters of the raw string, any transformations performed in phases 1 and 2 (trigraphs, universal-character-names, and line splicing) are reverted; this reversion shall apply before any d-char, r-char, or delimiting parenthesis is identified.
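As a rough illustration of the reversions this paragraph does list, the sketch below checks that a universal-character-name and a backslash-newline survive unchanged inside a raw string literal. The identifiers are hypothetical, and the line-splice comparison additionally assumes that the source line break becomes a single '\n'.

```cpp
#include <cassert>
#include <cstring>

// Raw literal: the backslash-u sequence is not treated as a
// universal-character-name here, so all six characters are kept.
static const char kRawUcn[] = R"(\u00E9)";
static_assert(sizeof(kRawUcn) == 7, "six characters plus the terminating null");

// Raw literal: the backslash that ends the first line is not treated as a
// line splice, so both the backslash and the line break are kept.
static const char kRawSplice[] = R"(a\
b)";

// True if both raw literals hold their source text unchanged.
inline bool raw_literals_keep_source_text() {
    return std::strcmp(kRawUcn, "\\u00E9") == 0 &&
           std::strcmp(kRawSplice, "a\\\nb") == 0;
}
```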

The phases of translation are specified in 2.2 [lex.phases]. In phase 1:


Physical source file characters are mapped, in an implementation-defined manner, to the basic source character set (introducing new-line characters for end-of-line indicators) if necessary.

If we assume that the mapping of physical source file characters to the basic source character set and the introduction of new-line characters are "transformations", we might reasonably conclude that, for example, a newline in the middle of a raw string literal in a Windows-format source file should be equivalent to a \r\n sequence. (I can imagine that being useful for Windows-specific code.)
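If Windows-specific code does need \r\n sequences, one option that does not depend on how the compiler reads the source file is to convert the string explicitly. The following is only a sketch; `to_crlf` is a hypothetical helper, not something the standard or the answer provides.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns a copy of s in which every '\n' that is not
// already preceded by '\r' is replaced by the two-character sequence "\r\n".
inline std::string to_crlf(const std::string& s) {
    std::string out;
    out.reserve(s.size());
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] == '\n' && (i == 0 || s[i - 1] != '\r')) {
            out += '\r';
        }
        out += s[i];
    }
    return out;
}

// Small self-check of the helper on literal input.
inline void check_to_crlf() {
    assert(to_crlf("a\nb") == "a\r\nb");
    assert(to_crlf("a\r\nb") == "a\r\nb");
}
```

Because the helper leaves an existing "\r\n" alone, it gives the same result whether the raw string literal produced '\n' or "\r\n" for each line break.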

(This interpretation does lead to problems with systems where the end-of-line indicator is not a sequence of characters, for example where each line is a fixed-width record. Such systems are rare these days.)

As "Cheers and hth. - Alf"'s answer points out, there is an open Defect Report for this issue. It was submitted in 2013 and has not yet been resolved.

Personally, I think the root of the confusion is the word "any" (emphasis added as before):


Between the initial and final double quote characters of the raw string, any transformations performed in phases 1 and 2 (trigraphs, universal-character-names, and line splicing) are reverted; this reversion shall apply before any d-char, r-char, or delimiting parenthesis is identified.

Surely the mapping of physical source file characters to the basic source character set can reasonably be thought of as a transformation. The parenthesized clause "(trigraphs, universal-character-names, and line splicing)" seems to be intended to specify which transformations are to be reverted, but that either attempts to change the meaning of the word "transformations" (which the standard does not formally define) or contradicts the use of the word "any".

I suggest that changing the word "any" to "certain" would express the apparent intent much more clearly:


Between the initial and final double quote characters of the raw string, certain transformations performed in phases 1 and 2 (trigraphs, universal-character-names, and line splicing) are reverted; this reversion shall apply before any d-char, r-char, or delimiting parenthesis is identified.

This wording would make it much clearer that "trigraphs, universal-character-names, and line splicing" are the only transformations that are to be reverted. (Not everything done in translation phases 1 and 2 is reverted, just those specific listed transformations.)
