Line endings (also known as newlines) in JS strings


Question

It is well known that Unix-like systems use LF characters for newlines, whereas Windows uses CR+LF.

However, when I test this code from a local HTML file on my Windows PC, it seems that JS treats all newlines as a single LF. Is this a correct assumption?

var string = `
    foo




    bar
`;

// There should be only one blank line between foo and bar.

// \n - Works
// string = string.replace(/^(\s*\n){2,}/gm, '\n');

// \r\n - Doesn't work
string = string.replace(/^(\s*\r\n){2,}/gm, '\r\n');

alert(string);

// That is, it seems that JS treats all newlines as `LF`
// rather than `CR+LF`?


Recommended answer

I think I found an explanation.

You are using an ES6 template literal to construct your multi-line string.

According to the ECMAScript specification:

... a template literal component is interpreted as a sequence of Unicode code points. The Template Value (TV) of a literal component is described in terms of code unit values (SV, 11.8.4) contributed by the various parts of the template literal component. As part of this process, some Unicode code points within the template component are interpreted as having a mathematical value (MV, 11.8.3). In determining a TV, escape sequences are replaced by the UTF-16 code unit(s) of the Unicode code point represented by the escape sequence. The Template Raw Value (TRV) is similar to a Template Value with the difference that in TRVs escape sequences are interpreted literally.

Below that, it is defined that:

The TRV of LineTerminatorSequence :: <LF> is the code unit 0x000A (LINE FEED).
The TRV of LineTerminatorSequence :: <CR> is the code unit 0x000A (LINE FEED).

The same section also maps LineTerminatorSequence :: <CR><LF> to the single code unit 0x000A (LINE FEED), which is the case that matters for source files saved with Windows line endings.

My interpretation is that when you use a template literal, you always get just a line feed, regardless of the OS-specific newline convention.
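The normalization described above can be checked without depending on how an editor saved the file. The sketch below is only an illustration: it builds source text containing a real CR+LF pair and evaluates it with `eval`. The variable names (`source`, `rawSource`, `codes`, `rawCodes`) are made up for this example.

```javascript
// The '\r\n' escapes are resolved when these ordinary strings are created,
// so the code handed to eval() contains an actual CR and LF between a and b.
const source = '`a\r\nb`';
const rawSource = 'String.raw`a\r\nb`';

// Evaluate the template literals and list the code units of the results.
const codes = Array.from(eval(source), (ch) => ch.charCodeAt(0));
const rawCodes = Array.from(eval(rawSource), (ch) => ch.charCodeAt(0));

console.log(codes.join(','));    // 97,10,98
console.log(rawCodes.join(',')); // 97,10,98
```

In both cases the CR+LF pair in the source ends up as a single code unit 10 (LF) in the resulting string.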

Finally, in JavaScript's regular expressions:

\n matches a line feed (U+000A).

This explains the observed behavior.
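To make the regular expression side concrete, here is a small sketch comparing a pattern that requires CR+LF with one that treats the CR as optional. The sample strings and the names `crlfOnly` and `eitherEnding` are assumptions introduced for illustration.

```javascript
// Code units: '\n' is 10 (LF), '\r' is 13 (CR).
const lfText = 'foo\nbar';
const crlfText = 'foo\r\nbar';

const crlfOnly = /\r\n/;      // requires CR immediately followed by LF
const eitherEnding = /\r?\n/; // CR is optional, LF is required

console.log(crlfOnly.test(lfText));       // false
console.log(crlfOnly.test(crlfText));     // true
console.log(eitherEnding.test(lfText));   // true
console.log(eitherEnding.test(crlfText)); // true

// Splitting on the tolerant pattern gives the same lines for both inputs.
console.log(lfText.split(eitherEnding).join('|'));   // foo|bar
console.log(crlfText.split(eitherEnding).join('|')); // foo|bar
```

A pattern such as `/\r?\n/` is one way to write code that does not care which convention the input uses.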

However, if you define a string literal such as '\r\n', or read text from a file stream or another source that contains OS-specific newlines, you have to deal with them yourself.
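One common way to deal with such input is to normalize every line ending to LF first and only then apply LF-based patterns. The following is a minimal sketch under that assumption; `normalizeNewlines` and `collapseBlankLines` are hypothetical helper names, not part of any library, and the collapsing pattern is a variant of the one in the question rather than the original.

```javascript
// Convert CR+LF pairs and lone CR characters to a single LF.
function normalizeNewlines(text) {
  return text.replace(/\r\n?/g, '\n');
}

// Reduce any run of two or more blank (or space/tab-only) lines
// to exactly one blank line, after normalizing line endings.
function collapseBlankLines(text) {
  return normalizeNewlines(text).replace(/\n(?:[ \t]*\n){2,}/g, '\n\n');
}

// Text as it might arrive from a file with Windows line endings.
const windowsText = '    foo\r\n\r\n\r\n\r\n    bar\r\n';

console.log(JSON.stringify(collapseBlankLines(windowsText)));
// "    foo\n\n    bar\n"
```

Normalizing up front keeps the rest of the code free of `\r` handling; if the output must keep Windows line endings, the LF characters would have to be converted back at the end.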

Confusion like this may have contributed to the recommendation in Google's JavaScript Style Guide not to use template literals.

Here are some tests that demonstrate the behavior of template literals:

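// Template literal with a real line break in the source.
`a
b`.split('')
  .forEach(function (char) {
    console.log(char.charCodeAt(0)); // logs 97, 10, 98
  });

// String.raw: the raw value also contains a single LF.
(String.raw`a
b`).split('')
  .forEach(function (char) {
    console.log(char.charCodeAt(0)); // logs 97, 10, 98
  });

// Ordinary string literal with explicit \r\n escapes: CR and LF are both kept.
'a\r\nb'.split('')
  .forEach(function (char) {
    console.log(char.charCodeAt(0)); // logs 97, 13, 10, 98
  });

// Backslash line continuation: the line break is dropped entirely.
"a\
b".split('')
  .forEach(function (char) {
    console.log(char.charCodeAt(0)); // logs 97, 98
  });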

Interpreting the results:

char(97) = a, char(98) = b
char(10) = \n, char(13) = \r
