Order of string replacement function invocations when used with RTL languages

Question

When calling String.replace with a replacement function, we can retrieve the offsets of the matched substrings.

var a = [];
"hello world".replace(/l/g, function (m, i) { a.push(i); });
// a = [2, 3, 9]

In the example above, we're getting a list of offsets for the matching l characters.
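
For comparison, here is a minimal sketch that collects the same offsets with String.prototype.matchAll (ES2020) instead of calling replace only for its side effects. The helper name matchOffsets is hypothetical. Note that matchAll throws a TypeError if the regex lacks the g flag.

```javascript
// Collect the offset of every match without using replace() for side effects.
// matchAll (ES2020) requires a regex with the g flag and yields matches in
// ascending order of position.
function matchOffsets(text, pattern) {
  return Array.from(text.matchAll(pattern), (match) => match.index);
}

const offsets = matchOffsets("hello world", /l/g);
console.log(offsets); // [2, 3, 9]
```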

Can I count on implementations to always invoke the replacement function in ascending order of occurrence, even when used with languages that are written from right to left?

That is: Can I be sure that the result above will always be [2,3,9] and not [3,9,2] or any other permutation of those offsets?
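
One way to observe the invocation order directly is to have the replacement function return a running call counter, so the output string records which match was handled first. This is a sketch for a single engine run, not a proof for every implementation; the variable names are illustrative. ECMAScript 2015 and later specify RegExp.prototype[Symbol.replace] so that all matches are collected into a list first and the replacement function is then called once per entry, in list order.

```javascript
// Replace each match with the order in which the replacement function was called.
// If calls happen in ascending offset order, the labels read 0, 1, 2 left to right.
let calls = 0;
const labelled = "hello world".replace(/l/g, () => String(calls++));

console.log(labelled); // "he01o wor2d"
console.log(calls);    // 3
```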

This is a follow-up on this question that Tomalak answered with:

Absolutely, yes. Matches are handled from left to right in the source string because left-to-right is how regular expression engines work their way to a string.
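
The left-to-right scan Tomalak describes can be made explicit with a RegExp.prototype.exec loop, sketched below: with the g flag, each call resumes at lastIndex, which only moves forward. The helper scanOffsets is hypothetical; the empty-match guard keeps patterns such as x* from looping forever.

```javascript
// Sketch of the scan a global regex performs: each exec() call resumes at
// lastIndex, so matches are reported in ascending order of position.
function scanOffsets(text, source) {
  const re = new RegExp(source, "g");
  const offsets = [];
  let match;
  while ((match = re.exec(text)) !== null) {
    offsets.push(match.index);
    // An empty match leaves lastIndex where it was; step past it.
    if (match[0] === "") re.lastIndex++;
  }
  return offsets;
}

console.log(scanOffsets("hello world", "l")); // [2, 3, 9]
console.log(scanOffsets("ab", "x*"));         // [0, 1, 2]
```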

However, regarding the case with RTL languages he also said:

That's a good question [...] RTL text definitely changes how JavaScript regular expressions behave.

I've tested with the following RTL snippet in Chrome:

var a = [];
"بلوچی مکرانی".replace(/ی/g, function (m, i) { a.push(i); });
// a = [4, 11]

I don't speak that language, but reading the rendered string from left to right, I see the ی character as the first character of the string and as the first character after the white space. However, since the text is written right to left, those positions are actually the last character before the white space and the last character in the string, which translates into [4, 11].
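
To check that these offsets refer to storage order rather than screen position, you can query the string directly, as in this sketch (variable names are illustrative). matchAll reports the same offsets as the replacement callback, and the string's last stored code unit is the ی that the browser draws at the far left of the line.

```javascript
// Offsets in the Balochi string from the question are indices into the stored
// UTF-16 string (logical order), not positions on the rendered line.
const balochi = "بلوچی مکرانی";
const yehOffsets = Array.from(balochi.matchAll(/ی/g), (m) => m.index);

console.log(yehOffsets);     // [4, 11]
console.log(balochi.length); // 12
// The last stored code unit is the glyph drawn at the left edge of the line.
console.log(balochi[balochi.length - 1] === "ی"); // true
```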

So, this seems to work just as expected in Chrome. The question is: can I trust that the result will be the same on all compliant JavaScript implementations?

Recommended answer

I searched ECMA-262 5.1 Edition (June 2011) for the keywords "format control", "right to left" and "RTL". None of them is mentioned, except where the specification says that format control characters are allowed in string literals and regular expression literals.

From Section 7.1:

It is useful to allow format-control characters in source text to facilitate editing and display. All format control characters may be used within comments, and within string literals and regular expression literals.

From Annex E:

7.1: Unicode format control characters are no longer stripped from ECMAScript source text before processing. In Edition 5, if such a character appears in a StringLiteral or RegularExpressionLiteral the character will be incorporated into the literal where in Edition 3 the character would not be incorporated into the literal.
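
The Annex E change can be observed by passing eval source text that contains a raw format control character. The sketch below uses U+200F RIGHT-TO-LEFT MARK as an example; the choice of character and the use of eval are for illustration only. In an Edition 5 or later engine, the mark is kept inside both kinds of literal like any other character.

```javascript
// Build source text that contains a raw U+200F RIGHT-TO-LEFT MARK (a format
// control character) inside a string literal and inside a regex literal.
const RLM = "\u200F";
const stringSource = '"a' + RLM + 'b"';
const regexSource = "/" + RLM + "/";

// Under ES5 and later the mark is part of each literal, so the parsed string
// has length 3 and the parsed regex matches an RLM in the input.
const parsedString = eval(stringSource);
const parsedRegex = eval(regexSource);

console.log(parsedString.length);               // 3
console.log(parsedRegex.test("x" + RLM + "y")); // true
```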

From this, I conclude that JavaScript does not treat right-to-left characters any differently. It only knows about the UTF-16 code units stored in the string and works on them in logical (storage) order, regardless of how the text is displayed.
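
Here is a minimal sketch of what "logical order" means in practice. Bidirectional controls such as U+202E RIGHT-TO-LEFT OVERRIDE change how a line is drawn, but to replace they are ordinary code units that just shift the offsets. The helper replaceOffsets is hypothetical, and its (match, offset) parameter list assumes the pattern has no capture groups.

```javascript
// Collect the offsets passed to a replacement function, as in the question.
// The second callback argument is the offset only when the pattern has no groups.
function replaceOffsets(text, pattern) {
  const offsets = [];
  text.replace(pattern, (match, offset) => {
    offsets.push(offset);
    return match; // leave the text unchanged
  });
  return offsets;
}

// U+202E RIGHT-TO-LEFT OVERRIDE and U+202C POP DIRECTIONAL FORMATTING change how
// the text is displayed, but they are just two more code units in the string.
console.log(replaceOffsets("hello world", /l/g));             // [2, 3, 9]
console.log(replaceOffsets("\u202Ehello world\u202C", /l/g)); // [3, 4, 10]

// Offsets count UTF-16 code units: U+1F600 is stored as a surrogate pair.
console.log(replaceOffsets("\u{1F600}l", /l/g));              // [2]
```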
