How can I use Unicode-aware regular expressions in JavaScript?


Question

There should be something akin to \w that can match any code point in the Letter or Mark categories (not just the ASCII ones), and hopefully there are filters like [[P*]] for punctuation, etc.

Recommended answer

Situation for ES 6

The ECMAScript language specification, edition 6 (also commonly known as ES2015), includes Unicode-aware regular expressions. Support must be enabled with the u flag on the regex. See "Unicode-aware regular expressions in ES6" for a breakdown of the feature and some caveats.
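As a minimal sketch of what the u flag changes: without it, the dot matches a single UTF-16 code unit, so a symbol outside the Basic Multilingual Plane counts as two characters; with it, the pattern works on whole code points and accepts \u{...} escapes. The last part goes beyond what the answer covers: it assumes an engine that also supports Unicode property escapes (\p{...}), which were added in ES2018, not ES6, and which come closest to the category filters the question asks for. All variable names are illustrative.

```javascript
// U+1F600 GRINNING FACE is stored as a surrogate pair (two UTF-16 code units).
const emoji = '\u{1F600}';

// Without u, "." matches one code unit, so the pair does not fit "^.$".
const withoutU = /^.$/.test(emoji);

// With u, "." matches one whole code point.
const withU = /^.$/u.test(emoji);

// The u flag also enables \u{...} code point escapes inside the pattern.
const matchesEscape = /^\u{1F600}$/u.test(emoji);

// Assumption: ES2018 property escapes are available (not part of ES6).
// \p{L} = Letter, \p{M} = Mark, \p{P} = Punctuation general categories.
const lettersAndMarks = /^[\p{L}\p{M}]+$/u;
const punctuation = /\p{P}/gu;

const isWord = lettersAndMarks.test('na\u00EFve');            // "naïve"
const stripped = '\u00A1Hola, mundo!'.replace(punctuation, ''); // "¡Hola, mundo!"

console.log(withoutU, withU, matchesEscape, isWord, stripped);
```

Property escapes only work together with the u flag (or the later v flag), so the sketch fails with a syntax error on engines that predate ES2018.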

ES6 is widely adopted in both browsers and stand-alone JavaScript runtimes such as Node.js, so using this feature won't require extra effort in most cases. Full compatibility list: https://kangax.github.io/compat-table/es6/

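There is a transpiler named regexpu that translates ES6 Unicode regular expressions into equivalent ES5. It can be used as part of your build process. Try it out online.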
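To give a rough idea of what such a translation involves, here is a hand-written ES5 pattern that approximates /^.$/u by spelling out surrogate pairs explicitly. This is an illustration only, not regexpu's actual output, and the variable name is hypothetical.

```javascript
// Hand-written ES5 approximation of /^.$/u (not generated by regexpu).
// Either a high surrogate followed by a low surrogate (one astral code point),
// or any single code unit that is not a line terminator.
var anyCodePointES5 =
  /^(?:[\uD800-\uDBFF][\uDC00-\uDFFF]|[^\n\r\u2028\u2029])$/;

var astral = '\uD83D\uDE00'; // U+1F600 written as a surrogate pair, ES5 style

console.log(anyCodePointES5.test(astral)); // one code point
console.log(anyCodePointES5.test('a'));    // one BMP character
console.log(anyCodePointES5.test('ab'));   // two characters
```

A real transpiler has to handle many more cases (character classes, case-insensitive matching, lone surrogates next to other surrogates), which is why a tool is preferable to writing such patterns by hand.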

Situation for ES 5 and below

Even though JavaScript operates on Unicode strings, it does not implement Unicode-aware character classes and has no concept of POSIX character classes or Unicode blocks/sub-ranges.
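The limitation is easy to see with \w, which only covers the ASCII letters, digits, and underscore. The sample strings below are illustrative.

```javascript
// \w is equivalent to [A-Za-z0-9_], regardless of the language of the text.
const wordOnly = /^\w+$/;

const ascii = wordOnly.test('cafe');                   // plain ASCII
const accented = wordOnly.test('caf\u00E9');           // "café", é is U+00E9
const cyrillic = wordOnly.test('\u043C\u0438\u0440');  // "мир"

// The u flag alone does not widen \w to non-ASCII letters either.
const accentedWithU = /^\w+$/u.test('caf\u00E9');

console.log(ascii, accented, cyrillic, accentedWithU);
```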

Check your expectations here: JavaScript RegExp Unicode Character Class tester (the original page is down, but the Internet Archive still has a copy).

Flagrant Badassery has an article on JavaScript, Regex, and Unicode that sheds some light on the matter.

Also read "Regex and Unicode" here on SO. You will probably have to build your own "punctuation character class".

Check out the "Regular Expression: Match Unicode Block Range" builder, which lets you build a JavaScript regular expression that matches characters that fall in any number of specified Unicode blocks.

I just did it for the "General Punctuation" and "Supplemental Punctuation" sub-ranges, and the result is as simple and straightforward as I would have expected:

 [\u2000-\u206F\u2E00-\u2E7F]
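A short sketch of that character class in use, stripping typographic punctuation from a string. The variable names and sample text are illustrative. Note that the two blocks do not include ASCII punctuation, so commas and exclamation marks are left alone.

```javascript
// General Punctuation (U+2000-U+206F) and
// Supplemental Punctuation (U+2E00-U+2E7F) blocks.
const blockPunctuation = /[\u2000-\u206F\u2E00-\u2E7F]/g;

// Curly quotes (U+201C, U+201D), en dash (U+2013), ellipsis (U+2026).
const input = '\u201CHello\u201D \u2013 world\u2026';
const cleaned = input.replace(blockPunctuation, '');

// ASCII punctuation lies outside both blocks and is not removed.
const asciiUntouched = 'Hello, world!'.replace(blockPunctuation, '');

console.log(JSON.stringify(cleaned));
console.log(JSON.stringify(asciiUntouched));
```

To cover ASCII punctuation as well, the class would have to be extended with the relevant ASCII ranges by hand.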

  • There is also XRegExp, a project that brings Unicode support to JavaScript by offering an alternative regex engine with extended capabilities.

  • And of course, required reading: mathiasbynens.be - JavaScript has a Unicode problem
