Regular expression to match and split on Chinese comma in JavaScript


Problem description

I have a regular expression /\s*,\s*/ that matches any whitespace before a comma, the comma itself, and any whitespace after it.

Example:

var str = "john,walker    james  , paul";
var arr = str.split(/\s*,\s*/);
// arr is ["john", "walker    james", "paul"] (length 3)

Example with Chinese characters:

var str = "继续，取消   继续 ，取消";
var arr = str.split(/\s*,\s*/);
// arr is ["继续，取消   继续 ，取消"] (length 1): no split happened, everything is at index 0

I also tried writing the characters as Unicode escapes:

var str = "john,walker    james  , paul";
var arr = str.split(/\u0020*\u002C\u0020*/);
// arr is ["john", "walker    james", "paul"] (length 3)

var str = "继续，取消   继续 ，取消";
var arr = str.split(/\u0020*\u002C\u0020*/);
// arr is ["继续，取消   继续 ，取消"] (length 1): no split happened, everything is at index 0

I went through this link, but there wasn't much information I could use in my scenario. Is it really impossible to write a regex that splits text containing Chinese characters?

Solution

An ASCII comma won't match the comma in your Chinese text, which is U+FF0C FULLWIDTH COMMA (，). Either replace the ASCII comma (\x2C) with the fullwidth one (\uFF0C), or use the character class [,，] to match both:

var str = "继续，取消   继续 ，取消";
console.log(str.split(/\s*[,，]\s*/)); // ["继续", "取消   继续", "取消"]
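Printing the code points of the punctuation shows which comma a string really contains, and writing the fullwidth comma as an escape keeps the pattern independent of how the source file is encoded. A minimal sketch, assuming a runtime with `\p{...}` property escapes (Node.js 10+ or a current browser); `codePoints` and `splitOnCommas` are hypothetical helper names:

```javascript
// Characters to ignore when listing punctuation: letters (CJK ideographs
// are category Lo, so they count as letters), digits and whitespace.
const SKIP = /[\p{L}\p{N}\s]/u;

// Hypothetical helper: list the remaining characters as "U+XXXX" strings.
function codePoints(text) {
  return Array.from(text)
    .filter((ch) => !SKIP.test(ch))
    .map((ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0"));
}

// Hypothetical helper: split on an ASCII comma (U+002C) or a fullwidth
// comma (U+FF0C), swallowing any whitespace around the separator.
function splitOnCommas(text) {
  return text.split(/\s*[\u002C\uFF0C]\s*/);
}

const chinese = "继续\uFF0C取消   继续 \uFF0C取消";
console.log(codePoints(chinese));                        // ["U+FF0C", "U+FF0C"]
console.log(codePoints("john,walker    james  , paul")); // ["U+002C", "U+002C"]
console.log(splitOnCommas(chinese));                     // ["继续", "取消   继续", "取消"]
console.log(splitOnCommas("a\u3000\uFF0C\u3000b"));      // ["a", "b"]
```

The last line relies on `\s` in JavaScript matching every Unicode space separator, including U+3000 IDEOGRAPHIC SPACE, which Chinese text sometimes uses. A pattern built from `\u0020*` would leave those spaces in the pieces.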

Here is a regex that will match all the commas mentioned on the Comma Wikipedia page:

/\s*(?:\uD805\uDC4D|\uD836\uDE87|[\u002C\u02BB\u060C\u2E32\u2E34\u2E41\u2E49\u3001\uFE10\uFE11\uFE50\uFE51\uFF0C\uFF64\u00B7\u055D\u07F8\u1363\u1802\u1808\uA4FE\uA60D\uA6F5\u02BD\u0312\u0313\u0314\u0315\u0326\u201A])\s*/
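The pattern above can be stored once and passed to `String.prototype.split`. A small usage sketch, assuming the regex is copied verbatim into a constant (the name `ANY_COMMA` is hypothetical). Besides the fullwidth comma, it also splits on U+3001 IDEOGRAPHIC COMMA (、), which Chinese uses between list items:

```javascript
// The comma-matching regex from above, stored under a hypothetical name.
const ANY_COMMA = /\s*(?:\uD805\uDC4D|\uD836\uDE87|[\u002C\u02BB\u060C\u2E32\u2E34\u2E41\u2E49\u3001\uFE10\uFE11\uFE50\uFE51\uFF0C\uFF64\u00B7\u055D\u07F8\u1363\u1802\u1808\uA4FE\uA60D\uA6F5\u02BD\u0312\u0313\u0314\u0315\u0326\u201A])\s*/;

// Fullwidth comma (U+FF0C), ideographic comma (U+3001), ASCII comma and
// Arabic comma (U+060C) are all treated as separators.
const parts = "继续\uFF0C取消\u3001确定, OK \u060C done".split(ANY_COMMA);
console.log(parts); // ["继续", "取消", "确定", "OK", "done"]
```

The list is broad. It also contains U+00B7 MIDDLE DOT and combining marks such as U+0326 COMBINING COMMA BELOW, so text that uses those characters for other purposes would be split as well. Trimming the class down to the commas you actually expect is usually the safer choice.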

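Note that U+1144D (NEWA COMMA) and U+1DA87 (SIGNWRITING COMMA) lie outside the Basic Multilingual Plane. To be compatible with ES5 regular expressions, which have no u flag, they have to be written as the surrogate pairs \uD805\uDC4D and \uD836\uDE87 in an alternation instead of inside the character class.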
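In ES2015 and later, the `u` flag lets a regex use `\u{...}` escapes for code points above U+FFFF. Astral characters can then go directly into the character class. Without `u`, a surrogate pair inside `[...]` is treated as two separate code units, and that is why the ES5 version needs the alternation. The following sketch shows the same character list rewritten that way, assuming an ES2015+ runtime (the constant name `ANY_COMMA_U` is hypothetical):

```javascript
// With the u flag, U+1144D (NEWA COMMA) and U+1DA87 (SIGNWRITING COMMA)
// can sit inside the character class as single code points.
const ANY_COMMA_U = /\s*[\u{1144D}\u{1DA87}\u002C\u02BB\u060C\u2E32\u2E34\u2E41\u2E49\u3001\uFE10\uFE11\uFE50\uFE51\uFF0C\uFF64\u00B7\u055D\u07F8\u1363\u1802\u1808\uA4FE\uA60D\uA6F5\u02BD\u0312\u0313\u0314\u0315\u0326\u201A]\s*/u;

console.log("a\u{1144D}b\uFF0Cc".split(ANY_COMMA_U)); // ["a", "b", "c"]

// The \u{...} escape and the surrogate pair produce the same string.
console.log("\u{1144D}" === "\uD805\uDC4D"); // true
```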

The following commas are handled:
