JavaScript and RegEx: Split and keep delimiter


Problem description

I have a regex that splits my string into an array.

Everything works fine, except that I would like to keep part of the delimiter.

This is my regex:

(&#?[a-zA-Z0-9]+;)[\s]

This is what I am doing:

var test = paragraph.split(/(&#?[a-zA-Z0-9]+;)[\s]/g);

My paragraph looks like this:

Current addresses:  &dagger;    Biopharmaceutical Research and Development<br />
&Dagger;    Clovis Oncology<br />
&sect;  Pisces Molecular <br />
||  School of Biological Sciences    
&para;  Department of Chemistry<br />

The problem is that I am getting 10 elements in my array and not 5 as I should. In fact, I am also getting each delimiter as a separate element, whereas my goal is to keep the delimiter together with the split element rather than creating a new one.
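The extra elements come from the capturing group: when the pattern passed to `split` contains parentheses, JavaScript inserts the captured text into the result array as separate elements. A minimal sketch of that behaviour, using a shortened sample string (an assumption made for brevity, not the full paragraph from the question):

```javascript
// Shortened sample input (assumed for illustration).
const sample =
  "&dagger; Biopharmaceutical Research<br />\n&Dagger; Clovis Oncology<br />";

// With a capturing group, each captured delimiter becomes its own element.
const withGroup = sample.split(/(&#?[a-zA-Z0-9]+;)[\s]/g);

// Without the group, the delimiter is dropped entirely.
const withoutGroup = sample.split(/&#?[a-zA-Z0-9]+;[\s]/g);

console.log(withGroup.length);    // 5
console.log(withoutGroup.length); // 3
console.log(withGroup[1]);        // "&dagger;"
```

In both cases the first element is an empty string, because the sample starts with a delimiter. Neither variant keeps the entity attached to the text that follows it, which is what the question asks for.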

Thank you very much for your help.

Edit:

I would like to get this as a result:

1. &dagger; Biopharmaceutical Research and Development<br />
2. &Dagger; Clovis Oncology<br />
3. &sect;  Pisces Molecular <br />
||  School of Biological Sciences  
4.  &para;  Department of Chemistry<br />


Recommended answer

Try using match instead:

var test = paragraph.match(/&#?[a-zA-Z0-9]+;\s[^&]*/g);

Update: added a required white-space \s match.
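As a rough check, the `match` call can be run against the paragraph from the question. In the sketch below the variable names `entries` and `cleaned` are illustrative, and the `.trim()` step is an addition that is not part of the answer:

```javascript
// The paragraph from the question, rebuilt line by line.
const paragraph = [
  "Current addresses:  &dagger;    Biopharmaceutical Research and Development<br />",
  "&Dagger;    Clovis Oncology<br />",
  "&sect;  Pisces Molecular <br />",
  "||  School of Biological Sciences    ",
  "&para;  Department of Chemistry<br />",
].join("\n");

// match() returns null when nothing matches, so fall back to an empty array.
const entries = paragraph.match(/&#?[a-zA-Z0-9]+;\s[^&]*/g) || [];

// Optional: strip the trailing line breaks and spaces that [^&]* picks up.
const cleaned = entries.map((entry) => entry.trim());

console.log(cleaned.length); // 4
console.log(cleaned[3]);     // "&para;  Department of Chemistry<br />"
```

Each element starts with its entity, and the `||  School of Biological Sciences` line stays attached to the `&sect;` entry. The leading `Current addresses:` text is not part of any match, so it does not appear in the result.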

Explanation:


  • &#? matches & and an optional # (the question mark matches the preceding item zero or one times).

  • [a-zA-Z0-9] is a range of all upper- and lower-case letters and digits. If you also accept an underscore, you could replace it with \w.

  • The + sign means the preceding pattern should match one or more times, so it matches one or more of the characters a-z, A-Z and the digits 0-9.

  • ; matches the character ; literally.

  • \s matches the white-space class. That includes space, tab and other white-space characters.

  • [^&]* is once again a range, but since ^ is the first character the match is negated, so instead of matching & it matches everything except &. The star matches the pattern zero or more times.

  • g at the end, after the last /, means global: matching continues after the first match, and you get an array of all matches.

So, match & and an optional #, followed by any number of letters or digits (but at least one), followed by ;, followed by a white-space character, followed by zero or more characters that aren't &.
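If `split` is preferred over `match`, one alternative that the answer does not mention is to split on a zero-width lookahead, so the delimiter is never consumed and stays at the start of each piece. A minimal sketch on a shortened sample string (assumed for illustration):

```javascript
// Shortened sample input (assumed for illustration).
const text =
  "Current addresses:  &dagger; Biopharma<br />\n" +
  "&Dagger; Clovis Oncology<br />\n" +
  "&sect; Pisces Molecular";

// (?=...) asserts that an entity followed by white-space comes next,
// without consuming it, so split() cuts just before each entity.
const pieces = text.split(/(?=&#?[a-zA-Z0-9]+;\s)/);

console.log(pieces.length); // 4
console.log(pieces[0]);     // "Current addresses:  "
```

Unlike the `match` version, this keeps any text before the first entity as its own element, and joining the pieces gives back the original string.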
