JavaScript and RegEx: Split and keep delimiter
Question
I have a regex which splits my string into an array.
Everything works fine except that I would like to keep a part of the delimiter.
This is my regex:
(&#?[a-zA-Z0-9]+;)[\s]
This is what I am doing:
var test = paragraph.split(/(&#?[a-zA-Z0-9]+;)[\s]/g);
My paragraph looks like this:
Current addresses: † Biopharmaceutical Research and Development<br />
‡ Clovis Oncology<br />
§ Pisces Molecular <br />
|| School of Biological Sciences
¶ Department of Chemistry<br />
The problem is that I am getting 10 elements in my array and not 5 as I should. The delimiter is also returned as a separate element, whereas my goal is to keep the delimiter attached to the split element rather than create a new element for it.
Thank you very much for your help.
Edit:
I would like to get this as a result:
1. † Biopharmaceutical Research and Development<br />
2. ‡ Clovis Oncology<br />
3. § § Pisces Molecular <br />
|| School of Biological Sciences
4. ¶ Department of Chemistry<br />
Recommended answer
Try using match instead:
var test = paragraph.match(/&#?[a-zA-Z0-9]+;\s[^&]*/g);
Updated: added a required white-space \s match.
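To see the difference between the two calls, here is a minimal sketch. It assumes the paragraph contains literal HTML entities such as &dagger; and &sect; (the sample string below is a hypothetical reconstruction, so the element counts apply to this sample only).

```javascript
// Hypothetical sample: assumes the symbols are written as HTML entities.
const paragraph =
  "Current addresses: &dagger; Biopharmaceutical Research and Development<br />\n" +
  "&Dagger; Clovis Oncology<br />\n" +
  "&sect; Pisces Molecular <br />\n" +
  "|| School of Biological Sciences\n" +
  "&para; Department of Chemistry<br />";

// split() with a capturing group inserts each captured delimiter
// into the result as its own element.
const viaSplit = paragraph.split(/(&#?[a-zA-Z0-9]+;)[\s]/g);

// match() with the g flag returns each entity together with the text after it.
const viaMatch = paragraph.match(/&#?[a-zA-Z0-9]+;\s[^&]*/g);

console.log(viaSplit.length); // 9 (text and delimiters interleaved)
console.log(viaSplit[1]);     // "&dagger;"
console.log(viaMatch.length); // 4 (one element per entity)
console.log(viaMatch[2]);     // "&sect; Pisces Molecular <br />\n|| School of Biological Sciences\n"
```

Note that match() drops any text before the first entity ("Current addresses: " in this sample), because that text is not part of any match.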
Explanation:
&#? matches & followed by an optional # (the question mark matches the preceding token zero or one time).
[a-zA-Z0-9] is a character class covering all upper- and lower-case ASCII letters and the digits. If you also want to accept an underscore, you could replace it with \w.
The + sign means that the preceding pattern should match one or more times, so it matches one or more of the characters a-z, A-Z and the digits 0-9.
; matches the literal character ;.
\s matches the white-space class. That includes space, tab and other white-space characters.
[^&]* is once again a character class, but since ^ is its first character the class is negated: instead of matching the & character, it matches every character except &. The star matches the pattern zero or more times.
The g at the end, after the last /, means global. It makes matching continue after the first match, so you get an array of all matches.
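The effect of the g flag can be checked with a short sketch. The sample string is made up for illustration.

```javascript
// Hypothetical two-entity sample string.
const sample = "&dagger; One &Dagger; Two";

// Without g: match() stops at the first match and returns a result array
// whose element 0 is the matched text.
const first = sample.match(/&#?[a-zA-Z0-9]+;\s[^&]*/);

// With g: match() returns an array of every matched substring.
const all = sample.match(/&#?[a-zA-Z0-9]+;\s[^&]*/g);

console.log(first[0]); // "&dagger; One "
console.log(all);      // ["&dagger; One ", "&Dagger; Two"]
```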
So: match & and an optional #, followed by one or more letters or digits, followed by ;, followed by a white-space character, followed by zero or more characters that aren't &.
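One consequence of [^&]* is that a segment ends at the next & of any kind, even one that does not start an entity. If that matters for your data, a possible alternative (not part of the answer above) is to split on a zero-width lookahead, which keeps each delimiter at the start of its segment. A minimal sketch with a made-up sample string:

```javascript
// Hypothetical sample containing a bare "&" inside the text ("R&D").
const text = "&dagger; R&D Lab &Dagger; Clovis Oncology";

// The match() pattern cuts the first segment short at the bare "&".
const viaMatch = text.match(/&#?[a-zA-Z0-9]+;\s[^&]*/g);

// Alternative: split at each position that is followed by an entity and a
// white-space character. The lookahead consumes nothing, so nothing is removed.
const viaLookahead = text.split(/(?=&#?[a-zA-Z0-9]+;\s)/);

console.log(viaMatch);     // ["&dagger; R", "&Dagger; Clovis Oncology"]
console.log(viaLookahead); // ["&dagger; R&D Lab ", "&Dagger; Clovis Oncology"]
```

Unlike match(), the lookahead split keeps any text that precedes the first entity as the first array element, so you may need to drop that element yourself.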