Split a string (that contains tags) by spaces without breaking the tags or tag inner HTML in JavaScript
Question
I'm attempting to split a string by spaces into an array of words. If the string contains HTML tags, I would like the full tag (including content) to be treated as a single word.
For example,
I like to eat <a href="http://www.waffles.com/">tasty delicious waffles</a> for breakfast
should be split into
I
like
to
eat
<a href="http://www.waffles.com/">tasty delicious waffles</a>
for
breakfast
I've seen a couple of related threads on Stack Overflow, but I'm having trouble adapting anything to JavaScript because they were written for languages I'm not familiar with. Is there a regular expression that could easily do this, or will the solution require multiple regex splits and iteration?
Thanks.
Recommended answer
result = subject.match(/<\s*(\w+\b)(?:(?!<\s*\/\s*\1\b)[\s\S])*<\s*\/\s*\1\s*>|\S+/g);
This will work if your tags can't be nested (in particular, a tag nested inside another tag of the same name ends the match early), if all tags are properly closed, and if the current tag name doesn't occur in comments, strings, etc.
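To make the behaviour and the caveats concrete, here is a minimal sketch that wraps the regex in a small helper and runs it on the question's example, on a nested same-name tag, and on an unclosed tag. The names `TAG_OR_WORD` and `splitKeepingTags` are hypothetical and not part of the original answer.

```javascript
// Same pattern as in the answer; the constant and function names are
// hypothetical, chosen only for this illustration.
const TAG_OR_WORD = /<\s*(\w+\b)(?:(?!<\s*\/\s*\1\b)[\s\S])*<\s*\/\s*\1\s*>|\S+/g;

function splitKeepingTags(subject) {
  // String.prototype.match returns null when nothing matches,
  // so fall back to an empty array.
  return subject.match(TAG_OR_WORD) || [];
}

// The example from the question: the whole <a>...</a> element is one item.
const words = splitKeepingTags(
  'I like to eat <a href="http://www.waffles.com/">tasty delicious waffles</a> for breakfast'
);
console.log(words.length); // 7
console.log(words[4]);     // <a href="http://www.waffles.com/">tasty delicious waffles</a>

// Caveat 1: a tag nested inside a tag of the same name.
// The match stops at the first closing </b>, so the outer element is cut short.
const nested = splitKeepingTags('a <b>x <b>y</b> z</b> c');
console.log(nested); // [ 'a', '<b>x <b>y</b>', 'z</b>', 'c' ]

// Caveat 2: an unclosed tag. The first alternative fails, so the text is
// split on whitespace like ordinary words.
const unclosed = splitKeepingTags('see <b>bold text');
console.log(unclosed); // [ 'see', '<b>bold', 'text' ]
```

If the input may contain arbitrary or malformed HTML, these two cases suggest the regex alone is probably not enough.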
Explanation:
<\s* # Either match a < (+ optional whitespace)
(\w+\b) # tag name
(?: # Then match...
(?! # (as long as it's impossible to match...
<\s*\/\s*\1\b # the closing tag here
) # End of negative lookahead)
[\s\S] # ...any character
)* # zero or more times.
<\s*\/\s*\1\s*> # Then match the closing tag.
| # OR:
\S+ # Match a run of non-whitespace characters.
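JavaScript regex literals have no free-spacing (verbose) mode, so the commented layout above cannot be pasted into code as is. One possible way to keep the comments next to the pattern is to assemble it from pieces, sketched below; the variable names `pieces`, `tagOrWord`, and `sample` are hypothetical, and `String.raw` is used only to avoid doubling every backslash.

```javascript
// Each piece mirrors one line of the explanation above.
const pieces = [
  String.raw`<\s*`,             // Either match a < (+ optional whitespace)
  String.raw`(\w+\b)`,          // tag name, captured as group 1
  String.raw`(?:`,              // Then match...
  String.raw`(?!`,              //   (as long as it's impossible to match...
  String.raw`<\s*\/\s*\1\b`,    //    the closing tag here
  String.raw`)`,                //   end of negative lookahead)
  String.raw`[\s\S]`,           //   ...any character
  String.raw`)*`,               // zero or more times.
  String.raw`<\s*\/\s*\1\s*>`,  // Then match the closing tag.
  String.raw`|`,                // OR:
  String.raw`\S+`,              // Match a run of non-whitespace characters.
];

// Join the pieces and compile with the global flag so match() returns all items.
const tagOrWord = new RegExp(pieces.join(''), 'g');

const sample =
  'I like to eat <a href="http://www.waffles.com/">tasty delicious waffles</a> for breakfast';
console.log(sample.match(tagOrWord));
```

On the sample string this should produce the same seven items as the one-line regex literal from the answer.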