Regexp to search/replace only text, not in HTML attribute
I'm using JavaScript to do some regular expression replacements. Assuming well-formed source, I want to remove any space before [,.] and keep exactly one space after [,.], except when the [,.] is part of a number. So I use:
text = text.replace(/ *(,|\.) *([^ 0-9])/g, '$1 $2');
The problem is that this also replaces text inside HTML tag attributes. For example, my text is (always wrapped in a tag):
<p>Test,and test . Again <img src="xyz.jpg"> ...</p>
Now it adds a space, like this: src="xyz. jpg"
That is not what I want. How can I rewrite my regular expression? What I want is:
<p>Test, and test. Again <img src="xyz.jpg"> ...</p>
Thanks!
You can use a negative lookahead to make sure the match doesn't start inside a tag:
text = text.replace(/(?![^<>]*>) *([.,]) *([^ \d])/g, '$1 $2');
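As a quick check of the pattern above, here is a minimal sketch. The helper name `tidyPunctuation` and the sample strings are made up for illustration; only the regex comes from the answer. The lookahead `(?![^<>]*>)` rejects any starting position where the next angle bracket is a `>`, which is the case inside a tag.

```javascript
// Hypothetical wrapper around the regex from the answer.
function tidyPunctuation(html) {
  // (?![^<>]*>) : fail if the next angle bracket ahead is ">", i.e. we are inside a tag
  //  *([.,]) *  : a comma or period with any surrounding spaces
  // ([^ \d])    : followed by something that is neither a space nor a digit
  return html.replace(/(?![^<>]*>) *([.,]) *([^ \d])/g, '$1 $2');
}

const input = '<p>Test,and test . Again <img src="xyz.jpg"> done</p>';
const output = tidyPunctuation(input);
console.log(output);
// → <p>Test, and test. Again <img src="xyz.jpg"> done</p>

// Punctuation inside a number is left alone because a digit follows it.
const numeric = tidyPunctuation('<p>Pi is 3.14 ,ok</p>');
console.log(numeric);
// → <p>Pi is 3.14, ok</p>
```

Note that the sample deliberately avoids the literal "..." from the question: the pattern treats each period of an ellipsis as punctuation and would space them apart.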
The usual warnings apply regarding CDATA sections, SGML comments, SCRIPT elements, and angle brackets in attribute values. But I suspect your real problems will arise from the vagaries of "plain" text; HTML's not even in the same league. :D
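To make two of those warnings concrete, here is a small sketch with invented sample strings. It shows the same pattern altering SCRIPT content (which is not inside a tag, so the lookahead does not protect it) and an attribute value that contains a `<`.

```javascript
const pattern = /(?![^<>]*>) *([.,]) *([^ \d])/g;

// Script code sits between tags, so it is treated like ordinary text.
const scriptSample = '<script>var n = obj.size;</script>';
const scriptResult = scriptSample.replace(pattern, '$1 $2');
console.log(scriptResult);
// → <script>var n = obj. size;</script>

// A "<" inside an attribute value hides the closing ">" from the lookahead.
const attrSample = '<img alt="a.b < c">';
const attrResult = attrSample.replace(pattern, '$1 $2');
console.log(attrResult);
// → <img alt="a. b < c">
```

If the input can contain cases like these, parsing the markup and rewriting only its text nodes is likely a safer route than a single regex.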