RegEx to remove carriage returns between <p> tags
Question
I've stumped myself trying to figure out how to remove carriage returns that occur between <p> tags. (Technically, I need to replace them with spaces, not remove them.)
Here's an example. I've used a dollar sign $ as a carriage return marker.
<p>Ac nec <strong>suspendisse est, dapibus.</strong> Nulla taciti curabitur enim hendrerit.$
Ante ornare phasellus tellus vivamus dictumst dolor aliquam imperdiet lectus.$
Nisl nullam sodales, tincidunt dictum dui eget, gravida anno. Montes convallis$
adipiscing, aenean hac litora. Ridiculus, ut consequat curae, amet. Nostra$
phasellus ridiculus class interdum justo. <em>Pharetra urna est hac</em> laoreet, magna.$
Porttitor purus purus, quis rutrum turpis. Montes netus nibh ornare potenti quam$
class. Natoque nec proin sapien augue curae, elementum.</p>
As the example shows, there can be other tags in between the <p> tags. So I'm looking for a regex that replaces all these carriage returns with spaces but doesn't touch any carriage returns outside the <p> tags.
Any help is greatly appreciated. Thanks!
Recommended answer
[\r\n]+(?=(?:[^<]+|<(?!/?p\b))*</p>)
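The question doesn't name a language, so as an assumption here is a minimal Python sketch that applies this regex with the standard `re` module (its flavor supports both positive and negative lookahead). The helper name `join_paragraph_lines` and the sample markup are hypothetical, chosen only for illustration. Each run of line breaks inside a <p> element collapses to a single space, while the newlines between elements are left alone.

```python
import re

# Pattern from the answer: a run of CR/LF characters, matched only when the
# lookahead can reach a closing </p> without first hitting a <p> or </p>.
P_NEWLINES = re.compile(r"[\r\n]+(?=(?:[^<]+|<(?!/?p\b))*</p>)")


def join_paragraph_lines(html: str) -> str:
    """Replace each run of line breaks inside <p>...</p> with one space.

    Hypothetical helper; line breaks outside <p> elements are kept as-is.
    """
    return P_NEWLINES.sub(" ", html)


sample = (
    "<h1>Title</h1>\n"                                # outside <p>: kept
    "<p>Ac nec <strong>suspendisse</strong> est.\n"   # inside <p>: replaced
    "Nulla taciti <em>curabitur</em>\r\n"             # CRLF inside <p>: replaced
    "enim hendrerit.</p>\n"                           # between elements: kept
    "<p>Second\n\nparagraph.</p>\n"                   # blank line inside <p>: one space
)

print(join_paragraph_lines(sample))
```

Because `[\r\n]+` consumes a whole run, a `\r\n` pair or a blank line inside a paragraph becomes one space rather than two. One thing to watch: `[^<]+` sits inside a repeated group, so when the lookahead fails after a long stretch of text containing no `<` (for example a long line outside any <p>), the engine can backtrack for a very long time. On Python 3.11 or later, the possessive form `[^<]++` avoids that without changing what matches.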
The first part matches one or more of any kind of line separator (\n, \r\n, or \r). The rest is a lookahead that attempts to match everything up to the next closing </p> tag, but if it finds an opening <p> tag first, the match fails.
Note that this regex can be fooled very easily, for example by SGML comments, <script> elements, or plain old malformed HTML. Also, I'm assuming your regex flavor supports positive and negative lookaheads. That's a pretty safe assumption these days, but if the regex doesn't work for you, we'll need to know exactly which language or tool you're using.
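To make the SGML-comment caveat concrete, here is a small Python check, again assuming Python's `re` module; the input string is made up for illustration. The markup contains no <p> element at all, but an HTML comment holds a literal `</p>`, so the lookahead reaches it and the newline right after `<div>` is replaced anyway.

```python
import re

# Same pattern as in the answer.
P_NEWLINES = re.compile(r"[\r\n]+(?=(?:[^<]+|<(?!/?p\b))*</p>)")

# Hypothetical input: no <p> element, but the comment contains "</p>".
markup = "<div>\nfoo <!-- </p> -->\n</div>"

result = P_NEWLINES.sub(" ", markup)

# The first newline is replaced because the lookahead finds the "</p>" inside
# the comment; the second is kept because no "</p>" follows it.
print(repr(result))
```

A real HTML parser would see that comment for what it is; the regex only sees characters, which is why the answer warns that it is easy to fool.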