RegEx to remove carriage returns between <p> tags

Question

I've stumped myself trying to figure out how to remove carriage returns that occur between <p> tags. (Technically I need to replace them with spaces, not remove them.)

Here's an example. I've used a dollar sign $ as a carriage return marker.

<p>Ac nec <strong>suspendisse est, dapibus.</strong> Nulla taciti curabitur enim hendrerit.$
Ante ornare phasellus tellus vivamus dictumst dolor aliquam imperdiet lectus.$
Nisl nullam sodales, tincidunt dictum dui eget, gravida anno. Montes convallis$
adipiscing, aenean hac litora. Ridiculus, ut consequat curae, amet. Nostra$
phasellus ridiculus class interdum justo. <em>Pharetra urna est hac</em> laoreet, magna.$
Porttitor purus purus, quis rutrum turpis. Montes netus nibh ornare potenti quam$
class. Natoque nec proin sapien augue curae, elementum.</p>

As the example shows, there can be other tags in between the <p> tags. So I'm looking for a regex that replaces all of these carriage returns with spaces but does not touch any carriage returns outside the <p> tags.

Any help is greatly appreciated. Thanks!

Recommended answer

[\r\n]+(?=(?:[^<]+|<(?!/?p\b))*</p>)

The first part matches one or more of any kind of line separator (\n, \r\n, or \r). The rest is a lookahead that attempts to match everything up to the next closing </p> tag, but if it finds an opening <p> tag first, the match fails.
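
To try the pattern out, here is a minimal Python sketch that applies the regex from the answer unchanged via `re.sub`. The question does not say which language or regex engine is in use, so Python's `re` module is an assumption, and the helper name `join_paragraph_lines` and the sample markup are made up for illustration.

```python
import re

# The regex from the answer, unchanged: one or more line-break characters,
# followed by a lookahead that must reach a closing </p> without first
# running into an opening or closing <p> tag.
P_NEWLINES = re.compile(r"[\r\n]+(?=(?:[^<]+|<(?!/?p\b))*</p>)")


def join_paragraph_lines(html: str) -> str:
    """Replace line breaks inside <p>...</p> with a single space (hypothetical helper)."""
    return P_NEWLINES.sub(" ", html)


# Illustrative input: line breaks inside the paragraphs should become spaces,
# while the line breaks between elements should be left alone.
sample = (
    "<h1>Title</h1>\n"
    "<p>Ac nec <strong>suspendisse est,</strong> dapibus.\r\n"
    "Ante ornare phasellus.\n"
    "Nisl <em>nullam</em> sodales.</p>\n"
    "\n"
    "<p>Second\nparagraph.</p>\n"
)

result = join_paragraph_lines(sample)
print(result)
```

Each run of line breaks becomes one space, so a `\r\n` pair does not turn into two spaces. One thing to watch for: because `[^<]+` is repeated inside a `*` group, a backtracking engine such as Python's `re` can slow down sharply when a line break outside any paragraph is followed by a long run of text before the next `<p>` tag or the end of the input, so it is worth testing on realistic data first.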

Note that this regex is easily fooled, for example by SGML comments, ...
