Trimming whitespace from HTML content?

Problem description

I have a CRUD maintenance screen with a custom rich text editor control (FCKEditor actually) and the program extracts the formatted text as HTML from the control for saving to the database. However, part of our standards is that leading and trailing whitespace needs to be stripped from the content before saving, so I have to remove extraneous &nbsp; and <br> and such from the beginning and end of the HTML string.

I can do it either on the client side (using JavaScript) or on the server side (using Java). Is there an easy way to do this, using regular expressions or something? I'm not sure how complex it needs to be, but I need to be able to remove stuff like:

<p><br /> &nbsp;</p>

yet retain it if there's any kind of meaningful text in between. (Above snippet is from actual HTML data saved by the tester)

Solution

/<p>(?:<br\s*\/>|&[#\w]{2,6};|[\s\n\r])*?<\/p>/g

That should match all paragraphs that don't contain any "meaningful text".

It's probably best to do it on the server-side though.
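
To show how a pattern like this could be applied only at the beginning and end of the string, here is a minimal client-side sketch. It is not the answerer's code: the names `EMPTY_P`, `FILLER`, and `trimHtmlWhitespace` are hypothetical, and the pattern is adapted on two assumptions, namely that `<br>` without a slash should also count as filler, and that only `&nbsp;` and `&#160;` should be treated as whitespace entities (the regex above accepts any short entity, which would also match things like `&amp;`).

```javascript
// Hypothetical helper, adapted from the regex in the answer above.
// String.raw keeps the backslashes in \s intact when building the pattern.

// One "empty" paragraph: only <br>, <br/>, <br />, &nbsp;, &#160; or whitespace inside.
const EMPTY_P = String.raw`<p>(?:<br\s*/?>|&nbsp;|&#160;|\s)*</p>`;

// One unit of strippable filler: an empty paragraph, a bare <br>,
// a non-breaking-space entity, or a whitespace character.
const FILLER = String.raw`(?:${EMPTY_P}|<br\s*/?>|&nbsp;|&#160;|\s)`;

// Anchored versions, so filler in the middle of the content is left alone.
const LEADING = new RegExp(`^${FILLER}+`, "i");
const TRAILING = new RegExp(`${FILLER}+$`, "i");

// Strips leading and trailing filler from an HTML string.
function trimHtmlWhitespace(html) {
  return html.replace(LEADING, "").replace(TRAILING, "");
}

console.log(trimHtmlWhitespace("<p><br /> &nbsp;</p><p>Hello</p><br>&nbsp; "));
```

Because both patterns are anchored, an empty paragraph between two paragraphs of real text is kept. This sketch assumes plain `<p>` tags without attributes; markup that is more varied than that is probably better handled with an HTML parser on the server side.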
