Remove white space from entire HTML, but not inside pre, with regular expressions

Problem description

In ASP.NET MVC 3, I created an action filter that removes white space from the entire HTML output. It works as expected most of the time, but now I need to change the regex so that it does not touch anything inside pre elements.

I got the regex logic from the awesome Mads Kristensen's blog, and I am not sure how to modify it for this purpose.

Here is the logic:

public override void Write(byte[] buffer, int offset, int count) {

    string HTML = Encoding.UTF8.GetString(buffer, offset, count);

    Regex reg = new Regex(@"(?<=[^])\t{2,}|(?<=[>])\s{2,}(?=[<])|(?<=[>])\s{2,11}(?=[<])|(?=[\n])\s{2,}");
    HTML = reg.Replace(HTML, string.Empty);

    buffer = System.Text.Encoding.UTF8.GetBytes(HTML);
    this.Base.Write(buffer, 0, buffer.Length);
}

The full code of the filter:

https://github.com/tugberkugurlu/MvcBloggy/blob/master/src/MvcBloggy.Web/Application/ActionFilters/RemoveWhitespacesAttribute.cs

Any ideas?

EDIT:

BIG NOTE:

My intention is not at all to speed up the response time. In fact, this may slow things down. I gzip the pages, and this minification gains me only about 4 to 5 KB per page, which is nothing.

Solution

Parsing HTML with regex is very complicated, and any simple solution can break easily. (Use the right tool for the job.) That being said, I'll show a simple solution.

First, I simplified the regex you had to:

(?<=\s)\s+

Replace those matches with an empty string to get rid of double spaces everywhere.
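
As a rough illustration of that replacement, here is a minimal sketch. The class and method names (SimpleWhitespaceCollapser, Collapse) are made up for this example and are not part of the original filter. Because of the lookbehind, a run of whitespace is matched only from its second character onward, so the first whitespace character of each run survives.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, named only for this illustration.
public static class SimpleWhitespaceCollapser
{
    // Matches whitespace that is itself preceded by whitespace,
    // i.e. everything in a whitespace run except its first character.
    private static readonly Regex ExtraWhitespace =
        new Regex(@"(?<=\s)\s+", RegexOptions.Compiled);

    public static string Collapse(string html)
    {
        // Removing the matches leaves exactly one whitespace
        // character (the first one) per run.
        return ExtraWhitespace.Replace(html, string.Empty);
    }
}
```

Note that this version has no special handling for pre: it collapses whitespace everywhere, including inside pre elements.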

Assuming there are no < or > characters inside the pre element, you can add (?![^<>]*</pre>) at the end of the expression to make it fail inside pre elements. This negative lookahead makes sure that the current match is not followed by </pre> without any other tag in between.

Resulting in:

(?<=\s)\s+(?![^<>]*</pre>)
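
A minimal sketch of how this final expression might be used, under the assumption stated above that pre elements contain no nested tags. The class and method names (PreAwareWhitespaceCollapser, Collapse) are hypothetical and chosen only for this example.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, named only for this illustration.
public static class PreAwareWhitespaceCollapser
{
    // (?<=\s)\s+          whitespace preceded by whitespace
    // (?![^<>]*</pre>)    but only if the next tag after it is not </pre>
    private static readonly Regex ExtraWhitespaceOutsidePre =
        new Regex(@"(?<=\s)\s+(?![^<>]*</pre>)", RegexOptions.Compiled);

    public static string Collapse(string html)
    {
        return ExtraWhitespaceOutsidePre.Replace(html, string.Empty);
    }
}
```

For example, given the input "<div>  a   b</div>\n\n<pre>x   y\n\n  z</pre>  <p>  c</p>", the text inside pre is left untouched while runs of whitespace elsewhere shrink to their first character. The limitation matters in practice: with markup such as <pre><code>...</code></pre>, the next tag after the text is </code> rather than </pre>, so the lookahead no longer protects that content.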
