Remove white space from the entire HTML, except inside pre, with regular expressions
Problem description
In ASP.NET MVC 3, I created an action filter that removes white space from the entire HTML output. It works as I expect most of the time, but now I need to change the regex so that it does not touch anything inside a pre element.
I got the regex logic from the awesome Mads Kristensen's blog, and I am not sure how to modify it for this purpose.
Here is the logic:
public override void Write(byte[] buffer, int offset, int count)
{
    // Decode the outgoing chunk as UTF-8 text.
    string HTML = Encoding.UTF8.GetString(buffer, offset, count);
    Regex reg = new Regex(@"(?<=[^])\t{2,}|(?<=[>])\s{2,}(?=[<])|(?<=[>])\s{2,11}(?=[<])|(?=[\n])\s{2,}");
    // Remove every match, then write the re-encoded text to the underlying stream.
    HTML = reg.Replace(HTML, string.Empty);
    buffer = System.Text.Encoding.UTF8.GetBytes(HTML);
    this.Base.Write(buffer, 0, buffer.Length);
}
Whole code of the filter:
https://github.com/tugberkugurlu/MvcBloggy/blob/master/src/MvcBloggy.Web/Application/ActionFilters/RemoveWhitespacesAttribute.cs
Any idea?
EDIT:
BIG NOTE:
My intention is not at all to speed up the response time. In fact, this may slow things down. I gzip the pages, and this minification gains me only about 4 - 5 KB per page, which is nothing.
Solution
Parsing HTML with regex is very complicated, and any simple solution can break easily. (Use the right tool for the job.) That being said, I'll show a simple solution.
First, I simplified the regex you had to:
(?<=\s)\s+
Replace those matches with an empty string to get rid of double spaces everywhere.
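To make the effect of that replacement concrete, here is a minimal sketch. The class name CollapseWhitespaceDemo, the method name Collapse, and the sample input are made up for illustration; only the pattern comes from the answer. Because of the lookbehind, the first character of each whitespace run survives and the rest of the run is removed.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CollapseWhitespaceDemo
{
    // (?<=\s) requires a whitespace character immediately before the match,
    // so the first character of every whitespace run is kept and the
    // remainder of the run is what gets matched.
    private static readonly Regex ExtraWhitespace = new Regex(@"(?<=\s)\s+");

    // Hypothetical helper: removes everything after the first character
    // of each whitespace run.
    public static string Collapse(string html)
    {
        return ExtraWhitespace.Replace(html, string.Empty);
    }

    public static void Main()
    {
        string html = "<p>Hello   world</p>\n\n<div>  x</div>";
        string expected = "<p>Hello world</p>\n<div> x</div>";
        Console.WriteLine(Collapse(html) == expected); // prints True
    }
}
```

Note that a run starting with a newline keeps the newline, not a space, since whichever whitespace character comes first is the one that stays.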
Assuming there are no < or > characters inside the pre tag, you can add (?![^<>]*</pre>) at the end of the expression to make it fail inside of pre tags. This negative lookahead makes sure that </pre> doesn't follow the current match without any tags in between. Resulting in:
(?<=\s)\s+(?![^<>]*</pre>)
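As a rough check of the final pattern, the sketch below runs it over a small sample. The names PreAwareStripper and Strip and the sample strings are assumptions for illustration, not part of the original filter. It also shows the limitation stated above: once the pre element contains another tag, the lookahead no longer sees </pre> directly and the whitespace inside is stripped.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PreAwareStripper
{
    // The negative lookahead rejects a match when only non-tag characters
    // separate it from a closing </pre>, i.e. when the match sits in the
    // plain text of a pre element.
    private static readonly Regex OutsidePre =
        new Regex(@"(?<=\s)\s+(?![^<>]*</pre>)");

    // Hypothetical helper wrapping the replacement.
    public static string Strip(string html)
    {
        return OutsidePre.Replace(html, string.Empty);
    }

    public static void Main()
    {
        string html =
            "<div>\n\n    <p>one   two</p>\n<pre>keep    this\n\n  too</pre>\n</div>";
        string expected =
            "<div>\n<p>one two</p>\n<pre>keep    this\n\n  too</pre>\n</div>";
        Console.WriteLine(Strip(html) == expected); // prints True

        // Limitation: a nested tag inside pre defeats the lookahead.
        string nested = "<pre><code>a   b</code></pre>";
        Console.WriteLine(Strip(nested)); // prints <pre><code>a b</code></pre>
    }
}
```

The pattern is case-sensitive as written, so an uppercase </PRE> would not be recognized unless RegexOptions.IgnoreCase is passed to the constructor.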