Question on Whitespace Filter Regex (it's simple, just a small addition needed)

Problem Description

I have a Regex-based whitespace filter on an ASP.NET MVC application, and it works perfectly, too perfectly. One of the things that gets filtered is the \r\n line break. This effectively puts all of the page source on a single line. I love that, because I don't have to deal with quirky CSS caused by whitespace, but in certain instances I need to keep the line breaks. One example is when I want to literally display text with line breaks in it, such as a note.

To do so, I would obviously wrap it in <pre></pre> tags. However, the filter also scrubs the line breaks in the text between those tags, which makes a note, for example, rather difficult to read.

Can anyone with Regex knowledge (mine is very poor...) help me in modifying the current Regex to ignore text between the <pre> tags?

Here's the current code:

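using System.IO;
using System.Text;
using System.Text.RegularExpressions;
using System.Web;

public class WhitespaceFilter : MemoryStream {
    // Build each Regex once; RegexOptions.Compiled only pays off when the instance is reused.
    private static readonly Regex Tabs = new Regex("\\t", RegexOptions.Compiled);
    private static readonly Regex NewlineBetweenTags = new Regex(">\\r\\n<", RegexOptions.Compiled);
    private static readonly Regex Newlines = new Regex("\\r\\n", RegexOptions.Compiled);
    private static readonly Regex DoubleSpaces = new Regex("  ", RegexOptions.Compiled);
    private static readonly Regex WhitespaceBetweenTags = new Regex(">\\s<", RegexOptions.Compiled);
    private static readonly Regex Comments = new Regex("<!--.*?-->", RegexOptions.Compiled | RegexOptions.Singleline);

    private readonly Stream filter;

    public WhitespaceFilter(HttpResponseBase response) {
        filter = response.Filter;
    }

    public override void Write(byte[] buffer, int offset, int count) {
        // Decode only the bytes this call was given, not the whole buffer.
        // Note: ASP.NET may call Write several times per response, so a <pre> block
        // or a multi-byte UTF-8 character can be split across calls.
        string source = Encoding.UTF8.GetString(buffer, offset, count);

        source = Tabs.Replace(source, string.Empty);
        source = NewlineBetweenTags.Replace(source, "><");
        source = Newlines.Replace(source, string.Empty);

        while (DoubleSpaces.IsMatch(source)) {
            source = DoubleSpaces.Replace(source, string.Empty);
        }

        source = WhitespaceBetweenTags.Replace(source, "><");
        source = Comments.Replace(source, string.Empty);

        // The re-encoded bytes start at index 0 of a new array, so write from 0, not from offset.
        byte[] output = Encoding.UTF8.GetBytes(source);
        filter.Write(output, 0, output.Length);
    }
}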

Thanks in advance!

Solution

There are already tools like htmlcompressor that strip whitespace. And as exhuma said, if this is for web optimization, enabling gzip compression on the web server would help more than anything else.

As for your original question, there are a lot of different ways to do this. You could also attack the problem with something like XPath (if the HTML is valid XHTML) and then combine that with regex. But I figured I'd try my hand at writing a single regex to do it:

(<pre>[^<>]*(((?<Open><)[^<>]*)+((?<Close-Open>>)[^<>]*)+)*(?(Open)(?!))</pre>)|[\n\r]

It seems to work for me. Fortunately, .NET has an extremely powerful regex engine, including a very cool balancing group feature. I can't explain it any better than Ryan Byington can. The idea is to match the opening and closing pre tags first and leave everything inside them untouched. Everything outside those pre tags is then handled by the other alternative, "[\n\r]", which matches the line breaks to strip.
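To see the balancing-group part on its own, here is a small sketch that reuses the same bracket-matching core as the answer's pattern. It is anchored with ^ and $ and checks whether the angle brackets in a string are balanced. The class and method names are hypothetical and are not from the answer. Each < pushes a capture onto the Open stack. Each > pops one through (?<Close-Open>...), which fails if the stack is empty. Finally, (?(Open)(?!)) rejects the match if anything is left over.

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo class: isolates the balancing-group core of the answer's pattern.
public static class BalancingGroupDemo
{
    // "<" pushes onto the Open stack, ">" pops it via Close-Open,
    // and (?(Open)(?!)) fails if any "<" is still unmatched at the end.
    private static readonly Regex Balanced = new Regex(
        @"^[^<>]*(((?<Open><)[^<>]*)+((?<Close-Open>>)[^<>]*)+)*(?(Open)(?!))$");

    public static bool IsBalanced(string text) => Balanced.IsMatch(text);
}
```

For example, IsBalanced("<<x>>") is true, while IsBalanced("<<x>") and IsBalanced("a>b") are false.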

To make this work you'd simply do this:

Source = new Regex("(<pre>[^<>]*(((?<Open><)[^<>]*)+((?<Close-Open>>)[^<>]*)+)*(?(Open)(?!))</pre>)|[\n\r]", 
  RegexOptions.Compiled | RegexOptions.Singleline).Replace(Source, "$1");

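Note the $1 at the end. This is the part that takes the match from inside the pre tags and returns it untouched. When a match is just a \r or \n, group 1 did not take part in it, so $1 is replaced with an empty string and the line break is removed.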
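Here is the answer's one-liner wrapped in a small helper so it can be tried outside the response filter. The class and method names are hypothetical; the pattern, options, and "$1" replacement are the answer's. This sketch assumes bare <pre> tags with no attributes. It also assumes each <pre> block arrives in a single string, whereas a response filter may receive the page over several Write calls.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper around the answer's regex.
// Alternative 1 captures a whole <pre>...</pre> block in group 1;
// alternative 2 matches a single CR or LF outside such a block.
public static class PreserveNewlinesInPre
{
    private static readonly Regex NewlinesOutsidePre = new Regex(
        @"(<pre>[^<>]*(((?<Open><)[^<>]*)+((?<Close-Open>>)[^<>]*)+)*(?(Open)(?!))</pre>)|[\n\r]",
        RegexOptions.Compiled | RegexOptions.Singleline);

    // "$1" writes a <pre> block back verbatim; for a bare line break,
    // group 1 is unmatched, so the substitution is an empty string.
    public static string StripNewlines(string html) =>
        NewlinesOutsidePre.Replace(html, "$1");
}
```

For example, "<div>\r\n<p>a</p>\r\n<pre>line1\r\nline2</pre>\r\n</div>" becomes "<div><p>a</p><pre>line1\r\nline2</pre></div>". The line breaks between tags disappear, and the one inside <pre> survives.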

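Then, after that, add another line to replace \s\s+ with a single space. I think that should work pretty well. Keep in mind that this replacement also needs the <pre> alternative in front of it; otherwise it will collapse the line breaks and indentation inside the <pre> blocks as well.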
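One way to do that follow-up step without touching <pre> content is to reuse the same alternation and pass a MatchEvaluator lambda instead of a replacement string. This is an assumption, not the answer's code. A <pre> match is returned unchanged, and any other run of two or more whitespace characters becomes a single space. The class and method names are hypothetical, and the same bare-<pre> and single-chunk assumptions apply.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: collapse whitespace runs to one space, except inside <pre>.
public static class CollapseWhitespaceOutsidePre
{
    // Group 1 is the answer's balanced <pre> block; \s{2,} is the same as \s\s+.
    private static readonly Regex PreOrWhitespaceRun = new Regex(
        @"(<pre>[^<>]*(((?<Open><)[^<>]*)+((?<Close-Open>>)[^<>]*)+)*(?(Open)(?!))</pre>)|\s{2,}",
        RegexOptions.Compiled);

    // Keep a matched <pre> block as-is; replace any other whitespace run with " ".
    public static string Collapse(string html) =>
        PreOrWhitespaceRun.Replace(html, m => m.Groups[1].Success ? m.Value : " ");
}
```

For example, "<p>a   b</p>\n\n<pre>x\n    y</pre>" becomes "<p>a b</p> <pre>x\n    y</pre>".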
