Forum tags. What is the best way to implement them?


Question

I am building a forum and I want to use forum-style tags to let users format their posts in a limited fashion. Currently I am using Regex to do this, as per this question: How to use C# regular expressions to emulate forum tags (http://stackoverflow.com/questions/4909283/how-to-use-c-regular-expressions-to-emulate-forum-tags).

The problem with this is that the regex does not distinguish between nested tags. Here is a sample of how I implemented this method:

    public static string MyExtensionMethod(this string text)
    {
         return TransformTags(text);
    }

    private static string TransformTags(string input)
    {
        string regex = @"\[([^=]+)[=\x22']*(\S*?)['\x22]*\](.+?)\[/(\1)\]";
        MatchCollection matches = new Regex(regex).Matches(input);
        for (int i = 0; i < matches.Count; i++)
        {
            var tag = matches[i].Groups[1].Value;
            var optionalValue = matches[i].Groups[2].Value;
            var content = matches[i].Groups[3].Value;

            if (Regex.IsMatch(content, regex))
            {
                content = TransformTags(content);
            }

            content = HandleTags(content, optionalValue, tag);

            input = input.Replace(matches[i].Groups[0].Value, content);
        }

        return input;
    }

    private static string HandleTags(string content, string optionalValue, string tag)
    {
        switch (tag.ToLower())
        {
            case "quote":
                return string.Format("<div class='quote'>{0}</div>", content);
            default:
                return string.Empty;
        }
    }

Now, if I submit something like [quote] This user posted [quote] blah [/quote] [/quote], it does not properly detect the nested quote. Instead, it pairs the first opening quote tag with the first closing quote tag.
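The mispairing can be reproduced in isolation. The sketch below reuses the pattern from TransformTags unchanged; the class name NestedQuoteDemo and the helper FirstPair are hypothetical names introduced only for this illustration. Because `(.+?)` is lazy, the match stops at the first `[/quote]` it reaches, so the inner opening tag ends up inside the captured content without its closing tag.

```csharp
using System.Text.RegularExpressions;

static class NestedQuoteDemo
{
    // Same pattern as in TransformTags above.
    const string Pattern = @"\[([^=]+)[=\x22']*(\S*?)['\x22]*\](.+?)\[/(\1)\]";

    // Hypothetical helper: returns the first tag pair the pattern finds.
    // Check Match.Success before using the result.
    public static Match FirstPair(string input)
    {
        return Regex.Match(input, Pattern);
    }
}

// Usage:
//   var m = NestedQuoteDemo.FirstPair(
//       "[quote] This user posted [quote] blah [/quote] [/quote]");
//   m.Value           is "[quote] This user posted [quote] blah [/quote]"
//   m.Groups[3].Value is " This user posted [quote] blah "
```

The trailing " [/quote]" is left outside the match, which is why the recursive call in TransformTags never sees a complete inner pair.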

Do you recommend any solutions? Can the regex be modified to grab nested tags? Or should I not be using regex for this at all? Any help is appreciated.

Recommended answer

While using "only" regex is probably possible with balancing groups, it's pretty heavy voodoo, and it's intrinsically "fragile". What I propose is using regexes to find the open/close tags (without trying to associate each close with its open), marking and collecting them in a collection (probably a stack), and then parsing them "by hand" (with a foreach). This way you get the best of both worlds: the tags are found by regex, and they (and any wrongly written ones) are handled by hand.

class TagMatch
{
    public string Tag { get; set; }
    public Capture Capture { get; set; }
    public readonly List<string> Substrings = new List<string>();
}

static void Main(string[] args)
{
    var rx = new Regex(@"(?<OPEN>\[[A-Za-z]+?\])|(?<CLOSE>\[/[A-Za-z]+?\])|(?<TEXT>[^\[]+|\[)");
    var str = "Lorem [AA]ipsum [BB]dolor sit [/BB]amet, [ consectetur ][/AA]adipisici elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquid ex ea commodi consequat. Quis aute iure reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint obcaecat cupiditat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.";
    var matches = rx.Matches(str);

    var recurse = new Stack<TagMatch>();
    recurse.Push(new TagMatch { Tag = String.Empty });

    foreach (Match match in matches)
    {
        var text = match.Groups["TEXT"];

        TagMatch last;

        if (text.Success)
        {
            last = recurse.Peek();
            last.Substrings.Add(text.Value);
            continue;
        }

        var open = match.Groups["OPEN"];

        string tag;

        if (open.Success)
        {
            tag = open.Value.Substring(1, open.Value.Length - 2);
            recurse.Push(new TagMatch { Tag = tag, Capture = open.Captures[0] });
            continue;
        }

        var close = match.Groups["CLOSE"];

        tag = close.Value.Substring(2, close.Value.Length - 3);

        last = recurse.Peek();

        if (last.Tag == tag)
        {
            recurse.Pop();

            var lastLast = recurse.Peek();
            lastLast.Substrings.Add("**" + last.Tag + "**");
            lastLast.Substrings.AddRange(last.Substrings);
            lastLast.Substrings.Add("**/" + last.Tag + "**");
        }
        else
        {
            throw new Exception();
        }
    }

    if (recurse.Count != 1)
    {
        throw new Exception();
    }

    var sb = new StringBuilder();
    foreach (var str2 in recurse.Pop().Substrings)
    {
        sb.Append(str2);
    }

    var str3 = sb.ToString();
}

This is an example. It's case sensitive (but that is easy to fix). It doesn't handle "unpaired" tags, because there are various ways to handle them. Wherever you find a "throw new Exception" you'll have to add your own handling. Clearly this isn't a "drop-in" solution; it's only an example. For that reason, I won't respond to questions like "the compiler tells me I need a namespace" or "the compiler can't find Regex". BUT I will be more than happy to respond to "advanced" questions, like how unpaired tags could be matched, or how you could add support for [AAA=bbb] tags.
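As one possible answer to those "advanced" questions, here is a minimal sketch of the same stack-based idea extended with case-insensitive matching, an optional `=value` part, and lenient handling of unpaired tags (they are kept as literal text instead of throwing). Everything in it is an assumption made for illustration: the ForumTags, Frame, Render and Wrap names, the NAME and VALUE group names, the choice of `quote` and `b` as known tags, and the HTML they produce. Text is passed through WebUtility.HtmlEncode so that user input cannot inject markup.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

static class ForumTags
{
    // OPEN accepts an optional "=value"; IgnoreCase lets [B] pair with [/b].
    static readonly Regex Token = new Regex(
        @"(?<OPEN>\[(?<NAME>[a-z]+)(=(?<VALUE>[^\[\]\r\n]*))?\])" +
        @"|(?<CLOSE>\[/(?<NAME>[a-z]+)\])" +
        @"|(?<TEXT>[^\[]+|\[)",
        RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture);

    sealed class Frame
    {
        public string Name = "";
        public string Value = "";
        public string Raw = "";   // encoded original "[tag=value]" text
        public readonly StringBuilder Body = new StringBuilder();
    }

    public static string Render(string input)
    {
        var stack = new Stack<Frame>();
        stack.Push(new Frame());  // root frame collects top-level output

        foreach (Match m in Token.Matches(input))
        {
            if (m.Groups["TEXT"].Success)
            {
                stack.Peek().Body.Append(WebUtility.HtmlEncode(m.Value));
            }
            else if (m.Groups["OPEN"].Success)
            {
                stack.Push(new Frame
                {
                    Name = m.Groups["NAME"].Value.ToLowerInvariant(),
                    Value = WebUtility.HtmlEncode(m.Groups["VALUE"].Value),
                    Raw = WebUtility.HtmlEncode(m.Value)
                });
            }
            else
            {
                string name = m.Groups["NAME"].Value.ToLowerInvariant();
                if (stack.Count > 1 && stack.Peek().Name == name)
                {
                    Frame done = stack.Pop();
                    stack.Peek().Body.Append(Wrap(done));
                }
                else
                {
                    // Close tag with no matching open tag: keep it as text.
                    stack.Peek().Body.Append(m.Value);
                }
            }
        }

        // Open tags that were never closed fall back to their literal text.
        while (stack.Count > 1)
        {
            Frame open = stack.Pop();
            stack.Peek().Body.Append(open.Raw).Append(open.Body.ToString());
        }
        return stack.Pop().Body.ToString();
    }

    static string Wrap(Frame f)
    {
        switch (f.Name)
        {
            case "quote":
                string cite = f.Value.Length > 0 ? "<cite>" + f.Value + "</cite>" : "";
                return "<div class='quote'>" + cite + f.Body + "</div>";
            case "b":
                return "<b>" + f.Body + "</b>";
            default:
                // Unknown tag: leave it untouched.
                return f.Raw + f.Body + "[/" + f.Name + "]";
        }
    }
}
```

With this sketch, `ForumTags.Render("[quote=Bob]hi [B]there[/b][/QUOTE]")` yields `<div class='quote'><cite>Bob</cite>hi <b>there</b></div>`, and the nested-quote input from the question renders as two nested divs. Keeping unpaired tags as text is only one of the possible policies; auto-closing them at the end of the post would be another.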

(Second big EDIT)

Bwahahahah! I DID know groupings were the way to do it!

// Some classes

class BaseTagMatch {
    public Capture Capture;

    public override string ToString()
    {
        return String.Format("{1}: {2} [{0}]", GetType(), Capture.Index, Capture.Value.ToString());
    }
}

class BeginTag : BaseTagMatch
{
    public int Index;
    public Capture Options;
    public EndTag EndTag;
}

class EndTag : BaseTagMatch {
    public int Index;
    public BeginTag BeginTag;
}

class Text : BaseTagMatch
{
}

class UnmatchedTag : BaseTagMatch
{
}

// The code

var pattern =
    @"(?# line 01) ^" +
    @"(?# line 02) (" +
    // Non [ Text
    @"(?# line 03)   (?>(?<TEXT>[^\[]+))" +
    @"(?# line 04)   |" +
    // Immediately closed tag [a/]
    @"(?# line 05)   (?>\[  (?<TAG>  [A-Z]+  )  \x20*  =?  \x20*  (?<TAG_OPTION>(  (?<=  =  \x20*)  (  (?!  \x20*  /\])  [^\[\]\r\n]  )*  )?  )  (?<BEGIN_INNER_TEXT>)(?<END_INNER_TEXT>)  \x20*  /\]  )" +
    @"(?# line 06)   |" +
    // Matched open tag [a]
    @"(?# line 07)   \[  (?<TAG>  (?<OPEN>  [A-Z]+  )  )  \x20* =?  \x20* (?<TAG_OPTION>(  (?<=  =  \x20*)  (  (?!  \x20*  \])  [^\[\]\r\n]  )*  )?  )  \x20*  \]  (?<BEGIN_INNER_TEXT>)" +
    @"(?# line 08)   |" +
    // Matched close tag [/a]
    @"(?# line 09)   (?>(?<END_INNER_TEXT>)  \[/  \k<OPEN>  \x20*  \]  (?<-OPEN>))" +
    @"(?# line 10)   |" +
    // Unmatched open tag [a]
    @"(?# line 11)   (?>(?<UNMATCHED_TAG>  \[  [A-Z]+  \x20* =?  \x20* (  (?<=  =  \x20*)  (  (?!  \x20*  \])  [^\[\]\r\n]  )*  )?  \x20*  \]  )  )" +
    @"(?# line 12)   |" +
    // Unmatched close tag [/a]
    @"(?# line 13)   (?>(?<UNMATCHED_TAG>  \[/  [A-Z]+  \x20*  \]  )  )" +
    @"(?# line 14)   |" +
    // Single [ of Text (unmatched by other patterns)
    @"(?# line 15)   (?>(?<TEXT>\[))" +
    @"(?# line 16) )*" +
    @"(?# line 17) (?(OPEN)(?!))" +
    @"(?# line 18) $";

var rx = new Regex(pattern, RegexOptions.IgnorePatternWhitespace | RegexOptions.ExplicitCapture | RegexOptions.IgnoreCase);

var match = rx.Match("[div=c:max max]asdf[p = 1   ] a [p=2] [b  =  p/pp   /] [q/] \n[a]sd [/z]  [ [/p]f[/p]asdffds[/DIV] [p][/p]");

////var tags = match.Groups["TAG"].Captures.OfType<Capture>().ToArray();
////var tagoptions = match.Groups["TAG_OPTION"].Captures.OfType<Capture>().ToArray();
////var begininnertext = match.Groups["BEGIN_INNER_TEXT"].Captures.OfType<Capture>().ToArray();
////var endinnertext = match.Groups["END_INNER_TEXT"].Captures.OfType<Capture>().ToArray();
////var text = match.Groups["TEXT"].Captures.OfType<Capture>().ToArray();
////var unmatchedtag = match.Groups["UNMATCHED_TAG"].Captures.OfType<Capture>().ToArray();

var tags = match.Groups["TAG"].Captures.OfType<Capture>().Select((p, ix) => new BeginTag { Capture = p, Index = ix, Options = match.Groups["TAG_OPTION"].Captures[ix] }).ToList();

Func<Capture, int, EndTag> func = (p, ix) =>
{
    var temp = new EndTag { Capture = p, Index = ix, BeginTag = tags[ix] };
    tags[ix].EndTag = temp;
    return temp;
};

var endTags = match.Groups["END_INNER_TEXT"].Captures.OfType<Capture>().Select((p, ix) => func(p, ix));
var text = match.Groups["TEXT"].Captures.OfType<Capture>().Select((p, ix) => new Text { Capture = p });
var unmatchedTags = match.Groups["UNMATCHED_TAG"].Captures.OfType<Capture>().Select((p, ix) => new UnmatchedTag { Capture = p });

// Here you have all the tags and the inner text neatly ordered and ready to be recomposed in a StringBuilder.
var allTags = tags.Cast<BaseTagMatch>().Union(endTags).Union(text).Union(unmatchedTags).ToList();
allTags.Sort((p, q) => p.Capture.Index - q.Capture.Index);

foreach (var el in allTags)
{
    var type = el.GetType();

    if (type == typeof(BeginTag))
    {

    }
    else if (type == typeof(EndTag))
    {

    }
    else if (type == typeof(UnmatchedTag))
    {

    }
    else
    {
        // Text
    }
}

It matches tags case-insensitively, ignores tags that are not correctly closed, and supports immediately closed tags ([BR/]). And someone said it wasn't possible with Regex.... Bwahahahahah!

TAG, TAG_OPTION, BEGIN_INNER_TEXT and END_INNER_TEXT are matched (they always have the same number of elements). TEXT and UNMATCHED_TAG AREN'T matched! TAG and TAG_OPTION are self-explanatory (both are stripped of useless spaces). The BEGIN_INNER_TEXT and END_INNER_TEXT captures are always empty, but you can use their Index property to see where the tags begin/end. UNMATCHED_TAG contains the tags that have been opened but not closed, or closed but not opened. It doesn't contain tags that are wrong in format (for example [123]).
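The OPEN group in the pattern relies on .NET balancing groups. As a much smaller sketch of that mechanism alone (the BalancingDemo class and the IsBalanced method are hypothetical names, and bare brackets stand in for real tags): each `[` pushes a capture onto the OPEN stack, each `]` pops one via `(?<-OPEN>...)`, and `(?(OPEN)(?!))` forces a failure if anything is still on the stack at the end.

```csharp
using System.Text.RegularExpressions;

static class BalancingDemo
{
    // (?<OPEN>\[)    pushes a capture for every "["
    // (?<-OPEN>\])   pops one for every "]" and fails if the stack is empty
    // (?(OPEN)(?!))  fails the match if captures remain on the stack
    static readonly Regex Balanced = new Regex(
        @"^(?:(?<OPEN>\[)|(?<-OPEN>\])|[^\[\]])*(?(OPEN)(?!))$");

    public static bool IsBalanced(string s)
    {
        return Balanced.IsMatch(s);
    }
}

// Usage:
//   BalancingDemo.IsBalanced("[a[b]]")  is true
//   BalancingDemo.IsBalanced("[a[b]")   is false (one "[" never closed)
//   BalancingDemo.IsBalanced("a]")      is false (nothing to pop)
```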

In the end I take the TAG, END_INNER_TEXT (to see where the tags end), TEXT and UNMATCHED_TAG captures and sort them by index. Then you can take allTags, put it in a foreach, and test the type of each element. Easy :-) :-)

As a small note, the Regex uses RegexOptions.IgnorePatternWhitespace | RegexOptions.ExplicitCapture | RegexOptions.IgnoreCase. The first two make it easier to write and to read; the third one is semantic. It makes [A] match with [/a].
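The effect of IgnoreCase on the closing tag can be checked with a much smaller pattern. In this sketch (CaseDemo and Matches are hypothetical names, and the pattern is a simplified stand-in for the one above), the character class already accepts both cases, so the only thing the option changes is whether the `\k<OPEN>` backreference compares case-insensitively.

```csharp
using System.Text.RegularExpressions;

static class CaseDemo
{
    // Opening tag, any content, then a closing tag that must repeat the name.
    const string Pair = @"^\[(?<OPEN>[A-Za-z]+)\].*\[/\k<OPEN>\]$";

    public static bool Matches(string s, bool ignoreCase)
    {
        RegexOptions options = ignoreCase ? RegexOptions.IgnoreCase : RegexOptions.None;
        return Regex.IsMatch(s, Pair, options);
    }
}

// Usage:
//   CaseDemo.Matches("[A]text[/a]", true)   is true
//   CaseDemo.Matches("[A]text[/a]", false)  is false
```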

Required reading:

http://www.codeproject.com/KB/recipes/Nested_RegEx_explained.aspx
http://www.codeproject.com/KB/recipes/RegEx_Balanced_Grouping.aspx
