Regular expression to convert Markdown to HTML


Question

How would you write a regular expression to convert Markdown into HTML? For example, you would type in the following:

This would be *italicized* text and this would be **bold** text

It would then need to be converted to:

This would be <em>italicized</em> text and this would be <strong>bold</strong> text

This is very similar to the Markdown editing control used by Stack Overflow.

Clarification

For what it is worth, I am using C#. Also, these are the only tags/Markdown that I want to allow. The text being converted would be about 300 characters or less.

Recommended answer

The best way is to find a port of the Markdown library for whatever language you are using (you did not specify one in your question).

Now that you have clarified that you only want STRONG and EM to be processed, and that you are using C#, I recommend you take a look at Markdown.NET to see how those tags are implemented. As you can see, it in fact uses two expressions. Here is the code:

// Requires: using System.Text.RegularExpressions;
private string DoItalicsAndBold (string text)
{
    // <strong> must go first:
    text = Regex.Replace (text, @"(\*\*|__) (?=\S) (.+?[*_]*) (?<=\S) \1", 
                          new MatchEvaluator (BoldEvaluator),
                          RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline);

    // Then <em>:
    text = Regex.Replace (text, @"(\*|_) (?=\S) (.+?) (?<=\S) \1",
                          new MatchEvaluator (ItalicsEvaluator),
                          RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline);
    return text;
}

private string ItalicsEvaluator (Match match)
{
    return string.Format ("<em>{0}</em>", match.Groups[2].Value);
}

private string BoldEvaluator (Match match)
{
    return string.Format ("<strong>{0}</strong>", match.Groups[2].Value);
}
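
To try the two expressions on the sentence from the question, the methods above can be wrapped in a small static helper. This is only a minimal sketch: the class name `MiniMarkdown` and the method name `ToHtml` are hypothetical and not part of Markdown.NET, and the patterns are copied unchanged from the code above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper (not part of Markdown.NET) around the two patterns above.
public static class MiniMarkdown
{
    // **text** or __text__ ; group 2 captures the text between the delimiters.
    private static readonly Regex Bold = new Regex(
        @"(\*\*|__) (?=\S) (.+?[*_]*) (?<=\S) \1",
        RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline);

    // *text* or _text_ ; group 2 captures the text between the delimiters.
    private static readonly Regex Italic = new Regex(
        @"(\*|_) (?=\S) (.+?) (?<=\S) \1",
        RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline);

    public static string ToHtml(string text)
    {
        // <strong> must go first, so that the double delimiters are consumed
        // before the single-delimiter <em> pattern gets to see them.
        text = Bold.Replace(text, m => "<strong>" + m.Groups[2].Value + "</strong>");

        // Then <em>:
        text = Italic.Replace(text, m => "<em>" + m.Groups[2].Value + "</em>");
        return text;
    }
}
```

Calling `MiniMarkdown.ToHtml("This would be *italicized* text and this would be **bold** text")` returns the HTML shown in the question. Note that this sketch does not HTML-encode the input, so any markup already present in the text passes through untouched.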
