Replacing specific HTML tags using Regex


Problem description

Alright, an easy one for you guys. We are using ActiveReport's RichTextBox to display some random bits of HTML code.

The HTML tags supported by ActiveReport can be found here: http://www.datadynamics.com/Help/ARNET3/ar3conSupportedHtmlTagsInRichText.html

An example of what I want to do is replace any match of <div style="text-align:*</div> with <p style="text-align:*</p> in order to use a supported tag for text alignment.

I have found the following regular expression, which finds the correct matches in my HTML input:

    <div style=\"text-align:(.*?)</div>

However, I can't find a way to keep the text that was inside the tags after the replacement. Any clue? Is it just me, or is regex generally a PITA? :)

    private static readonly IDictionary<string, string> _replaceMap =
        new Dictionary<string, string>
            {
                {"<div style=\"text-align:(.*?)</div>", "<p style=\"text-align:(.*?)</p>"}
            };

    public static string FormatHtml(string html)
    {
        foreach(var pair in _replaceMap)
        {
            html = Regex.Replace(html, pair.Key, pair.Value);
        }

        return html;
    }

Thanks!

Solution

Use $1:

    {"<div style=\"text-align:(.*?)</div>", "<p style=\"text-align:$1</p>"}
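To see the $1 substitution in action, here is a minimal sketch of a complete program. The class name TagReplaceDemo and the sample input are made up for illustration; the pattern and replacement strings are the ones from the pair above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagReplaceDemo
{
    // Group 1 lazily captures everything after "text-align:" up to the first </div>.
    private const string Pattern = "<div style=\"text-align:(.*?)</div>";

    // In a .NET replacement string, $1 stands for the text captured by group 1.
    private const string Replacement = "<p style=\"text-align:$1</p>";

    public static string FormatHtml(string html)
    {
        return Regex.Replace(html, Pattern, Replacement);
    }

    public static void Main()
    {
        // Hypothetical sample input, not taken from the question.
        string input = "<div style=\"text-align:center\">foo</div>" +
                       "<div style=\"text-align:right\">bar</div>";

        Console.WriteLine(FormatHtml(input));
        // prints: <p style="text-align:center">foo</p><p style="text-align:right">bar</p>
    }
}
```

One caveat: by default the dot does not match a newline, so a div whose content spans several lines will not be matched unless RegexOptions.Singleline is passed to Regex.Replace. Nested div elements will also confuse this pattern, because the lazy group stops at the first closing tag.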

Note that you could simplify this to:

    {"<div (style=\"text-align:(?:.*?))</div>", "<p $1</p>"}

Also, it is generally better to use an HTML parser such as HtmlAgilityPack than to parse HTML with regular expressions. Here's how you could do it:

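    using System;
    using HtmlAgilityPack;
    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(html);
    // Rename every <div> element to <p>; attributes and inner content are kept.
    foreach (var e in doc.DocumentNode.Descendants("div"))
        e.Name = "p";
    // Write the modified document to standard output.
    doc.Save(Console.Out);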

Result:

    <p style="text-align:center">foo</p><p style="text-align:center">bar</p>
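Note that the loop above renames every div in the document. If only the divs that carry a text-align style should become paragraphs, as the question asks, something along these lines might work. This is a sketch only: it assumes the HtmlAgilityPack package is referenced, the class and method names are hypothetical, and it only recognises style attributes that begin with "text-align:".

```csharp
using System;
using System.IO;
using System.Linq;
using HtmlAgilityPack;

public static class AlignedDivRewriter
{
    // Hypothetical helper: renames only <div style="text-align:..."> elements to <p>.
    public static string ReplaceAlignedDivs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Collect the matching nodes into a list before changing any of them.
        var alignedDivs = doc.DocumentNode
            .Descendants("div")
            .Where(e => e.GetAttributeValue("style", "")
                         .TrimStart()
                         .StartsWith("text-align:", StringComparison.OrdinalIgnoreCase))
            .ToList();

        foreach (var e in alignedDivs)
            e.Name = "p";

        // Serialize the modified tree the same way as doc.Save(Console.Out), but into a string.
        using (var writer = new StringWriter())
        {
            doc.Save(writer);
            return writer.ToString();
        }
    }
}
```

Calling AlignedDivRewriter.ReplaceAlignedDivs(html) in place of the regex-based FormatHtml should leave divs without a text-align style untouched, and because the parser tracks element boundaries it is not thrown off by nested divs or multi-line content.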
