How to remove href tag from CDATA

Problem description

I have the following CDATA inside an XML document:

<![CDATA[ <p xmlns="">Refer to the below: <br/>
</p>
<table xmlns:abc="http://google.com pic.xsd" cellspacing="1" class="c" type="custom" width="100%">
    <tbody>
        <tr xmlns="">            
            <th style="text-align: left">Basic offers...</th>
        </tr>
        <tr xmlns="">
            <td style="text-align: left">Faster network</td>
            <td style="text-align: left">
            <ul>                
                <li>Session</li>
            </ul>
            </td>
        </tr>
        <tr xmlns="">
            <td style="text-align: left">capabilities</td>
            <td style="text-align: left">
            <ul>                
                <li>Navigation,</li>
                <li>message, and</li>
                <li>contacts</li>
            </ul>
            </td>
        </tr>
        <tr xmlns="">
            <td style="text-align: left">Data</td>
            <td style="text-align: left">
            <p>Here visit google for more info <a href="http://www.google.com" target="_blank"><font color="#0033cc">www.google.com</font></a>.</p>
            <p>Remove this href tag <a href="/abc/def/{T}/t/1" target="_blank">Information</a> remove the tag.</p>
            </td>
        </tr>
    </tbody>
</table>
<p xmlns=""><br/>
</p>
  ]]> 

I want to somehow scan for href="/abc/def and remove every href (<a>) tag whose href starts with /abc/def. In the example above, remove the tag and leave just the "Information" text that was inside it. The CDATA can contain more than one tag with "/abc/def..." in its href. I am using C# for this application. Can someone please help me and tell me how this can be done? Should I use regex, or is there a way to do it with XML itself?

This is the regex I am trying:

"<a href=\"/abc/def/.*></a>"

I want to keep the inner text of the <a href> tag and just remove the tags themselves. But the above regex is not working.

Recommended answer

Use HtmlAgilityPack:

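using System.Linq; // needed for Where, Any and ToArray

// html is assumed to hold the text taken from the CDATA section
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);

// Collect every <a> element whose href starts with "/abc/def"
var nodes = doc.DocumentNode
    .Descendants("a")
    .Where(n => n.Attributes.Any(a => a.Name == "href" && a.Value.StartsWith("/abc/def")))
    .ToArray();

foreach (var node in nodes)
{
    // The second argument (keepGrandChildren) keeps the link's inner content in place
    node.ParentNode.RemoveChild(node, true);
}

// Serialize the modified markup back to a string
var newHtml = doc.DocumentNode.InnerHtml;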
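
On the regex in the question: the pattern requires `>` to be immediately followed by `</a>`, so a link with text between the tags, such as Information, never matches, and there is no capture group to keep that text. If adding HtmlAgilityPack is not an option, a rough alternative using only the .NET base class library could look like the sketch below. It is not part of the answer above. The class name `CdataLinkStripper` and both method names are hypothetical, and it assumes the href value is double-quoted and that matching anchors are not nested. Regex-based HTML editing is brittle, so treat it as an illustration rather than a drop-in replacement for a real parser.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical helper; names are illustrative and do not come from the original answer.
public static class CdataLinkStripper
{
    // Matches <a ... href="/abc/def..." ...>inner</a> and captures the inner markup.
    // The lazy (.*?) stops at the first </a>, so nested anchors are not handled.
    private static readonly Regex AbcDefAnchor = new Regex(
        "<a\\b[^>]*\\bhref\\s*=\\s*\"/abc/def[^\"]*\"[^>]*>(.*?)</a>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Replaces each matching anchor with its inner markup; other links are left untouched.
    public static string StripLinks(string html)
    {
        return AbcDefAnchor.Replace(html, "$1");
    }

    // Applies StripLinks to every CDATA section in the XML and returns the updated XML.
    // XDocument.ToString() does not emit an XML declaration.
    public static string StripLinksInCdata(string xml)
    {
        XDocument doc = XDocument.Parse(xml, LoadOptions.PreserveWhitespace);
        foreach (XCData cdata in doc.DescendantNodes().OfType<XCData>().ToList())
        {
            cdata.Value = StripLinks(cdata.Value);
        }
        return doc.ToString(SaveOptions.DisableFormatting);
    }
}
```

Calling `CdataLinkStripper.StripLinksInCdata(xml)` on the document text is intended to return the same XML with only the /abc/def links unwrapped inside each CDATA section. The same loop over `XCData` nodes can also supply the `html` input for the HtmlAgilityPack code above.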
