How to strip comments from HTML using Agility Pack without losing DOCTYPE

Problem description

I am trying to remove unnecessary content from HTML. Specifically, I want to remove comments. I found a pretty good solution (Grabbing meta-tags and comments using HTML Agility Pack); however, the DOCTYPE is treated as a comment and is therefore removed along with the comments. How can I improve the code below so that the DOCTYPE is preserved?

var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(htmlContent);
var nodes = htmlDoc.DocumentNode.SelectNodes("//comment()");
if (nodes != null)
{
    foreach (HtmlNode comment in nodes)
    {
        comment.ParentNode.RemoveChild(comment);
    }
}

Recommended answer

Check that the comment does not start with DOCTYPE:

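  foreach (var comment in nodes)
  {
     // The DOCTYPE is reported as a comment node. Its text may include the
     // leading "<!", so strip that first. StringComparison needs "using System;".
     var text = comment.InnerText.TrimStart('<', '!');
     if (!text.StartsWith("DOCTYPE", StringComparison.OrdinalIgnoreCase))
         comment.ParentNode.RemoveChild(comment);
  }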
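
To show how the check fits into the loop from the question, here is a minimal sketch that wraps both in one helper. It assumes the HtmlAgilityPack package is referenced. The names `CommentStripper` and `StripComments` are hypothetical and do not come from the question or the answer. The sketch strips a leading "<!" before comparing, on the assumption that the library may include it in the node text, and it compares case-insensitively so that `<!doctype html>` is kept as well.

```csharp
using System;
using HtmlAgilityPack;

public static class CommentStripper
{
    // Hypothetical helper: removes comment nodes but leaves the DOCTYPE in place.
    public static string StripComments(string htmlContent)
    {
        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(htmlContent);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var nodes = htmlDoc.DocumentNode.SelectNodes("//comment()");
        if (nodes != null)
        {
            foreach (HtmlNode comment in nodes)
            {
                // Assumption: the node text may begin with "<!", so drop those
                // characters before testing for the DOCTYPE keyword.
                var text = comment.InnerText.TrimStart('<', '!');
                if (!text.StartsWith("DOCTYPE", StringComparison.OrdinalIgnoreCase))
                {
                    comment.ParentNode.RemoveChild(comment);
                }
            }
        }

        // Serialize the modified document back to a string.
        return htmlDoc.DocumentNode.OuterHtml;
    }
}
```

A call such as `CommentStripper.StripComments(htmlContent)` would then return the markup with ordinary comments removed. Since a real comment begins with `<!--`, trimming `<` and `!` leaves `--` at the front, so a comment whose text merely mentions DOCTYPE is still removed.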