How to strip comments from HTML using Agility Pack without losing DOCTYPE
Question
I am trying to remove unnecessary content from HTML. Specifically, I want to remove comments. I found a pretty good solution (Grabbing meta-tags and comments using HTML Agility Pack); however, the DOCTYPE is treated as a comment and is therefore removed along with the comments. How can I improve the code below to make sure the DOCTYPE is preserved?
var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(htmlContent);
var nodes = htmlDoc.DocumentNode.SelectNodes("//comment()");
if (nodes != null)
{
    foreach (HtmlNode comment in nodes)
    {
        comment.ParentNode.RemoveChild(comment);
    }
}
Recommended answer
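Check that the comment does not start with DOCTYPE before removing it. The node text may include the leading "<!", and the keyword may be written in lowercase, so the check strips that prefix and ignores case:
foreach (var comment in nodes)
{
    // Keep the DOCTYPE node; remove every other comment.
    // StringComparison requires "using System;".
    if (!comment.InnerText.TrimStart('<', '!').StartsWith("DOCTYPE", StringComparison.OrdinalIgnoreCase))
        comment.ParentNode.RemoveChild(comment);
}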
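For completeness, the loop from the question and the DOCTYPE check can be combined into one helper. This is only a sketch: the class name `HtmlCommentStripper` and the methods `StripComments` and `IsDoctype` are hypothetical names chosen for illustration, and it assumes the HtmlAgilityPack NuGet package is referenced. It also assumes the DOCTYPE node's text may or may not carry the leading `<!`, so that prefix is trimmed before the comparison.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class HtmlCommentStripper
{
    // Hypothetical helper: returns the HTML with comments removed
    // and the DOCTYPE declaration left in place.
    public static string StripComments(string htmlContent)
    {
        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(htmlContent);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var nodes = htmlDoc.DocumentNode.SelectNodes("//comment()");
        if (nodes == null)
        {
            return htmlContent;
        }

        // Iterate over a snapshot so removals cannot disturb the enumeration.
        foreach (HtmlNode comment in nodes.ToList())
        {
            if (IsDoctype(comment))
            {
                continue; // keep the DOCTYPE
            }
            comment.ParentNode.RemoveChild(comment);
        }

        return htmlDoc.DocumentNode.OuterHtml;
    }

    // Assumption: the DOCTYPE is exposed as a comment node whose text may
    // start with "<!", so strip those characters and compare ignoring case.
    private static bool IsDoctype(HtmlNode node)
    {
        return node.InnerText
            .TrimStart('<', '!')
            .StartsWith("DOCTYPE", StringComparison.OrdinalIgnoreCase);
    }
}
```

A call such as `HtmlCommentStripper.StripComments(htmlContent)` would then replace the inline loop. The case-insensitive comparison is a deliberate choice, since `<!doctype html>` is as valid as `<!DOCTYPE html>`.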