How do I remove whitespace in HTML source with Html Agility Pack and C#


Question

Before posting, I tried the solution from this thread:

C# - Remove spaces in HTML source between markup tags?

Here is a snippet of the HTML I'm working with:

<p>This is my text</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>This is next text</p>

I'm using Html Agility Pack to clean up the HTML:

HtmlDocument doc = new HtmlDocument();
doc.Load(htmlLocation);
foreach (var item in doc.DocumentNode.Descendants("p").ToList())
{
    if (item.InnerHtml == "&nbsp;")
    {
        item.Remove();
    }
}

The output of the code above is:

<p>This is my text</p>





<p>This is next text</p>

So my question is: how do I remove the extra whitespace left between the two paragraphs in the HTML source?

Recommended answer

Removing the empty paragraphs leaves behind the whitespace text nodes (line breaks) that sat between them. Remove the text nodes between the first and last paragraphs as well:

HTML:

var html = @"
<p>This is my text</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>This is next text</p>";

Parse it:

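// Requires: using System.Collections.Generic; using System.Linq; using HtmlAgilityPack;
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);
// Snapshot the paragraphs with ToList() so nodes can be removed while iterating.
var paragraphs = doc.DocumentNode.Descendants("p").ToList();
foreach (var item in paragraphs)
{
    if (item.InnerHtml == "&nbsp;") item.Remove();
}
// Text nodes that follow the first paragraph (here, only line breaks).
// SelectNodes returns null when nothing matches, so fall back to an empty list.
var followingText = paragraphs[0]
    .SelectNodes("following-sibling::text()")
    ?.ToList() ?? new List<HtmlNode>();
foreach (var text in followingText)
{
    text.Remove();
}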

Result:

<p>This is my text</p><p>This is next text</p>

If you want to keep the line break between the paragraphs, use a for loop and call Remove() on every text node except the last one.

for (int i = 0; i < followingText.Count - 1; ++i)
{
    followingText[i].Remove();
}

Result:

<p>This is my text</p>
<p>This is next text</p>
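
Note that the approach above removes every text node after the first paragraph, whatever it contains. As a rough sketch of a more conservative variant, the helper below removes each `&nbsp;` paragraph together with the whitespace-only text node that immediately follows it, and leaves all other text untouched. The class and method names (`HtmlWhitespaceCleaner`, `RemoveEmptyParagraphs`) are hypothetical and not part of Html Agility Pack; the sketch assumes the HtmlAgilityPack NuGet package is referenced.

```csharp
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper for illustration; not part of Html Agility Pack.
public static class HtmlWhitespaceCleaner
{
    public static string RemoveEmptyParagraphs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Snapshot with ToList() so removing nodes does not disturb the enumeration.
        foreach (var p in doc.DocumentNode.Descendants("p").ToList())
        {
            // Only paragraphs whose sole content is a non-breaking space are dropped.
            if (p.InnerHtml.Trim() != "&nbsp;") continue;

            // Also drop the whitespace-only text node right after the paragraph,
            // so the line break that belonged to it disappears with it.
            var next = p.NextSibling;
            if (next != null
                && next.NodeType == HtmlNodeType.Text
                && string.IsNullOrWhiteSpace(next.InnerText))
            {
                next.Remove();
            }

            p.Remove();
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

Calling `HtmlWhitespaceCleaner.RemoveEmptyParagraphs(html)` on the snippet from the question should keep the single line break that follows the first paragraph, since only the text nodes trailing the removed paragraphs are deleted.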
