XHTML Parsing with HTMLAgilityPack

Question

I have a list of the following elements inside a <select> element that I have found using HTMLAgilityPack.

<option value="67"><span style="color: #cc0000;">Horde</span> Leveling / Dailies & Event Guide ($50.00)</option>

What I need to do is parse all the text out of the tag, without any of the markup in there. I've tried (seemingly!) everything, but it always comes out looking like this:

Horde
Leveling / Dailies & Event Guide ($50.00)

Sometimes even:

Horde
Leveling
/ Dailies & Event Guide ($50.00)

There are a couple of other variations like that too. I've even gone so far as to print out each character in the string as a byte, and I haven't found any line breaks or line feeds, only what I expected: normal letters and spaces. Here's the full source of the HTML for reference, copied straight from the page.

<option value="13"><span style="color: #0000ff;">Alliance</span> Leveling Guide ($30.00)</option>


<option value="12"><span style="color: #cc0000;">Horde</span> Leveling Guide ($30.00)</option>

<option value="46"><span style="color: #cc0000;">Horde</span> Dailies & Events Guide ($25.00)</option>

<option value="67"><span style="color: #cc0000;">Horde</span> Leveling / Dailies & Event Guide ($50.00)</option>


<option value="11"><span style="color: #0000ff;">Alliance</span> &amp; <span style="color: #cc0000;">Horde</span> Leveling Guide ($50.00)</option>

<option value="97"><span style="color: #0000ff;">Alliance</span> Achievements & Professions Guide ($20.00)</option>

<option value="98"><span style="color: #cc0000;">Horde</span> Achievements & Professions Guide ($20.00)</option>


<option value="99"><span style="color: #0000ff;">Alliance</span> &amp; <span style="color: #cc0000;">Horde</span> Achievements & Professions Guide ($30.00)</option>

Recommended answer

By default, Html Agility Pack treats the <option> tag as "Empty", which means it does not need a closing </option>. The text that follows the opening tag is therefore not parsed as the element's content, which is why it's not easy to catch with XPath in this case. You can change this using the HtmlNode.ElementsFlags collection.

Here is some code that should do what you want:

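using System;
using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
// Stop treating <option> as an empty element; this must run before LoadHtml.
HtmlNode.ElementsFlags.Remove("option");
doc.LoadHtml(yourHtml); // yourHtml holds the page source as a string
// SelectNodes returns null when nothing matches, so guard the loop.
HtmlNodeCollection options = doc.DocumentNode.SelectNodes("//option");
if (options != null)
    foreach (HtmlNode node in options)
    {
        Console.WriteLine(node.InnerText);
    }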
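
If the InnerText values printed by the loop above still contain entities such as &amp; or stray whitespace, a small clean-up step may help. The following is only a sketch: OptionText and Clean are hypothetical names that are not part of Html Agility Pack, and it relies solely on the .NET standard library (WebUtility.HtmlDecode and Regex). Html Agility Pack also provides HtmlEntity.DeEntitize, which could likely be used in place of HtmlDecode here.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

static class OptionText
{
    // Hypothetical helper (not part of Html Agility Pack):
    // decodes HTML entities such as &amp; and collapses any run of
    // whitespace, including line breaks, into a single space.
    public static string Clean(string innerText)
    {
        string decoded = WebUtility.HtmlDecode(innerText ?? string.Empty);
        return Regex.Replace(decoded, @"\s+", " ").Trim();
    }
}
```

Inside the loop, this would be called as Console.WriteLine(OptionText.Clean(node.InnerText)); so that each option is printed on a single line.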
