Using Html Agility Pack to parse nodes in a context sensitive fashion


Question

<div class="mvb"><b>Date 1</b></div>
<div class="mxb"><b>Header 1</b></div>
<div>
   inner hmtl 1
</div>

<div class="mvb"><b>Date 2</b></div>
<div class="mxb"><b>Header 2</b></div>
<div>
inner html 2
</div>

I would like to parse the inner HTML between the tags in such a way that I can:

    * associate the inner html 1 with header 1 and date 1
    * associate the inner html 2 with header 2 and date 2

In other words, at the time I parse the inner html 1, I would like to know that the HTML nodes containing "Date 1" and "Header 1" have already been parsed (but the nodes containing "Date 2" and "Header 2" have not been parsed yet).

If I were doing this via regular text parsing, I would read one line at a time and record the last "Date" and "Header" that I had parsed. Then, when it came time to parse the inner html 1, I could refer to the last parsed "Date" and "Header" objects to associate them with it.

Recommended answer

Using the Html Agility Pack, you can leverage the power of XPath and forget about the more verbose XLinq code :-). The XPath position() function is context sensitive: on the preceding-sibling axis it counts backwards from the current node, so position()=1 is the nearest preceding sibling and position()=2 is the one before it. Here is some sample code:

    using System;
    using HtmlAgilityPack;

    HtmlDocument doc = new HtmlDocument();
    doc.Load("your html file");

    // select all DIVs without a CLASS attribute defined
    // (note: SelectNodes returns null, not an empty collection, when nothing matches)
    foreach (HtmlNode div in doc.DocumentNode.SelectNodes("//div[not(@class)]"))
    {
        Console.WriteLine("div=" + div.InnerText.Trim());
        // nearest preceding sibling DIV: the header
        Console.WriteLine("  header=" + div.SelectSingleNode("preceding-sibling::div[position()=1]/b").InnerText);
        // second-nearest preceding sibling DIV: the date
        Console.WriteLine("  date=" + div.SelectSingleNode("preceding-sibling::div[position()=2]/b").InnerText);
    }

That will print this with your sample:

div=inner hmtl 1
  header=Header 1
  date=Date 1
div=inner html 2
  header=Header 2
  date=Date 2
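
The sample above throws a NullReferenceException if a class-less div is missing its header or date sibling, or if no class-less div exists at all. Below is a minimal, more defensive sketch of the same XPath technique. It assumes the HtmlAgilityPack NuGet package is referenced and that the header and date divs directly precede each content div as siblings; the names `SiblingContextDemo` and `Extract` are hypothetical and only serve the illustration.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class SiblingContextDemo
{
    // Returns (date, header, body) for every class-less <div>.
    // Divs that lack one of the two expected preceding siblings are skipped.
    public static List<(string Date, string Header, string Body)> Extract(string html)
    {
        var results = new List<(string Date, string Header, string Body)>();

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection divs = doc.DocumentNode.SelectNodes("//div[not(@class)]");
        if (divs == null)
        {
            return results;
        }

        foreach (HtmlNode div in divs)
        {
            // preceding-sibling is a reverse axis: [1] is the nearest div before
            // this one, [2] is the div before that. [n] is short for [position()=n].
            HtmlNode header = div.SelectSingleNode("preceding-sibling::div[1]/b");
            HtmlNode date = div.SelectSingleNode("preceding-sibling::div[2]/b");
            if (header == null || date == null)
            {
                continue; // incomplete group: nothing to associate this div with
            }

            results.Add((date.InnerText.Trim(), header.InnerText.Trim(), div.InnerText.Trim()));
        }

        return results;
    }

    public static void Main()
    {
        // In-memory copy of the question's markup (assumed layout: date, header, content).
        const string html =
            "<div class=\"mvb\"><b>Date 1</b></div>" +
            "<div class=\"mxb\"><b>Header 1</b></div>" +
            "<div>inner html 1</div>" +
            "<div class=\"mvb\"><b>Date 2</b></div>" +
            "<div class=\"mxb\"><b>Header 2</b></div>" +
            "<div>inner html 2</div>";

        foreach (var (date, header, body) in Extract(html))
        {
            Console.WriteLine($"{date} | {header} | {body}");
        }
    }
}
```

Returning the associations as a list instead of printing them makes it easier to feed them into later processing. Note that this sketch still relies purely on sibling order; if other divs can appear between a header and its content, the XPath would need an extra predicate on the class attribute.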
