Parsing HTML using Html Agility Pack

Question

I have some HTML to parse (see below):

<div id="mailbox" class="div-w div-m-0">
    <h2 class="h-line">InBox</h2>
    <div id="mailbox-table">
        <table id="maillist">
            <tr>
                <th>From</th>
                <th>Subject</th>
                <th>Date</th>
            </tr>
            <tr onclick="location='readmail.html?mid=welcome'" style="font-weight: bold;">
                <td>no-reply@somemail.net</td>
                <td>
                    <a href="readmail.html?mid=welcome">Hi, Welcome</a>
                </td>
                <td>
                    <span title="2016-02-16 13:23:50 UTC">just now</span>
                </td>
            </tr>
            <tr onclick="location='readmail.html?mid=T0wM6P'" style="font-weight: bold;">
                <td>someone@outlook.com</td>
                <td>
                    <a href="readmail.html?mid=T0wM6P">sa</a>
                </td>
                <td>
                    <span title="2016-02-16 13:24:04">just now</span>
                </td>
            </tr>
        </table>
    </div>
</div>

I need to extract the links from the onclick attributes of the <tr> tags and the email addresses from the <td> tags.

So far I have managed to get only the first occurrence of an email address/link from my HTML.

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(responseFromServer);

Could someone show me how this is done properly? Basically, I want to collect all the email addresses and links that appear in those tags.

foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//tr[@onclick]"))
{
    HtmlAttribute att = link.Attributes["onclick"];
    Console.WriteLine(att.Value);
}

I need to store the parsed values in a list of class instances, as pairs: the link to the email and the sender's email address.

public class ClassMailBox
{
    public string From { get; set; }
    public string LinkToMail { get; set; }
}

Recommended answer

You can write the following code:

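// Requires: using System.Collections.Generic; using HtmlAgilityPack;
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(responseFromServer);

// One entry per <tr onclick="..."> row (this declaration was missing).
List<ClassMailBox> classMailBoxes = new List<ClassMailBox>();

// First pass: store each row's onclick value as the link.
// Note: SelectNodes returns null when nothing matches.
foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//tr[@onclick]"))
{
    HtmlAttribute att = link.Attributes["onclick"];
    ClassMailBox classMailbox = new ClassMailBox() { LinkToMail = att.Value };
    classMailBoxes.Add(classMailbox);
}

// Second pass: fill in the sender from the first <td> of each row,
// relying on both queries returning rows in the same order.
int currentPosition = 0;

foreach (HtmlNode tableDef in doc.DocumentNode.SelectNodes("//tr[@onclick]/td[1]"))
{
    classMailBoxes[currentPosition].From = tableDef.InnerText;
    currentPosition++;
}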

To keep this code simple, I'm assuming a few things:

  1. The email address is always in the first td of a tr that has an onclick attribute
  2. Every tr that has an onclick attribute contains an email address

If those conditions don't hold, this code won't work correctly: it could throw exceptions (such as an index-out-of-range exception) or pair links with the wrong email addresses.
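
One way to loosen those assumptions is to read the link and the sender from the same row in a single pass, so a row without a td is skipped instead of shifting every later pair. The sketch below is only an illustration, not part of the original answer: the names MailBoxParser, Parse, ExtractLink and LocationPattern are hypothetical, and it assumes the onclick value always has the form location='...' as in the sample markup. It also pulls the URL out of the onclick value rather than storing the whole expression.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using HtmlAgilityPack;

public class ClassMailBox
{
    public string From { get; set; }
    public string LinkToMail { get; set; }
}

// Hypothetical helper class, introduced only for this sketch.
public static class MailBoxParser
{
    // Matches location='...' or location="..." and captures the URL part.
    // Assumption: every onclick value follows this pattern.
    private static readonly Regex LocationPattern =
        new Regex(@"location\s*=\s*['""]([^'""]+)['""]");

    // Returns the URL inside the onclick value, or null when nothing matches.
    public static string ExtractLink(string onclick)
    {
        Match match = LocationPattern.Match(onclick ?? string.Empty);
        return match.Success ? match.Groups[1].Value : null;
    }

    public static List<ClassMailBox> Parse(string html)
    {
        var result = new List<ClassMailBox>();
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes("//tr[@onclick]");
        if (rows == null)
        {
            return result;
        }

        foreach (HtmlNode row in rows)
        {
            // Look up the sender cell relative to this row, so a row
            // without a <td> cannot shift the pairing of later rows.
            HtmlNode senderCell = row.SelectSingleNode("./td[1]");
            if (senderCell == null)
            {
                continue;
            }

            result.Add(new ClassMailBox
            {
                From = HtmlEntity.DeEntitize(senderCell.InnerText).Trim(),
                LinkToMail = ExtractLink(row.GetAttributeValue("onclick", string.Empty))
            });
        }

        return result;
    }
}
```

Calling MailBoxParser.Parse(responseFromServer) on the sample markup should give two entries, pairing no-reply@somemail.net with readmail.html?mid=welcome and someone@outlook.com with readmail.html?mid=T0wM6P. The first td is still treated as the sender, so that assumption from the answer remains.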
