Parse an HTML document using HtmlAgilityPack


Question

I'm trying to parse the following HTML snippet with HtmlAgilityPack:

<td bgcolor="silver" width="50%" valign="top">
 <table bgcolor="silver" style="font-size: 90%" border="0" cellpadding="2" cellspacing="0"
                                                width="100%">
   <tr bgcolor="#003366">
       <td>
           <font color="white">Info
        </td>
        <td>
           <font color="white">
              <center>Price
                   </td>
                      <td align="right">
                         <font color="white">Hourly
                         </td>
              </tr>
               <tr>
                 <td>
                     <a href='test1.cgi?type=1'>Bookbags</a>
                 </td>
                   <td>
                      $156.42
                    </td>
                    <td align="right">
                        <font color="green">0.11%</font>
                      </td>
                  </tr>
                  <tr>
                    <td>
                       <a href='test2.cgi?type=2'>Jeans</a>
                     </td>
                         <td>
                            $235.92
                               </td>
                                  <td align="right">
                                     <font color="red">100%</font>
                                  </td>
                   </tr>
               </table>
          </td>



My code looks something like this:

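private void ParseHtml(HtmlDocument htmlDoc)
{
    // Prices such as $156.42 are not integers, so store them as decimal.
    var ItemsAndPrices = new Dictionary<string, decimal>();
    var findItemPrices = from links in htmlDoc.DocumentNode.Descendants()
                         where links.Name.Equals("table") &&
                               // Compare attribute values, not HtmlAttribute objects;
                               // GetAttributeValue also copes with a missing attribute.
                               links.GetAttributeValue("width", "") == "100%" &&
                               links.GetAttributeValue("bgcolor", "") == "silver"
                         select new
                         {
                             // select item and price
                         };
}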

In this instance, I would like to select the items (Jeans and Bookbags) together with their associated prices, and store them in a dictionary.

E.g. Jeans at a price of $235.92

Does anyone know how to do this properly with HtmlAgilityPack and LINQ?

Recommended answer

Assuming that there could be other rows, and that you don't want only Bookbags and Jeans specifically, I'd do it like this:

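// Requires: using System.Globalization; using System.Linq; using HtmlAgilityPack;
var table = htmlDoc.DocumentNode
    .SelectSingleNode("//table[@bgcolor='silver' and @width='100%']");
var query =
    from row in table.Elements("tr").Skip(1) // skip the header row
    let columns = row.Elements("td").Take(2) // take only the first two columns
        .Select(col => col.InnerText.Trim())
        .ToList()
    select new
    {
        Info = columns[0],
        // An explicit culture makes the "$" symbol parse the same way
        // regardless of the machine's regional settings.
        Price = Decimal.Parse(columns[1], NumberStyles.Currency,
                              CultureInfo.GetCultureInfo("en-US")),
    };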
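
The question asks for the results to end up in a dictionary, which the query above stops short of. A minimal sketch of one way to get there follows. The class name PriceTableParser and the method name ParseItemPrices are hypothetical and not part of the original answer. The sketch also assumes that prices always look like "$156.42", and it skips any row whose second cell does not parse as a number (which covers the header row) instead of relying on Skip(1).

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using HtmlAgilityPack;

public static class PriceTableParser
{
    // Hypothetical helper (not from the original answer):
    // maps each item name to its price.
    public static Dictionary<string, decimal> ParseItemPrices(HtmlDocument htmlDoc)
    {
        var result = new Dictionary<string, decimal>();

        var table = htmlDoc.DocumentNode
            .SelectSingleNode("//table[@bgcolor='silver' and @width='100%']");
        if (table == null)
            return result; // no matching table in the document

        foreach (var row in table.Elements("tr"))
        {
            // First two cells of the row: item name and price text.
            var cells = row.Elements("td")
                .Take(2)
                .Select(td => td.InnerText.Trim())
                .ToList();
            if (cells.Count < 2)
                continue;

            // Assumption: prices look like "$156.42". Strip the "$" and parse
            // culture-independently; rows that fail to parse are skipped.
            if (decimal.TryParse(cells[1].TrimStart('$'), NumberStyles.Number,
                                 CultureInfo.InvariantCulture, out var price))
            {
                result[cells[0]] = price; // a repeated item name overwrites the earlier entry
            }
        }

        return result;
    }
}
```

To use it, load the markup with new HtmlDocument() and LoadHtml(html), then call PriceTableParser.ParseItemPrices(doc). Using TryParse rather than Decimal.Parse is a design choice here: a malformed row is silently dropped instead of throwing, which may or may not be what you want.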
