C# - Using HtmlAgilityPack to move HTML table data to a binding list


Problem description

I am pulling data from a website where it is organized in a table. The first two rows look like this (I deleted some style info):

<table id="loads">
   <thead>
   <tr class="tableHeading">
     <th><a original='Load ID'></a></th>
     <th><a original='# of cars'></a></th>
     <th><a original='Year/Make/Model'></a></th>
     <th><a original='Origin City'></a></th>
     <th><a original='Origin State'></a></th>
     <th><a original='Destination City'></a></th>
     <th><a original='Destination State'></a></th>
     <th><a original='Mileage'></a></th>
     <th><a original='Price per Shipment'></a></th>
     <th><a original='Price per Mile'></a></th>
     <th>View</th>
     <th><a original='Comments'></a></th>
   </tr>
   </thead>

   <tbody>
   <tr>
     <td>123456789</td>
     <td>1</td>
     <td>2015 GMC TERRAIN SLE</td>
     <td>Los Angeles</td>
     <td>CA</td>
     <td>San Francisco</td>
     <td>CA</td>
     <td>400</td>
     <td>$400</td>
     <td>$1</td>
     <td>
        <a href="/ViewLoad.asp?nload_id=123456789&npickup_code=">
         <img src="/images/icons/view.gif" >
         </a>
     </td>
     <td>Some Text</td>
   </tr>






There are 12 cells per row. All of them hold plain text except the 11th, which holds a link; that cell is one of the main reasons I am posting this question.

What I have tried:

I created a class that has 13 string properties. The extra one (which I made the first) is a Status property, which will be New or Old. Later I am going to do some things with New rows, but that is not my issue right now.

So now I want to grab the InnerText of each cell (except the 11th) and assign the strings to an array. Here are my steps:

string collect = webBrowser1.Document.Body.InnerHtml;
string data = WebUtility.HtmlDecode(collect);
HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc.LoadHtml(data);
HtmlNodeCollection rows = htmlDoc.DocumentNode.SelectNodes("//table[@id='loads']//tbody//tr");






Note: I checked everything up to this point and it all works. The rows collection contains every row in the table except the header (I only showed one non-header row above, but there are many).

The next step is where I get lost. I am trying to get the cell strings into a string array, and from there into a BindingList that is set up at the form level:

BindingSource source = new BindingSource(); // this binds to the DataGridView
BindingList<Load> list = new BindingList<Load>();
BindingList<Load> listDeleted = new BindingList<Load>();
List<Load> sortList = new List<Load>();






Here is my code:

int rowIndex = 0;

foreach (HtmlNode row in rows)
{
    int columnIndex = 0;
    string[] rowData = new string[13];

    foreach (HtmlNode cell in row.ChildNodes)
    {
        if (columnIndex != 0 && columnIndex != 11)
        {
            rowData[columnIndex - 1] = cell.InnerText;
        }

        rowData[11] = cell.FirstChild.Attributes["href"].Value;

        MessageBox.Show(rowData[11]);
        columnIndex++;
    }

    Load newLoad = new Load(rowData);

    if (!list.Contains(newLoad) && !listDeleted.Contains(newLoad))
    {
        list.Add(newLoad);
        updated = true;
    }
    else
    {
        int itemIndex = list.IndexOf(newLoad);
        if (itemIndex > 0)
        {
            if (!list[itemIndex].Comments.Equals(newLoad.Comments))
            {
                list[itemIndex].Comments = newLoad.Comments;
                list[itemIndex].Status = "MODIFIED";
                updated = true;
            }
        }
    }
    rowIndex++;
}

}
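
A side note on the duplicate check in the loop above, separate from the parsing problem: `BindingList<T>.Contains` and `IndexOf` rely on `Equals`, which compares references for a class unless it is overridden, so a freshly constructed `Load` would not match an entry already in the list. `IndexOf` also returns -1 when nothing is found, which means the test `itemIndex > 0` skips a match at index 0. The sketch below shows one possible way to handle both. It assumes a trimmed-down `Load` with only three of the 13 properties, treats the load ID as the row's identity, and assumes the array layout `rowData[0]` = Status, `rowData[1]` = Load ID, `rowData[12]` = Comments. The property names, the layout, and the `LoadMerger.Merge` helper are all hypothetical, since the original class is not shown.

```csharp
using System;
using System.ComponentModel;

// Hypothetical, trimmed-down version of the Load class (the real one has 13 properties).
public class Load : IEquatable<Load>
{
    public string Status { get; set; }
    public string LoadId { get; set; }
    public string Comments { get; set; }

    // Assumed layout: rowData[0] = Status, rowData[1] = Load ID, rowData[12] = Comments.
    public Load(string[] rowData)
    {
        Status = rowData[0] ?? "New";
        LoadId = rowData[1];
        Comments = rowData[12];
    }

    // Two loads count as the same row when their IDs match (an assumption).
    public bool Equals(Load other) => other != null && LoadId == other.LoadId;
    public override bool Equals(object obj) => Equals(obj as Load);
    public override int GetHashCode() => LoadId == null ? 0 : LoadId.GetHashCode();
}

public static class LoadMerger
{
    // Adds a new load, or updates the comments of an existing one.
    // Returns true when the list was changed.
    public static bool Merge(BindingList<Load> list, BindingList<Load> listDeleted, Load newLoad)
    {
        int itemIndex = list.IndexOf(newLoad);

        if (itemIndex < 0) // -1 means "not found"; 0 is a valid position
        {
            if (listDeleted.Contains(newLoad))
                return false; // previously deleted, so it is not re-added

            list.Add(newLoad);
            return true;
        }

        if (!string.Equals(list[itemIndex].Comments, newLoad.Comments))
        {
            list[itemIndex].Comments = newLoad.Comments;
            list[itemIndex].Status = "MODIFIED";
            return true;
        }

        return false;
    }
}
```

With this shape, the body of the outer loop could reduce to something like `updated |= LoadMerger.Merge(list, listDeleted, new Load(rowData));`. Using `string.Equals` for the comments also avoids an exception when an existing comment is null.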



I am not sure what I am doing wrong in this last code block, and I would greatly appreciate any help.

Recommended answer


It turned out that the website was returning some escape characters that were showing up as additional rows, so I was able to handle that by rewriting my conditionals.
Thanks for taking the time to respond to my question, Richard. It helped.
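
The rewritten conditionals are not shown in the post, so the following is only a sketch of one way such a fix could look. `row.ChildNodes` includes the whitespace text nodes that sit between the `<td>` elements, so counting by child position drifts, and `cell.FirstChild.Attributes["href"].Value` will likely throw for any cell whose first child is not the link. Selecting only the `td` elements avoids both problems. The sketch assumes HtmlAgilityPack is referenced, that the array layout is `rowData[0]` = Status followed by the 12 cells in order, and that rows without exactly 12 cells should be skipped. The names `LoadTableParser` and `ReadRows` are illustrative and do not come from the original post.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class LoadTableParser
{
    // Returns one 13-element array per data row:
    // [0] = status placeholder, [1]..[12] = the 12 cells in document order.
    public static List<string[]> ReadRows(string html)
    {
        var result = new List<string[]>();
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection rows =
            doc.DocumentNode.SelectNodes("//table[@id='loads']//tbody//tr");
        if (rows == null) return result;

        foreach (HtmlNode row in rows)
        {
            // Only <td> elements; whitespace text nodes between cells are ignored.
            HtmlNodeCollection cells = row.SelectNodes("./td");
            if (cells == null || cells.Count != 12) continue;

            var rowData = new string[13];
            rowData[0] = "New"; // assumed default status

            for (int i = 0; i < cells.Count; i++)
            {
                if (i == 10)
                {
                    // 11th cell: store the link target instead of the cell text.
                    HtmlNode link = cells[i].SelectSingleNode(".//a[@href]");
                    rowData[i + 1] = link?.GetAttributeValue("href", string.Empty) ?? string.Empty;
                }
                else
                {
                    // Decode entities such as &amp; and trim surrounding whitespace.
                    rowData[i + 1] = HtmlEntity.DeEntitize(cells[i].InnerText).Trim();
                }
            }

            result.Add(rowData);
        }

        return result;
    }
}
```

Each returned array could then be passed to the `Load` constructor inside the existing loop. Decoding entities per cell, rather than running `WebUtility.HtmlDecode` over the whole page before parsing, keeps decoded characters such as `<` from being read as markup.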

