Getting data from HTML table into a DataTable

Problem description

Ok so I need to query a live website to get data from a table, put this HTML table into a DataTable and then use this data. I have so far managed to use Html Agility Pack and XPath to get to each row in the table I need but I know there must be a way to parse it into a DataTable. (C#) The code I am currently using is:

string htmlCode = "";
using (WebClient client = new WebClient())
{
    htmlCode = client.DownloadString("http://www.website.com");
}
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();

doc.LoadHtml(htmlCode);

// My attempt at LINQ to solve the issue (not sure where to go from here).
// GetAttributeValue avoids a NullReferenceException on tables without a summary attribute.
var myTable = doc.DocumentNode
    .Descendants("table")
    .FirstOrDefault(t => t.GetAttributeValue("summary", "") == "Table One");

// Finds all the odd rows (which are the ones I actually need, but I would prefer a
// DataTable containing all the rows!)
foreach (HtmlNode cell in doc.DocumentNode.SelectNodes("//tr[@class='odd']/td"))
{
    string test = cell.InnerText;
    // Have not gone further than this yet!
}

The HTML table on the website I am querying looks like this:

<table summary="Table One">
<tbody>
<tr class="odd">
<td>Some Text</td>
<td>Some Value</td>
</tr>
<tr class="even">
<td>Some Text1</td>
<td>Some Value1</td>
</tr>
<tr class="odd">
<td>Some Text2</td>
<td>Some Value2</td>
</tr>
<tr class="even">
<td>Some Text3</td>
<td>Some Value3</td>
</tr>
<tr class="odd">
<td>Some Text4</td>
<td>Some Value4</td>
</tr>
</tbody>
</table>

I'm not sure whether it is better or easier to use LINQ + HAP or XPath + HAP to get the desired result. I tried both with limited success, as you can probably see. This is the first time I have ever made a program to query a website, or even interact with a website in any way, so I am very unsure at the moment! Thanks in advance for any help :)

Solution

There's no such method out of the box in the HTML Agility Pack, but it shouldn't be too hard to create one. There are samples out there that convert XML to a DataTable using LINQ to XML. These can be reworked into what you need.

If needed I can help out creating the whole method, but not today :).
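
For what it's worth, here is a rough sketch of what such a method might look like. It is not part of the Html Agility Pack: `HtmlTableConverter`, `ToDataTable` and the generated column names (`Column1`, `Column2`, ...) are made up for illustration, and the sample HTML is a shortened copy of the table from the question. It assumes every cell can be stored as a string and that the table has no nested tables.

```csharp
using System;
using System.Data;
using System.Linq;
using HtmlAgilityPack;

public static class HtmlTableConverter
{
    // Hypothetical helper (not an Html Agility Pack API): copies the <td> cells
    // of every <tr> under tableNode into a DataTable with string columns.
    public static DataTable ToDataTable(HtmlNode tableNode)
    {
        if (tableNode == null) throw new ArgumentNullException(nameof(tableNode));

        var result = new DataTable();

        // ".//tr" is relative to tableNode, so rows of other tables are skipped.
        // Assumption: no nested tables (their rows would be picked up too).
        HtmlNodeCollection rows = tableNode.SelectNodes(".//tr");
        if (rows == null) return result; // SelectNodes returns null when nothing matches

        foreach (HtmlNode row in rows)
        {
            // Only direct <td> children; decode entities such as &amp; and trim whitespace.
            object[] cells = row.Elements("td")
                .Select(td => (object)HtmlEntity.DeEntitize(td.InnerText).Trim())
                .ToArray();
            if (cells.Length == 0) continue; // e.g. header rows that only contain <th>

            // Add columns on demand so rows of different widths still fit.
            while (result.Columns.Count < cells.Length)
                result.Columns.Add("Column" + (result.Columns.Count + 1), typeof(string));

            result.Rows.Add(cells);
        }
        return result;
    }
}

public static class Demo
{
    public static void Main()
    {
        // Shortened copy of the table from the question.
        const string html =
            "<table summary=\"Table One\"><tbody>" +
            "<tr class=\"odd\"><td>Some Text</td><td>Some Value</td></tr>" +
            "<tr class=\"even\"><td>Some Text1</td><td>Some Value1</td></tr>" +
            "</tbody></table>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode tableNode = doc.DocumentNode.SelectSingleNode("//table[@summary='Table One']");
        DataTable table = HtmlTableConverter.ToDataTable(tableNode);

        foreach (DataRow row in table.Rows)
            Console.WriteLine(string.Join(" | ", row.ItemArray));
    }
}
```

If you only want the odd rows, changing the XPath to `.//tr[@class='odd']` should be enough. Typed columns or header handling would need extra work on top of this.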
