How to scrape a web table and display it in a VB.NET application?


Problem description

I'm trying to grab a data table from a webpage and display it in a VB.NET application.

While doing some research, I found that the HTML Agility Pack makes it easy to scrape/extract data from a webpage.

The pack does not come with any documentation, and I can't seem to find the right solution on the internet. I hope someone can point me in the right direction.

This is the table I'm trying to pull.

http://testapplications.net16.net/test.html

This is what I found while browsing the net and trying different methods:

        ' Load the HTML document
        Dim web As New HtmlWeb()
        Dim doc As HtmlDocument = web.Load("http://testapplications.net16.net/test.html")

        ' Get all tables in the document
        Dim tables As HtmlNodeCollection = doc.DocumentNode.SelectNodes("/table")

        ' Iterate all rows in the first table
        Dim rows As HtmlNodeCollection = tables(0).SelectNodes("./tr")
        For i As Integer = 0 To rows.Count - 1

            ' Iterate all columns in this row
            Dim cols As HtmlNodeCollection = rows(i).SelectNodes("./td")
            For j As Integer = 0 To cols.Count - 1

                ' Get the value of the column and print it
                Dim value As String = cols(j).InnerText
                MessageBox.Show(value)
            Next
        Next



I get a NullReferenceException on the following line:

Dim rows As HtmlNodeCollection = tables(0).SelectNodes("./tr")



Object reference not set to an instance of an object.

I'm guessing this means that TR can't be found within the code of the table?

What I have tried:

I browsed the web for documentation or solutions to my problem, but couldn't find any articles or tutorials.

Solution

There are plenty of articles, some here and some on other sites. Here is one to get you started. Next time, search harder. Most examples use C#, but they are easily converted to VB.NET.

Web scraping with progress bar


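Try this one:

        Dim web As New HtmlWeb()
        Dim doc As HtmlDocument = web.Load("https://www.w3schools.com/html/html_tables.asp")

        ' Select the table by id. "//" searches the whole document, whereas
        ' "/table" only matches a table sitting at the document root.
        ' SelectNodes returns Nothing when there is no match, so check before indexing.
        Dim tables As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//table[@id='customers']")
        If tables Is Nothing Then Return

        ' Iterate all rows in the first table (the leading "." keeps the search inside that table)
        Dim rows As HtmlNodeCollection = tables(0).SelectNodes(".//tr")
        If rows Is Nothing Then Return
        For i As Integer = 0 To rows.Count - 1

            ' Iterate all cells in this row; a header row holds only <th> cells, so skip it
            Dim cols As HtmlNodeCollection = rows(i).SelectNodes("./td")
            If cols Is Nothing Then Continue For
            For j As Integer = 0 To cols.Count - 1

                ' Get the value of the cell and print it
                Dim value As String = cols(j).InnerText
                MessageBox.Show(value)
            Next
        Next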
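
Since the goal is to display the table in the application rather than pop up a message box per cell, here is a minimal sketch of one way to do that. It assumes the HtmlAgilityPack NuGet package is referenced; `TableScraper`, `LoadTable` and the generated column names are hypothetical names made up for this illustration, not part of the library.

```vbnet
Imports System.Data
Imports HtmlAgilityPack

Module TableScraper
    ' Hypothetical helper: loads the page at url and copies the first table
    ' matching tableXPath into a DataTable. Returns an empty DataTable if
    ' the table or its rows cannot be found.
    Function LoadTable(url As String, tableXPath As String) As DataTable
        Dim result As New DataTable()

        ' Fully qualified to avoid a clash with System.Windows.Forms.HtmlDocument
        Dim doc As HtmlAgilityPack.HtmlDocument = New HtmlWeb().Load(url)

        ' SelectSingleNode and SelectNodes return Nothing when nothing matches
        Dim table As HtmlNode = doc.DocumentNode.SelectSingleNode(tableXPath)
        If table Is Nothing Then Return result

        Dim rows As HtmlNodeCollection = table.SelectNodes(".//tr")
        If rows Is Nothing Then Return result

        For Each row As HtmlNode In rows
            ' Take header (<th>) and data (<td>) cells that are direct children of the row
            Dim cells As HtmlNodeCollection = row.SelectNodes("./th|./td")
            If cells Is Nothing Then Continue For

            ' Add generic columns until the table is as wide as this row
            While result.Columns.Count < cells.Count
                result.Columns.Add("Column" & (result.Columns.Count + 1).ToString())
            End While

            ' Decode HTML entities such as &amp; and trim surrounding whitespace
            Dim values(cells.Count - 1) As Object
            For i As Integer = 0 To cells.Count - 1
                values(i) = HtmlEntity.DeEntitize(cells(i).InnerText).Trim()
            Next
            result.Rows.Add(values)
        Next

        Return result
    End Function
End Module
```

On a WinForms form with a DataGridView (assumed here to be named `DataGridView1`), the result could be bound with `DataGridView1.DataSource = TableScraper.LoadTable("https://www.w3schools.com/html/html_tables.asp", "//table[@id='customers']")`. In this sketch the header row ends up as the first data row; mapping the `<th>` texts to column names instead is left out to keep it short.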

