Scrape text from WebBrowser activity using HTML Agility Pack (VB.NET)

Views: 78 | Published: 2021/5/15 18:36:32 | Tags: vb.net variables web-scraping html-agility-pack

Problem description

I want to extract fields/text from a page shown in a WebBrowser control on a Windows form, using HTML Agility Pack. I can already scrape the text in the background, but I want to scrape it from the WebBrowser inside my form.

I tried assigning WebBrowser1.Document to my HtmlDocument variable, but it seems it cannot be converted.

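This is the error I get: [screenshot not preserved]

These are the variable types: [screenshot not preserved]

Here is my code: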

Imports System
Imports System.Xml
Imports HtmlAgilityPack


Public Class Form1

    Private Sub Form1_load(sender As System.Object, e As EventArgs) Handles MyBase.Load

        WebBrowser1.Navigate(TextBox3.Text)

    End Sub

    Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click

        Dim link As String = TextBox3.Text
        Dim doc As HtmlDocument = New HtmlWeb().Load(link)
        ' This is the assignment that cannot be converted (see the answer below).
        Dim web_document As HtmlDocument = WebBrowser1.Document

        Dim name As HtmlNode = doc.DocumentNode.SelectSingleNode("//*[@id='details']/div[2]/div[2]/div/div[1]/h3")
        ' If the node is found, show its inner text.
        If name IsNot Nothing Then
            TextBox1.Text = name.InnerText.Trim()

        End If


        Dim customer_number As HtmlNode = doc.DocumentNode.SelectSingleNode("//*[@id='details']/div[2]/div[2]/div/div[2]/dl[4]/dd")
        ' If the node is found, show its inner text.
        If customer_number IsNot Nothing Then
            TextBox2.Text = customer_number.InnerText.Trim()

        End If

        MessageBox.Show("Doc variable: " + doc.GetType.ToString + Environment.NewLine + "web_document variable: " + web_document.GetType.ToString)

    End Sub

    Private Sub WebBrowser1_DocumentCompleted(sender As Object, e As WebBrowserDocumentCompletedEventArgs) Handles WebBrowser1.DocumentCompleted

    End Sub
End Class

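Recommended answer

The problem is that WebBrowser1.Document returns a System.Windows.Forms.HtmlDocument, which is not the same type as HtmlAgilityPack.HtmlDocument. The two classes only share a name, so one cannot be converted to the other.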
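
To make the name clash easier to see, here is a minimal sketch that declares both types side by side. The module name DocumentTypes, the helper DescribeDocuments, and the import alias HAP are hypothetical names chosen for illustration; they are not part of the original code.

```vbnet
Imports System.Windows.Forms
Imports HAP = HtmlAgilityPack

Module DocumentTypes

    ' Hypothetical helper: shows the full type names of both document objects.
    Sub DescribeDocuments(browser As WebBrowser)
        ' The DOM wrapper exposed by the WebBrowser control.
        Dim browserDoc As System.Windows.Forms.HtmlDocument = browser.Document

        ' Nothing has been loaded into the control yet.
        If browserDoc Is Nothing Then Return

        ' A separate parser object from HtmlAgilityPack, unrelated to browserDoc.
        Dim parsedDoc As New HAP.HtmlDocument()

        MessageBox.Show(browserDoc.GetType().FullName & Environment.NewLine &
                        parsedDoc.GetType().FullName)
    End Sub

End Module
```

Using an alias such as HAP, or fully qualified names, avoids having two different HtmlDocument types compete for the same short name in one file.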

If you want to use HtmlAgilityPack to scrape HTML from a web page shown in a WebBrowser control, get the DocumentText from the browser control and load it into a new HtmlAgilityPack.HtmlDocument instance, like this:

Dim doc As New HtmlAgilityPack.HtmlDocument()
doc.LoadHtml(WebBrowser1.DocumentText)
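
Combined with the XPath lookups from the question, the whole flow might look like the sketch below. It assumes the form from the question (WebBrowser1, TextBox1, TextBox2) and moves the parsing into the DocumentCompleted handler, so that DocumentText is read only after the page has finished loading. The helper GetInnerText and the alias HAP are hypothetical names added for illustration. Unlike the original code, this version clears the text box when a node is not found.

```vbnet
Imports System.Windows.Forms
Imports HAP = HtmlAgilityPack

Public Class Form1

    Private Sub WebBrowser1_DocumentCompleted(sender As Object, e As WebBrowserDocumentCompletedEventArgs) Handles WebBrowser1.DocumentCompleted
        ' Parse the HTML that the browser control has loaded.
        Dim doc As New HAP.HtmlDocument()
        doc.LoadHtml(WebBrowser1.DocumentText)

        ' Same XPath expressions as in the question.
        TextBox1.Text = GetInnerText(doc, "//*[@id='details']/div[2]/div[2]/div/div[1]/h3")
        TextBox2.Text = GetInnerText(doc, "//*[@id='details']/div[2]/div[2]/div/div[2]/dl[4]/dd")
    End Sub

    ' Hypothetical helper: trimmed inner text of the first match, or an empty string.
    Private Shared Function GetInnerText(doc As HAP.HtmlDocument, xpath As String) As String
        Dim node As HAP.HtmlNode = doc.DocumentNode.SelectSingleNode(xpath)
        If node Is Nothing Then Return String.Empty
        Return node.InnerText.Trim()
    End Function

End Class
```

One caveat: DocumentText holds the HTML source of the page, so content that scripts add after loading may not appear in it.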
