Parsing HTML with VB DOTNET

Question

I am trying to parse some data from a website to get specific items from its tables. I know that any <tr> tag with the bgcolor attribute set to #ffffff or #f4f4ff is where I want to start, and my actual data sits in the 2nd <td> within that <tr>.

Currently I have:

Private Sub runForm()
    Dim theElementCollection As HtmlElementCollection = WebBrowser1.Document.GetElementsByTagName("TR")
    For Each curElement As HtmlElement In theElementCollection
        Dim controlValue As String = curElement.GetAttribute("bgcolor")
        MsgBox(controlValue)
        If controlValue.Equals("#f4f4ff") OrElse controlValue.Equals("#ffffff") Then

        End If
    Next
End Sub

This code gets the TR elements that I need, but I have no idea how (or whether it is possible) to then investigate the inner elements. If it is not possible, what do you think would be the best route to take? The site does not really label any of its tables. The <td>s I am looking for basically look like:

<td><b><font size="2"><a href="/movie/?id=movieTitle.htm">The Movie</a></font></b></td>

I want to pull out "The Movie" text and add it to a text file.

Solution

Use the InnerHtml property of the HtmlElement object (curElement) you have, like this:

For Each curElement As HtmlElement In theElementCollection
    Dim controlValue As String = curElement.GetAttribute("bgcolor")
    MsgBox(controlValue)
    If controlValue.Equals("#f4f4ff") OrElse controlValue.Equals("#ffffff") Then
        Dim elementValue As String = curElement.InnerHtml
    End If
Next

See the documentation for the HtmlElement.InnerHtml property for more information.
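Note that InnerHtml returns markup, not plain text: for the cell shown in the question it would be something like <b><font size="2"><a href="/movie/?id=movieTitle.htm">The Movie</a></font></b>. If you want to keep working with that string, one option is a regular-expression fallback. The sketch below is a minimal illustration, assuming the fragment is as small and regular as the one in the question and that the title is the first <a> element in it. The module name LinkTextFromHtml and the function ExtractLinkText are hypothetical. A regex is not a general HTML parser, so when you have the DOM available, reading InnerText from the <a> HtmlElement itself is the sturdier route.

```vb
Imports System
Imports System.Net
Imports System.Text.RegularExpressions

' Hypothetical helper: pulls the link text out of an InnerHtml string.
' Only meant for small, regular fragments like the <td> in the question.
Public Module LinkTextFromHtml
    ' First <a ...>...</a> pair. IgnoreCase because older IE versions may
    ' return upper-case tags (<A href=...>) from InnerHtml; Singleline lets
    ' "." match line breaks inside the link.
    Private ReadOnly AnchorPattern As New Regex("<a\b[^>]*>(.*?)</a\s*>",
                                                RegexOptions.IgnoreCase Or RegexOptions.Singleline)

    ' Any tag; used to strip markup nested inside the link (e.g. <b>).
    Private ReadOnly TagPattern As New Regex("<[^>]+>")

    ' Returns the trimmed, entity-decoded text of the first link,
    ' or Nothing when the fragment contains no <a> element.
    Public Function ExtractLinkText(innerHtml As String) As String
        If String.IsNullOrEmpty(innerHtml) Then Return Nothing

        Dim m As Match = AnchorPattern.Match(innerHtml)
        If Not m.Success Then Return Nothing

        Dim text As String = TagPattern.Replace(m.Groups(1).Value, "")
        Return WebUtility.HtmlDecode(text).Trim()
    End Function
End Module
```

For the markup in the question, ExtractLinkText returns "The Movie". Passing curElement.InnerHtml (the whole row) only gives the right answer if the title link is the first link in that row.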

UPDATE:

To get the second child of the <tr> HTML element, use a combination of FirstChild and then NextSibling, like this:

For Each curElement As HtmlElement In theElementCollection
    Dim controlValue As String = curElement.GetAttribute("bgcolor")
    MsgBox(controlValue)
    If controlValue.Equals("#f4f4ff") OrElse controlValue.Equals("#ffffff") Then
        Dim firstChildElement As HtmlElement = curElement.FirstChild
        Dim secondChildElement As HtmlElement = If(firstChildElement Is Nothing, Nothing, firstChildElement.NextSibling)

        ' secondChildElement should be the second <td> (Nothing if the row has fewer than two child elements)
        If secondChildElement IsNot Nothing Then
            ' Now get the value of the inner HTML
            Dim elementValue As String = secondChildElement.InnerHtml
        End If
    End If
Next
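Putting this together with the goal from the question (writing "The Movie" to a text file), here is a minimal sketch rather than a definitive implementation. It assumes the page has finished loading and that the title is the first <a> in the second cell. It swaps FirstChild/NextSibling for the zero-based HtmlElement.Children collection, and reads InnerText instead of InnerHtml because only the text is wanted. The module name MovieTitleScraper, its three members, and the output path are hypothetical.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Windows.Forms

' Hypothetical helper module for the table layout described in the question.
Public Module MovieTitleScraper
    ' True for the row colours named in the question. Compared case-insensitively
    ' in case the page writes them as #FFFFFF / #F4F4FF.
    Public Function IsDataRow(bgcolor As String) As Boolean
        Return String.Equals(bgcolor, "#ffffff", StringComparison.OrdinalIgnoreCase) OrElse
               String.Equals(bgcolor, "#f4f4ff", StringComparison.OrdinalIgnoreCase)
    End Function

    ' Collects the link text of the second <td> of every matching <tr>.
    Public Function GetMovieTitles(document As HtmlDocument) As List(Of String)
        Dim titles As New List(Of String)()
        If document Is Nothing Then Return titles   ' page not loaded yet

        For Each row As HtmlElement In document.GetElementsByTagName("TR")
            If Not IsDataRow(row.GetAttribute("bgcolor")) Then Continue For

            ' Children is zero-based, so Children(1) is the second cell.
            If row.Children.Count < 2 Then Continue For
            Dim secondCell As HtmlElement = row.Children(1)

            ' InnerText of the <a> drops the <b>/<font> markup around the title;
            ' fall back to the whole cell's text if there is no link.
            Dim links As HtmlElementCollection = secondCell.GetElementsByTagName("A")
            Dim title As String = If(links.Count > 0, links(0).InnerText, secondCell.InnerText)

            If Not String.IsNullOrWhiteSpace(title) Then titles.Add(title.Trim())
        Next

        Return titles
    End Function

    ' Appends one title per line; the file is created if it does not exist.
    Public Sub AppendMovieTitles(document As HtmlDocument, outputPath As String)
        File.AppendAllLines(outputPath, GetMovieTitles(document))
    End Sub
End Module
```

For example, from the form's WebBrowser1.DocumentCompleted handler you could call MovieTitleScraper.AppendMovieTitles(WebBrowser1.Document, "movies.txt"), where movies.txt is a placeholder path.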
