Parsing HTML with VB DOTNET
Problem description
I am trying to parse some data from a website to get specific items from its tables. I know that any <tr> tag with the bgcolor attribute set to #ffffff or #f4f4ff is where I want to start, and my actual data sits in the 2nd <td> within that <tr>.
Currently I have:
Private Sub runForm()
    Dim theElementCollection As HtmlElementCollection = WebBrowser1.Document.GetElementsByTagName("TR")
    For Each curElement As HtmlElement In theElementCollection
        Dim controlValue As String = curElement.GetAttribute("bgcolor").ToString
        MsgBox(controlValue)
        If controlValue.Equals("#f4f4ff") Or controlValue.Equals("#ffffff") Then
        End If
    Next
End Sub
This code gets the TR elements that I need, but I have no idea how (if it is possible) to then investigate the inner elements. If it is not possible, what do you think would be the best route to take? The site does not really label any of its tables. The <td>s I am looking for basically look like:
<td><b><font size="2"><a href="/movie/?id=movieTitle.htm">The Movie</a></font></b></td>
I want to pull out "The Movie" text and add it to a text file.
Use the InnerHtml property of the HtmlElement object (curElement) you have, like this:
For Each curElement As HtmlElement In theElementCollection
    Dim controlValue As String = curElement.GetAttribute("bgcolor").ToString
    MsgBox(controlValue)
    If controlValue.Equals("#f4f4ff") Or controlValue.Equals("#ffffff") Then
        ' InnerHtml holds the markup of everything inside the <tr>
        Dim elementValue As String = curElement.InnerHtml
    End If
Next
Read the documentation of HtmlElement.InnerHtml Property for more information.
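Note that InnerHtml returns markup, so for the cell shown in the question it would still contain the <b>, <font> and <a> tags. If only the visible text ("The Movie") is wanted, the InnerText property of the same HtmlElement may be closer to the goal. A minimal sketch, assuming a Windows Forms project; the helper name GetElementText is hypothetical and not part of the original code:

```vbnet
Imports System
Imports System.Windows.Forms

Module ElementTextSketch
    ' Hypothetical helper: returns the visible text of an element with the
    ' nested tags stripped, or an empty string when there is no text.
    Function GetElementText(element As HtmlElement) As String
        Dim text As String = element.InnerText
        If text Is Nothing Then Return String.Empty
        Return text.Trim()
    End Function
End Module
```

Inside the loop above, GetElementText(curElement) could then be used in place of curElement.InnerHtml.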
UPDATE:
To get the second child of the <tr> HTML element, use a combination of FirstChild and then NextSibling, like this:
For Each curElement As HtmlElement In theElementCollection
    Dim controlValue As String = curElement.GetAttribute("bgcolor").ToString
    MsgBox(controlValue)
    If controlValue.Equals("#f4f4ff") Or controlValue.Equals("#ffffff") Then
        Dim firstChildElement = curElement.FirstChild
        Dim secondChildElement = firstChildElement.NextSibling
        ' secondChildElement should be the second <td>, now get the value of the inner HTML
        Dim elementValue As String = secondChildElement.InnerHtml
    End If
Next
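To cover the last part of the question (adding the text to a text file), the loop could be extended roughly as follows. This is a sketch under assumptions rather than a definitive implementation: it indexes the Children collection instead of chaining FirstChild and NextSibling, skips rows with fewer than two cells, compares the colors case-insensitively in case the browser normalizes the attribute value, and reads InnerText instead of InnerHtml to get the text without tags. SaveMovieTitles and outputPath are illustrative names that do not appear in the original code.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Windows.Forms

Module MovieTitleSketch
    ' Hypothetical helper: call it once the page has finished loading,
    ' for example from the WebBrowser's DocumentCompleted handler.
    Sub SaveMovieTitles(browser As WebBrowser, outputPath As String)
        ' Document is Nothing until a page has been loaded.
        If browser.Document Is Nothing Then Return

        Dim titles As New List(Of String)

        For Each row As HtmlElement In browser.Document.GetElementsByTagName("TR")
            Dim color As String = row.GetAttribute("bgcolor")

            Dim isDataRow As Boolean =
                String.Equals(color, "#f4f4ff", StringComparison.OrdinalIgnoreCase) OrElse
                String.Equals(color, "#ffffff", StringComparison.OrdinalIgnoreCase)

            ' Skip rows that are not data rows or have fewer than two cells.
            If Not isDataRow OrElse row.Children.Count < 2 Then Continue For

            ' Children is zero-based, so index 1 is the second <td>.
            Dim title As String = row.Children(1).InnerText
            If Not String.IsNullOrWhiteSpace(title) Then titles.Add(title.Trim())
        Next

        ' Appends one title per line, creating the file if it does not exist.
        File.AppendAllLines(outputPath, titles)
    End Sub
End Module
```

A call such as SaveMovieTitles(WebBrowser1, "movies.txt") would then write one title per line; the file name here is only an example.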