How to get an HtmlElement value inside Frames/IFrames?
Question
I'm using the WinForms WebBrowser control to collect the links of video clips from the site linked below.
But when I go through the document element by element, I cannot find the <video> tag.
void webBrowser_DocumentCompleted_2(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    try
    {
        // Searches the main document only; elements inside Frames are not included
        HtmlElementCollection pTags = browser.Document.GetElementsByTagName("video");
        int i = 1;
        foreach (HtmlElement link in pTags)
        {
            // Count the <video> elements whose first child is the poster element
            if (link.Children.Count > 0 &&
                link.Children[0].GetAttribute("className") == "vjs-poster")
            {
                i++;
            }
        }
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.Message);
    }
}
Right after calling
HtmlElementCollection pTags = browser.Document.GetElementsByTagName("video");
the returned collection already contains 0 elements.
Do I need to call any ajax?
Recommended answer
The Web page you linked contains IFrames. An IFrame contains its own HtmlDocument. As of now, you're parsing just the main Document container. Thus, you need to parse the HtmlElement tags of some other Frame.
The list of the Web page's Frames is referenced by the WebBrowser.Document.Window.Frames property, which returns an HtmlWindowCollection. Each HtmlWindow in the collection contains its own HtmlDocument object.
Most of the time, instead of parsing the Document object returned by the WebBrowser, we need to parse each HtmlWindow.Document in the Frames collection; unless, of course, we already know that the required Elements are part of the main Document or of another known Frame.
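Frames can themselves contain other Frames. As a minimal sketch, assuming nested Frames matter for the page at hand (the answer itself only walks the top-level collection), a recursive helper along these lines could enumerate the main Document plus the Document of every reachable Frame. FrameWalker and AllDocuments are hypothetical names introduced for illustration; the exception types caught here are the two the answer reports for inaccessible Frames.

```csharp
using System;
using System.Collections.Generic;
using System.Windows.Forms;

static class FrameWalker
{
    // Yields the given document, then the document of each reachable frame,
    // descending into nested frames. Inaccessible frames are skipped.
    public static IEnumerable<HtmlDocument> AllDocuments(HtmlDocument root)
    {
        if (root == null) yield break;
        yield return root;

        HtmlWindowCollection frames = null;
        try { frames = root.Window?.Frames; }
        catch (UnauthorizedAccessException) { }   // frame list not accessible
        if (frames == null) yield break;

        foreach (HtmlWindow frame in frames)
        {
            HtmlDocument frameDocument = null;
            try { frameDocument = frame.Document; }
            catch (UnauthorizedAccessException) { }  // e.g. a frame we may not read
            catch (InvalidOperationException) { }

            // Recurse: a null frameDocument simply yields nothing
            foreach (HtmlDocument doc in AllDocuments(frameDocument))
                yield return doc;
        }
    }
}
```

A DocumentCompleted handler could then loop over FrameWalker.AllDocuments(webBrowser1.Document) and call GetElementsByTagName("video") on each Document, instead of querying the main Document only.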
An example (related to the current task):
- Subscribe to the DocumentCompleted event of the WebBrowser Control/Class.
- Check the WebBrowser.ReadyState property to verify that a Document is loaded completely.
Note:
Bearing in mind that a Web page may be composed of multiple Documents contained in Frames/IFrames, it should come as no surprise if the event is raised multiple times with ReadyState = WebBrowserReadyState.Complete. Each Frame's Document raises the event when the WebBrowser is done loading it.
Note:
Since the DocumentCompleted event is raised multiple times, we also need to verify that an HtmlElement attribute value is not stored multiple times.
Here, I'm using a custom support class that holds all the collected values along with the hash code of each reference link (relying on the default implementation of GetHashCode()).
Each time a Document is parsed, we check whether a value has already been stored by comparing its hash.
- When we verify that a duplicate hash has been found, stop parsing: the Frame Document Elements have already been extracted.
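Comparing hash codes can, in principle, treat two different links as the same one if their hashes happen to collide. As an alternative sketch (not the approach used in this answer), the link strings themselves can be tracked in a HashSet<string>, whose Add method returns false when the value is already present. VideoLinkStore and TryAdd are hypothetical names introduced for illustration.

```csharp
using System;
using System.Collections.Generic;

public sealed class VideoLinkStore
{
    // Exact, case-sensitive comparison of the link text
    private readonly HashSet<string> seen = new HashSet<string>(StringComparer.Ordinal);
    private readonly List<string> links = new List<string>();

    // Links in the order they were first collected
    public IReadOnlyList<string> Links => links;

    // Returns true when the link is new and was stored;
    // false when it is null/empty or was already recorded.
    public bool TryAdd(string videoLink)
    {
        if (string.IsNullOrEmpty(videoLink)) return false;
        if (!seen.Add(videoLink)) return false;
        links.Add(videoLink);
        return true;
    }
}
```

In a DocumentCompleted handler, a false result from TryAdd would play the same role as finding a duplicate hash: the Frame has already been processed, so parsing can stop.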
Note:
While parsing the HtmlWindowCollection, some specific exceptions are inevitably raised:
1) UnauthorizedAccessException: some Frames cannot be accessed.
2) InvalidOperationException: some Elements/Descendants cannot be accessed.
There's nothing we can do to avoid this: the Elements are not null, they simply throw these exceptions when we try to access any of their properties (bad design of the base class).
Here, I'm just catching and ignoring these specific exceptions: we know we will eventually get them and we cannot avoid them, so we move on.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Windows.Forms;

public class MovieLink
{
    public MovieLink() { }
    public int Hash { get; set; }
    public string VideoLink { get; set; }
    public string ImageLink { get; set; }
}

List<MovieLink> moviesLinks = new List<MovieLink>();

private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    // The event is raised once per Frame Document: skip until loading is complete
    if (webBrowser1.ReadyState != WebBrowserReadyState.Complete) return;

    var documentFrames = webBrowser1.Document.Window.Frames;
    foreach (HtmlWindow Frame in documentFrames)
    {
        try
        {
            // First <video> element of this Frame's Document, if any
            var videoElement = Frame.Document.Body
                .GetElementsByTagName("VIDEO").OfType<HtmlElement>().FirstOrDefault();

            if (videoElement != null)
            {
                string videoLink = videoElement.Children[0].GetAttribute("src");
                int hash = videoLink.GetHashCode();
                if (moviesLinks.Any(m => m.Hash == hash))
                {
                    // Done parsing this URL: remove handler or whatever
                    // else is planned to move to the next site/page
                    return;
                }

                string sourceImage = videoElement.GetAttribute("poster");
                moviesLinks.Add(new MovieLink()
                {
                    Hash = hash,
                    VideoLink = videoLink,
                    ImageLink = sourceImage
                });
            }
        }
        catch (UnauthorizedAccessException) { } // Cannot be avoided: ignore
        catch (InvalidOperationException) { }   // Cannot be avoided: ignore
    }
}