How to get an HtmlElement value inside Frames/IFrames?

Views: 223 Posted: 2020/8/18 20:43:18 Tags: c# .net winforms webbrowser-control

Problem description

I'm using the WinForms WebBrowser control to collect the links of video clips from the site linked below.

However, when I iterate over the elements one by one, I cannot find the <video> tag.

void webBrowser_DocumentCompleted_2(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    try
    {
        // Looks for <video> elements in the main document only
        HtmlElementCollection pTags = browser.Document.GetElementsByTagName("video");
        int i = 1;
        foreach (HtmlElement link in pTags)
        {
            // Guard against elements without children before indexing
            if (link.Children.Count > 0 &&
                link.Children[0].GetAttribute("className") == "vjs-poster")
            {
                i++;
            }
        }
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.Message);
    }
}

Right after calling

HtmlElementCollection pTags = browser.Document.GetElementsByTagName("video");

the returned collection already contains 0 elements.

Do I need to call any ajax?

Recommended answer

The Web page you linked contains IFrames.
An IFrame contains its own HtmlDocument. As of now, you're parsing just the main Document container.
Thus, you need to parse the HtmlElement tags of another Frame.
The list of the Web page's Frames is referenced by the WebBrowser.Document.Window.Frames property, which returns an HtmlWindowCollection.
Each HtmlWindow in the collection contains its own HtmlDocument object.

Most of the time, instead of parsing the Document property returned by the WebBrowser, we need to parse each HtmlWindow.Document in the Frames collection; unless, of course, we already know that the required elements are part of the main Document or another known Frame.
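
To make the relationship between the main Document and the Frames collection concrete, here is a minimal sketch. The `FrameDocuments` class and its `Enumerate` method are hypothetical names introduced for illustration, and the sketch only visits top-level Frames; it is not the code used in this answer.

```csharp
using System;
using System.Collections.Generic;
using System.Windows.Forms;

// Hypothetical helper, not part of the original answer.
public static class FrameDocuments
{
    // Yields the main Document first, then the Document of each
    // top-level Frame that can be accessed.
    public static IEnumerable<HtmlDocument> Enumerate(WebBrowser browser)
    {
        HtmlDocument main = browser.Document;
        if (main == null) yield break;
        yield return main;

        if (main.Window == null) yield break;
        foreach (HtmlWindow frame in main.Window.Frames)
        {
            HtmlDocument frameDocument = null;
            try
            {
                frameDocument = frame.Document;
            }
            catch (UnauthorizedAccessException)
            {
                // Assumption taken from the notes in this answer:
                // some Frames refuse access, so they are skipped.
            }
            if (frameDocument != null) yield return frameDocument;
        }
    }
}
```

Checking `GetElementsByTagName("video").Count` on each yielded document would show which one, if any, actually holds the `<video>` element.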

An example (related to the current task):

- Subscribe to the DocumentCompleted event of the WebBrowser control.
- Check the WebBrowser.ReadyState property to verify that a Document is loaded completely.

Note:
Since a Web page may be composed of multiple Documents contained in Frames/IFrames, it's no surprise that the event is raised multiple times with ReadyState = WebBrowserReadyState.Complete.
Each Frame's Document raises the event when the WebBrowser has finished loading it.
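
The repeated firing is easy to observe by logging each call. The following is a minimal sketch under stated assumptions: `BrowserForm`, `OnDocumentCompleted`, and the `https://example.com/` address are placeholders introduced here, not names from the question.

```csharp
using System;
using System.Diagnostics;
using System.Windows.Forms;

// Hypothetical form used only to observe the event.
public class BrowserForm : Form
{
    private readonly WebBrowser webBrowser1 =
        new WebBrowser { Dock = DockStyle.Fill, ScriptErrorsSuppressed = true };

    public BrowserForm()
    {
        Controls.Add(webBrowser1);
        webBrowser1.DocumentCompleted += OnDocumentCompleted;
        webBrowser1.Navigate("https://example.com/"); // placeholder URL
    }

    private void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // Logged once per call, together with the URL that triggered it.
        Debug.WriteLine($"DocumentCompleted: {e.Url} (ReadyState: {webBrowser1.ReadyState})");

        // Skip the calls raised while the control is still loading.
        if (webBrowser1.ReadyState != WebBrowserReadyState.Complete) return;

        // Frame parsing would start here.
    }
}

public static class Program
{
    [STAThread]
    public static void Main()
    {
        Application.Run(new BrowserForm());
    }
}
```

On a page with IFrames, the debug output would be expected to list several URLs, which is why the handler has to tolerate being called more than once.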

- Parse the Document of each HtmlWindow in the Frames collection with GetElementsByTagName(), then read the required attribute values with HtmlElement.GetAttribute().

Note:
Since the DocumentCompleted event is raised multiple times, we also need to verify that an HtmlElement attribute value is not stored more than once.
Here, I'm using a custom support class that holds all the collected values along with the hash code of each reference link (relying on the default implementation of GetHashCode()).
Each time a Document is parsed, we compare hashes to check whether a value has already been stored.

- When a duplicate hash is found, stop parsing: the elements of the Frame's Document have already been extracted.
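
The duplicate check can also be sketched without hash codes. This is a variation, not the implementation used in this answer: `LinkCollector` and `TryAdd` are hypothetical names, and the stored GetHashCode() value is swapped for a HashSet<string> of the links themselves, which rules out a false duplicate when two different strings happen to share a hash code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original answer.
public class LinkCollector
{
    private readonly HashSet<string> seen = new HashSet<string>(StringComparer.Ordinal);
    private readonly List<string> links = new List<string>();

    // Links in the order they were first stored.
    public IReadOnlyList<string> Links => links;

    // Returns true when the link was stored, false when it is
    // null, empty, or has already been stored.
    public bool TryAdd(string videoLink)
    {
        if (string.IsNullOrEmpty(videoLink)) return false;
        if (!seen.Add(videoLink)) return false;
        links.Add(videoLink);
        return true;
    }
}
```

In a DocumentCompleted handler, a `false` result from `TryAdd` would play the same role as the duplicate-hash check: the Frame has already been parsed, so parsing can stop.
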
Note:
While parsing the HtmlWindowCollection, some specific exceptions are inevitably raised:
1) UnauthorizedAccessException: some Frames cannot be accessed.
2) InvalidOperationException: some elements/descendants cannot be accessed.

There's nothing we can do to avoid this: the elements are not null, they simply throw these exceptions when we try to access any of their properties (bad design of the base class).
Here, I'm just catching and ignoring these specific exceptions: we know we will get them eventually and we cannot avoid them, so we move on.
```
using System;
using System.Collections.Generic;
using System.Linq;
using System.Windows.Forms;

// Holds the values collected for one <video> element
public class MovieLink
{
    public MovieLink() { }
    public int Hash { get; set; }
    public string VideoLink { get; set; }
    public string ImageLink { get; set; }
}

// The field and the handler below are members of the Form that hosts webBrowser1
List<MovieLink> moviesLinks = new List<MovieLink>();

private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    // The event is raised once per Frame: skip the calls raised while still loading
    if (webBrowser1.ReadyState != WebBrowserReadyState.Complete) return;

    var documentFrames = webBrowser1.Document.Window.Frames;
    foreach (HtmlWindow frame in documentFrames)
    {
        try
        {
            // First <video> element in this Frame's Document, or null if there is none
            var videoElement = frame.Document?.Body?.GetElementsByTagName("VIDEO")
                                    .OfType<HtmlElement>().FirstOrDefault();

            if (videoElement != null && videoElement.Children.Count > 0)
            {
                // The first child element carries the clip URL in its src attribute
                string videoLink = videoElement.Children[0].GetAttribute("src");
                int hash = videoLink.GetHashCode();
                if (moviesLinks.Any(m => m.Hash == hash))
                {
                    // Done parsing this URL: remove handler or whatever 
                    // else is planned to move to the next site/page
                    return;
                }

                string sourceImage = videoElement.GetAttribute("poster");
                moviesLinks.Add(new MovieLink() {
                    Hash = hash, VideoLink = videoLink, ImageLink = sourceImage
                });
            }
        }
        catch (UnauthorizedAccessException) { } // Cannot be avoided: ignore
        catch (InvalidOperationException) { }   // Cannot be avoided: ignore
    }
}
```