How to get an HtmlElement value inside Frames/IFrames?

Question

I'm using the WinForms WebBrowser control to collect the links of video clips from the site linked below.

LINK

But when I walk through the document element by element, I cannot find the <video> tag.

void webBrowser_DocumentCompleted_2(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    try
    {
        // Searches the main document only
        HtmlElementCollection pTags = browser.Document.GetElementsByTagName("video");
        int i = 1;
        foreach (HtmlElement link in pTags)
        {
            // Guard against elements without children before indexing
            if (link.Children.Count > 0 &&
                link.Children[0].GetAttribute("className") == "vjs-poster")
            {
                i++;
            }
        }
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.Message);
    }
}

Right after this call:

HtmlElementCollection pTags = browser.Document.GetElementsByTagName("video");

the returned collection already contains 0 elements.

Do I need to make any AJAX calls?

Recommended answer

The web page you linked contains IFrames.
An IFrame contains its own HtmlDocument. As of now, you're parsing just the main Document container.
Thus, you need to parse the HtmlElement tags of the other Frames.
The web page's list of Frames is referenced by the WebBrowser.Document.Window.Frames property, which returns an HtmlWindowCollection.
Each HtmlWindow in the collection contains its own HtmlDocument object.

Most of the time, instead of parsing the Document property returned by the WebBrowser, we need to parse each HtmlWindow.Document in the Frames collection; unless, of course, we already know that the required elements are part of the main Document or of another known Frame.
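
As a minimal sketch of this idea, the helper below counts the <video> elements found in the main document and in the document of each top-level frame. The names FrameInspector and CountVideoTags are hypothetical and not part of the original answer; the code assumes it is called once the page has finished loading, and it ignores the two exception types described in the notes further down.

```csharp
using System;
using System.Windows.Forms;

// Hypothetical helper, shown for illustration only.
public static class FrameInspector
{
    // Returns the number of <video> elements in the main document
    // plus those in the document of each top-level frame.
    public static int CountVideoTags(WebBrowser browser)
    {
        if (browser == null || browser.Document == null) return 0;

        // Main document only: elements hosted inside frames are not included here.
        int count = browser.Document.GetElementsByTagName("video").Count;

        HtmlWindow window = browser.Document.Window;
        if (window == null || window.Frames == null) return count;

        foreach (HtmlWindow frame in window.Frames)
        {
            try
            {
                // Each frame exposes its own HtmlDocument.
                HtmlDocument frameDocument = frame.Document;
                if (frameDocument != null)
                    count += frameDocument.GetElementsByTagName("video").Count;
            }
            catch (UnauthorizedAccessException) { } // the frame cannot be accessed
            catch (InvalidOperationException) { }   // an element cannot be accessed
        }
        return count;
    }
}
```

If the main document reports 0 while this total is greater than 0, the elements live inside a frame, which matches the situation described in the question.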

An example (related to the current task):

  • Subscribe to the DocumentCompleted event of the WebBrowser Control/Class.
  • Check the WebBrowser.ReadyState property to verify that a Document is loaded completely.

Note:
Since a web page may be composed of multiple Documents contained in Frames/IFrames, it is not surprising if the event is raised multiple times with ReadyState = WebBrowserReadyState.Complete.
Each Frame's Document raises the event when the WebBrowser is done loading it.

  • Parse the HtmlDocument of each Frame in the Document.Window.Frames collection, using the Frame.Document.Body.GetElementsByTagName() method.
  • Extract the HtmlElement's attribute using the HtmlElement.GetAttribute method.

Note:
Since the DocumentCompleted event is raised multiple times, we need to make sure that an HtmlElement attribute value is not stored more than once.
Here, I'm using a custom support class that holds all the collected values along with the hash code of each reference link (relying on the default implementation of GetHashCode()).
Each time a Document is parsed, we check whether a value has already been stored by comparing its hash.

  • Stop parsing when a duplicate hash is found: the Frame's Document elements have already been extracted.
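
The duplicate check described above can also be sketched with a HashSet<string>, which compares the link text itself and not only its hash code, so two different links whose hash codes happen to collide are not treated as duplicates. This is an alternative to the answer's approach, not the approach itself; the LinkRegistry class, its TryAdd method and the example.com URLs are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: remembers which video links were already collected.
public sealed class LinkRegistry
{
    // Ordinal comparison: links are treated as case-sensitive strings.
    private readonly HashSet<string> seen = new HashSet<string>(StringComparer.Ordinal);

    // Number of distinct links stored so far.
    public int Count { get { return seen.Count; } }

    // Returns true if the link is new and was stored,
    // false if it is null/empty or was already collected.
    public bool TryAdd(string videoLink)
    {
        if (string.IsNullOrEmpty(videoLink)) return false;
        return seen.Add(videoLink);
    }
}

public static class LinkRegistryDemo
{
    public static void Run()
    {
        var registry = new LinkRegistry();
        // Placeholder URLs, used only to exercise the helper.
        Console.WriteLine(registry.TryAdd("http://example.com/clip1.mp4")); // first time: stored
        Console.WriteLine(registry.TryAdd("http://example.com/clip1.mp4")); // repeated: rejected
        Console.WriteLine(registry.Count);
    }
}
```

In a DocumentCompleted handler, a false result from TryAdd would play the same role as finding a duplicate hash: the frame has already been processed.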

Note:
While parsing the HtmlWindowCollection, some specific exceptions will inevitably be raised:

  1. UnauthorizedAccessException: some Frames cannot be accessed.
  2. InvalidOperationException: some Elements/Descendants cannot be accessed.

There's nothing we can do to avoid this: the elements are not null, they simply throw these exceptions when we try to access any of their properties.
Here, I'm just catching and ignoring these specific exceptions: we know we will get them sooner or later and we cannot avoid them, so we move on.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Windows.Forms;

public class MovieLink
{
    public MovieLink() { }
    public int Hash { get; set; }
    public string VideoLink { get; set; }
    public string ImageLink { get; set; }
}

List<MovieLink> moviesLinks = new List<MovieLink>();

private void Browser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    var browser = sender as WebBrowser;
    // The event is raised once per Frame document: skip until loading is complete
    if (browser == null || browser.ReadyState != WebBrowserReadyState.Complete) return;

    var documentFrames = browser.Document.Window.Frames;
    foreach (HtmlWindow frame in documentFrames) {
        try {
            // Each Frame has its own document; skip Frames without a body
            var body = frame.Document != null ? frame.Document.Body : null;
            if (body == null) continue;

            var videoElement = body
                .GetElementsByTagName("VIDEO").OfType<HtmlElement>().FirstOrDefault();

            if (videoElement != null && videoElement.Children.Count > 0) {
                // The clip URL is read from the src attribute of the first child
                string videoLink = videoElement.Children[0].GetAttribute("src");
                int hash = videoLink.GetHashCode();
                if (moviesLinks.Any(m => m.Hash == hash)) {
                    // Done parsing this URL: remove handler or whatever
                    // else is planned to move to the next site/page
                    return;
                }

                string sourceImage = videoElement.GetAttribute("poster");
                moviesLinks.Add(new MovieLink() {
                    Hash = hash, VideoLink = videoLink, ImageLink = sourceImage
                });
            }
        }
        catch (UnauthorizedAccessException) { } // Cannot be avoided: ignore
        catch (InvalidOperationException) { }   // Cannot be avoided: ignore
    }
}
