How to access the <iframe> body using C++/ATL/COM?

Question

I have written a browser helper object (BHO) that extracts the text between tags for data-mining purposes. I tried it on iGoogle (basically to test it against gadgets), and it failed in some cases where an <iframe> with an external source is present.

I can get the <div> and its child <iframe>, but I fail to get the iframe's body.

I get the frame collection from this API: HRESULT IHTMLDocument2::get_frames(IHTMLFramesCollection2 **p);

The problem can be reproduced in iGoogle and Firefox using the loan calculator gadget. You will also need the Firebug extension to inspect the page. For reference, I am pasting the sample here:

<div class="modboxin" id="m_8_b"><div style="border: 0pt none; padding: 0pt; margin: 0pt; width: 100%;" id="remote_8">
<iframe scrolling="no" frameborder="0" onload="_ifr_ol(this)" style="border: 0pt none; padding: 0pt; margin: 0pt; width: 100%; height: 100px; overflow: hidden;" name="remote_iframe_8" id="remote_iframe_8" src="http://8.ig.gmodules.com/gadgets/ifr?exp_rpc_js=1&amp;exp_track_js=1&amp;v=682f3db70d7cfff515d7c64fd24923&amp;container=ig&amp;view=default&amp;debug=0&amp;mid=8&amp;lang=en&amp;url=http%3A%2F%2Fwww.nova.edu%2F%7Ewillheat%2Floan.xml&amp;country=US&amp;parent=http://www.google.com&amp;libs=core:core.io:core.iglegacy:auth-refresh&amp;synd=ig&amp;view=default#st=...B27zWVKsnJu6OviCNnzXoPjkDsbPg95yZNMwfmMaLnwWoRxGaRArxTpOqK4TiH87uGUiHnYkkaqU9NE1sOyms6sg/Jwi&amp;gadgetId=116809661812082345195&amp;gadgetOwner=105250506097979753968&amp;gadgetViewer=105250506097979753968&amp;rpctoken=422312139&amp;ifpctok=422312139">
</iframe>
</div>

The link is not complete, as I have replaced part of the src with "...". As you can see, there is no body for the <iframe>, although it is rendered in the browser.

As per this post ( http://stackoverflow.com/questions/957133/does-body-onload-wait-for-iframes ), the onload event on the body does not wait for frames to complete.

So I conclude that I have to use some sort of onload listener for the <iframe>, but I am not sure how.

Kindly suggest a way or a snippet to retrieve the body of the <iframe> using the ATL/COM APIs.

**Update**

I am using the following code to get the <iframe>s. Although I get the iframe collection, fetching their bodies fails, maybe because they are not loaded by that time?

// Shows the inner HTML of the body of every frame in pDocument.
void testFrame(IHTMLDocument2* pDocument)
{
    CComPtr<IHTMLFramesCollection2> col;
    HRESULT hr = pDocument->get_frames(&col);
    if ((hr != S_OK) || (col == NULL))
        return;

    long counter = 0;
    hr = col->get_length(&counter);
    if ((hr != S_OK) || (counter <= 0))
        return;

    for (long i = 0; i < counter; i++)
    {
        CComVariant v1(i);  // VT_I4 index of the frame
        CComVariant v2;     // receives the frame window; cleared automatically
        hr = col->item(&v1, &v2);
        if ((hr != S_OK) || (v2.vt != VT_DISPATCH))
            continue;

        CComQIPtr<IHTMLWindow2> pFrame = v2.pdispVal;
        if (!pFrame)
            continue;

        // This call can fail, for example with E_ACCESSDENIED when the
        // frame is served from another domain.
        CComPtr<IHTMLDocument2> spHTML;
        hr = pFrame->get_document(&spHTML);
        if ((hr != S_OK) || (spHTML == NULL))
            continue;

        CComPtr<IHTMLElement> elem;
        hr = spHTML->get_body(&elem);
        if ((hr != S_OK) || (elem == NULL))
            continue;

        CComBSTR str;
        hr = elem->get_innerHTML(&str);
        if (hr != S_OK)
            box(L"hr is not ok");
        else if (str == NULL)
            box(L"STR is null");
        else
            box(str);
    }
}

And,

// Debug helper: shows msg in a message box.
void box(LPCWSTR msg)
{
    MessageBoxW(NULL, msg, L"..BOX..", MB_OK);
}

Any suggestions on how to get the iframe body? By the way, I am handling this in the OnDocumentComplete event.

Thanks,

Recommended answer

Instead of updating my own question, I am posting this as an answer, though I would really love to see an alternative answer.

- Solution -

My basic assumptions are:

  1. I know which URLs to handle.
  2. Loading a page can be divided into two main events (there could be other events too, but these two will do):
    • Completion of the main page
    • Completion of the <iframe>s

Code

void STDMETHODCALLTYPE CSafeMaskBHO::OnDocumentComplete(IDispatch *pDisp, VARIANT *pvarURL)
{
    // pDisp identifies the browser (top-level window or frame) whose document completed.
    CComQIPtr<IWebBrowser2> spTempWebBrowser = pDisp;
    if (!spTempWebBrowser)
        return;

    CComBSTR url;
    HRESULT hr = spTempWebBrowser->get_LocationURL(&url); // You can also take the URL from pvarURL.

    if ((hr == S_OK) && (url != NULL))
    {
        // I know which URLs I am looking for; ignore everything else.
        if ((wcsstr(url, L"www.example.com") == NULL) && (wcsstr(url, L"www.test.com") == NULL))
        {
            return;
        }

        CComPtr<IDispatch> frameDocDisp;
        hr = spTempWebBrowser->get_Document(&frameDocDisp);
        if ((hr == S_OK) && (frameDocDisp != NULL))
        {
            CComQIPtr<IHTMLDocument3> spHTMLDoc = frameDocDisp;
            // ... Do something useful ...
        }
    }
    else if (m_spWebBrowser && m_spWebBrowser.IsEqualObject(spTempWebBrowser))
    {
        // Reached only when the URL could not be retrieved and the
        // completed browser is the top-level one.
        CComPtr<IDispatch> spDispDoc;
        hr = m_spWebBrowser->get_Document(&spDispDoc);

        if ((hr == S_OK) && (spDispDoc != NULL))
        {
            CComQIPtr<IHTMLDocument2> spHTMLDoc = spDispDoc;
            if (spHTMLDoc)
            {
                // ... Do something useful ...
            }
        }
    }
}
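
The substring test on the URL in the handler above can be factored into a small helper, which keeps the list of hosts in one place. This is only a minimal sketch under my own assumptions: the names IsHandledUrl and kHandledHosts are hypothetical (they are not part of the BHO code), and the two hosts are just the placeholders already used in the handler.

```cpp
#include <cwchar>

// Hypothetical list of URL fragments to handle; these are the
// placeholder hosts from the handler, not real targets.
static const wchar_t* const kHandledHosts[] = {
    L"www.example.com",
    L"www.test.com",
};

// Hypothetical helper: returns true when url contains any fragment
// from kHandledHosts. A null url is treated as "not handled".
bool IsHandledUrl(const wchar_t* url)
{
    if (url == nullptr)
        return false;

    for (const wchar_t* host : kHandledHosts)
    {
        if (std::wcsstr(url, host) != nullptr)
            return true;
    }
    return false;
}
```

With such a helper the early exit in the handler would read `if (!IsHandledUrl(url)) return;`. Note that a plain substring match also accepts URLs that merely contain the host text somewhere in the query string, which is the same behaviour as the wcsstr test in the handler.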

If you have anything to share (suggestions, corrections, alternatives), then please do so. :)

Thanks,
