How to get access to the <iframe> body using C++/ATL/COM?
Problem description
I have written a Browser Helper Object (BHO) to get the text between the tags and use it for data-mining purposes. I tried it on iGoogle (basically to test its capability on gadgets), and it failed in some of the cases where an <iframe> with an external source is present.
I can get the <div> and its child <iframe>, but I fail to get the body.
I get the frame collection from this API: HRESULT IHTMLDocument2::get_frames(IHTMLFramesCollection2 **p);
The problem can be reproduced in iGoogle and Firefox using the Loan Calculator gadget. You will also need the Firebug extension to debug the page. For reference, I am pasting the sample here:
<div class="modboxin" id="m_8_b"><div style="border: 0pt none; padding: 0pt; margin: 0pt; width: 100%;" id="remote_8">
<iframe scrolling="no" frameborder="0" onload="_ifr_ol(this)" style="border: 0pt none; padding: 0pt; margin: 0pt; width: 100%; height: 100px; overflow: hidden;" name="remote_iframe_8" id="remote_iframe_8" src="http://8.ig.gmodules.com/gadgets/ifr?exp_rpc_js=1&exp_track_js=1&v=682f3db70d7cfff515d7c64fd24923&container=ig&view=default&debug=0&mid=8&lang=en&url=http%3A%2F%2Fwww.nova.edu%2F%7Ewillheat%2Floan.xml&country=US&parent=http://www.google.com&libs=core:core.io:core.iglegacy:auth-refresh&synd=ig&view=default#st=...B27zWVKsnJu6OviCNnzXoPjkDsbPg95yZNMwfmMaLnwWoRxGaRArxTpOqK4TiH87uGUiHnYkkaqU9NE1sOyms6sg/Jwi&gadgetId=116809661812082345195&gadgetOwner=105250506097979753968&gadgetViewer=105250506097979753968&rpctoken=422312139&ifpctok=422312139">
</iframe>
</div>
The link is not complete, as I have replaced part of the src with "...". As you can see, there is no body for the <iframe>, although it is being rendered in the browser.
According to this post (http://stackoverflow.com/questions/957133/does-body-onload-wait-for-iframes), the onload event on the body does not wait for frames to finish loading.
So I conclude that I need some sort of onload listener for the <iframe>, but I am not sure how to set one up.
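One way to think about such a listener: each frame reports when it has finished loading, and the body is only read once every frame has reported. The portable C++ sketch that follows (no COM, so it compiles and runs anywhere) models only that bookkeeping. `FrameLoadTracker`, `frameLoaded` and the callback are hypothetical names; in a real BHO the notification would have to come from an actual event source, for example an onload handler attached to the <iframe> element or the per-frame DocumentComplete event.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>
#include <utility>

// Hypothetical helper: collects "frame finished loading" notifications and
// runs a callback once, after every expected frame has reported in.
class FrameLoadTracker {
public:
    FrameLoadTracker(std::set<std::string> expected, std::function<void()> onAllLoaded)
        : pending_(std::move(expected)), onAllLoaded_(std::move(onAllLoaded)) {
        // With no frames to wait for, the callback would never run.
        assert(!pending_.empty());
    }

    // Call this from the per-frame load notification (e.g. an onload handler).
    // Unknown or repeated ids are ignored because erase() then returns 0.
    void frameLoaded(const std::string& frameId) {
        if (pending_.erase(frameId) == 1 && pending_.empty()) {
            onAllLoaded_();  // safe point to read every frame's body
        }
    }

    bool allLoaded() const { return pending_.empty(); }

private:
    std::set<std::string> pending_;       // frames that have not reported yet
    std::function<void()> onAllLoaded_;   // work to do once all frames are loaded
};
```

For example, with the ids remote_iframe_8 and remote_iframe_9 the callback runs exactly once, when the second of them reports.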
Please suggest a way (or a code snippet) to retrieve the body of the <iframe> using ATL/COM APIs.
**Update**
I am using the following code to get the <iframes>. Although I get the iframe collection, getting their bodies fails. Maybe that is because they are not loaded by that time?
#include <atlbase.h>
#include <mshtml.h>

void box(LPCWSTR msg);

void testFrame(IHTMLDocument2* pDocument)
{
    CComPtr<IHTMLFramesCollection2> col;
    HRESULT hr = pDocument->get_frames(&col);
    if (FAILED(hr) || !col)
        return;

    long counter = 0;
    hr = col->get_length(&counter);
    if (FAILED(hr))
        return;

    for (long i = 0; i < counter; i++)
    {
        // CComVariant initializes both VARIANTs and clears them when they go
        // out of scope, so the returned IDispatch reference is not leaked.
        CComVariant index(i);   // VT_I4
        CComVariant result;
        hr = col->item(&index, &result);
        if (FAILED(hr) || result.vt != VT_DISPATCH || !result.pdispVal)
            continue;

        CComQIPtr<IHTMLWindow2> pFrame = result.pdispVal;
        if (!pFrame)
            continue;

        CComPtr<IHTMLDocument2> spHTML;
        hr = pFrame->get_document(&spHTML);
        if (FAILED(hr) || !spHTML)
        {
            // May fail (e.g. E_ACCESSDENIED) when the frame comes from another domain.
            box(L"get_document failed");
            continue;
        }

        CComPtr<IHTMLElement> elem;
        hr = spHTML->get_body(&elem);
        if (FAILED(hr) || !elem)
        {
            box(L"get_body failed");   // the frame may not be loaded yet
            continue;
        }

        CComBSTR str;
        hr = elem->get_innerHTML(&str);
        if (FAILED(hr))
            box(L"hr is not ok");
        else if (!str)
            box(L"STR is null");
        else
            box(str);
    }
}

And:

// Debug helper: shows msg in a message box.
void box(LPCWSTR msg)
{
    MessageBoxW(NULL, msg, L"..BOX..", MB_OK);
}
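If the failure in testFrame is indeed a timing problem, one option is to check the frame document's readyState (IHTMLDocument2::get_readyState) before touching the body. Another commonly reported cause is that get_document on a frame whose content comes from a different domain (here 8.ig.gmodules.com inside www.google.com) fails with E_ACCESSDENIED, and waiting does not help in that case. The sketch models that decision in portable C++; `FrameAction`, `DecideFrameAction` and `kAccessDenied` are hypothetical names, and the HRESULT is passed as a plain 32-bit integer so the code builds without the Windows headers.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// What a testFrame-style loop could do with one frame (hypothetical).
enum class FrameAction { ReadBody, RetryLater, Skip };

// Local copy of the Windows HRESULT E_ACCESSDENIED (0x80070005) so this
// sketch builds without <windows.h>.
constexpr std::int32_t kAccessDenied = static_cast<std::int32_t>(0x80070005u);

// hrGetDocument: HRESULT returned by IHTMLWindow2::get_document for the frame.
// readyState:    the frame document's readyState (IHTMLDocument2::get_readyState),
//                only meaningful when hrGetDocument succeeded.
FrameAction DecideFrameAction(std::int32_t hrGetDocument, const std::wstring& readyState)
{
    if (hrGetDocument == kAccessDenied)
        return FrameAction::Skip;        // assumed cross-domain frame: waiting will not help
    if (hrGetDocument < 0)
        return FrameAction::RetryLater;  // any other failure (FAILED(hr) means hr < 0)
    if (readyState != L"complete")
        return FrameAction::RetryLater;  // body may be missing or only partly parsed
    return FrameAction::ReadBody;
}
```

Treating only "complete" as final is a conservative choice; "interactive" documents already have a body, but their content may still change.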
Any suggestions on how to get the iframe body? By the way, I am handling this in the OnDocumentComplete event.
Thanks,
Recommended answer
Instead of updating my own question, I am posting this as an answer, though I would really love to see an alternative answer.
- Solution -
My basic assumptions are:
- I know which URLs to handle.
- A page load can be divided into two main events (there could be other events too, but these two are enough):
  - the completion of the main page
  - the completion of the <iframes>
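These two events map onto DocumentComplete as documented for DWebBrowserEvents2: the event fires once for each frame that finishes loading, and the top-level browser fires it last, so comparing pDisp with the top-level IWebBrowser2 (what the IsEqualObject call in the code does) tells the two cases apart. The portable sketch models that routing; `DocumentCompleteRouter` and `BrowserId` are hypothetical names, and a plain pointer comparison stands in for IsEqualObject.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Portable stand-in for the IDispatch* that DocumentComplete receives.
// In the real BHO this is a COM pointer compared with IsEqualObject().
using BrowserId = const void*;

// Hypothetical router for DocumentComplete notifications: frame completions
// are recorded one by one; the top-level completion marks the whole page done.
class DocumentCompleteRouter {
public:
    explicit DocumentCompleteRouter(BrowserId topLevel) : topLevel_(topLevel) {}

    void onDocumentComplete(BrowserId source, const std::wstring& url) {
        if (source == topLevel_) {
            pageComplete_ = true;             // main page (fires after its frames)
        } else {
            completedFrames_.push_back(url);  // one <iframe> finished loading
        }
    }

    bool pageComplete() const { return pageComplete_; }
    const std::vector<std::wstring>& completedFrames() const { return completedFrames_; }

private:
    BrowserId topLevel_;                          // identity of the top-level browser
    bool pageComplete_ = false;
    std::vector<std::wstring> completedFrames_;   // URLs of frames seen so far
};
```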
Code
void STDMETHODCALLTYPE CSafeMaskBHO::OnDocumentComplete(IDispatch *pDisp, VARIANT *pvarURL)
{
    // pDisp is the browser object (top-level window or frame) whose document finished loading.
    CComQIPtr<IWebBrowser2> spTempWebBrowser = pDisp;
    if (!spTempWebBrowser)
        return;

    CComBSTR url;
    HRESULT hr = spTempWebBrowser->get_LocationURL(&url); // You can also take the URL from pvarURL.
    if ((hr == S_OK) && (url != NULL))
    {
        /*
        I know which URLs I am looking for
        */
        if ((wcsstr(url, L"www.example.com") == NULL) && (wcsstr(url, L"www.test.com") == NULL)) {
            return;
        }
        CComPtr<IDispatch> frameDocDisp;
        hr = spTempWebBrowser->get_Document(&frameDocDisp);
        if ((hr == S_OK) && (frameDocDisp != NULL))
        {
            CComQIPtr<IHTMLDocument3> spHTMLDoc = frameDocDisp;
            // ... Do something useful ...
        }
    }
    else if (m_spWebBrowser && m_spWebBrowser.IsEqualObject(spTempWebBrowser))
    {
        CComPtr<IDispatch> spDispDoc;
        hr = m_spWebBrowser->get_Document(&spDispDoc);
        if ((hr == S_OK) && (spDispDoc != NULL))
        {
            CComQIPtr<IHTMLDocument2> spHTMLDoc = spDispDoc;
            if (spHTMLDoc)
            {
                // ... Do something useful ...
            }
        }
    }
}
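The URL filter at the top of OnDocumentComplete can be factored into a small helper that is easy to test on its own. This sketch uses std::wstring::find in place of wcsstr; `IsUrlOfInterest` is a hypothetical name, and the host list is the placeholder from the answer.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper equivalent to the wcsstr() checks in OnDocumentComplete:
// returns true if the URL contains any of the given host names.
bool IsUrlOfInterest(const std::wstring& url, const std::vector<std::wstring>& hosts)
{
    for (const auto& host : hosts) {
        if (url.find(host) != std::wstring::npos)
            return true;
    }
    return false;
}
```

Like the original wcsstr check, this is plain substring matching, so a URL such as http://www.example.com.other.net/ also matches; parsing out the host name would be stricter.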
If you have anything to share (suggestions, corrections, alternatives), please do so. :)
Thanks,