Can't scrape elements inside #document element

Question

For one of my Chrome extension projects, I load another webpage into an <iframe> on the current webpage by setting its src attribute dynamically. Now I want to scrape a few values from inside the <iframe>, but jQuery always shows the <iframe> as empty. The reason I am using an <iframe> is that the fetched page includes a few JavaScript files that I want to run before I start scraping. I also tried setting wait timers, but jQuery still shows the <iframe> as empty (even though the src attribute is set).

Upon investigation, I found that the <iframe> has a strange #document node inside it, followed by the normal HTML tags. I wonder if this is why jQuery is unable to traverse the DOM hierarchy inside the <iframe>.

See the screenshot below of Chrome's "Inspect" view of the <iframe> in question.

Also, the main webpage containing the <iframe> is on the same website as the newly fetched page's URL (albeit on a different subdomain). I'm not getting any access-permission warnings in Chrome, so I do not suspect a cross-domain issue.

Edit

Even after a 10-second wait:

console.log($("#insertHere").text());

returns an empty string, and

console.log($("#insertHere").parent().html());

returns:

<iframe id="insertHere" src="/courses/intro..." style="width:0; height:0; border:0; border:none;"></iframe>

Answer

The #document node is the Document object of the iframe's own DOM, which is separate from the parent page's document.

Try accessing the document of the iframe, e.g.

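// getElementById takes the bare id (no "#"); an iframe's document is exposed as contentDocument
var frame = document.getElementById('hidden-frame');
console.log(frame.contentDocument.body);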
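A minimal sketch of the same idea that also addresses the timing problem from the question: instead of a fixed timer, wait for the iframe's load event and then read its contentDocument. The helper names `getFrameDocument` and `whenFrameLoaded`, and the `pageUrl` variable, are hypothetical. This only works when the framed page is same-origin with the parent page; note that a different subdomain counts as a different origin under the same-origin policy, in which case `contentDocument` is null and these helpers return null or reject.

```javascript
// Return the Document loaded inside an <iframe> element, or null when the
// browser does not expose it (e.g. the framed page is cross-origin).
function getFrameDocument(frame) {
  try {
    return (
      frame.contentDocument ||
      (frame.contentWindow && frame.contentWindow.document) ||
      null
    );
  } catch (err) {
    // Reading contentWindow.document of a cross-origin frame throws a SecurityError.
    return null;
  }
}

// Resolve with the iframe's Document after its "load" event fires, instead of
// guessing with a timer. Attach this before setting frame.src.
function whenFrameLoaded(frame) {
  return new Promise((resolve, reject) => {
    frame.addEventListener(
      'load',
      () => {
        const doc = getFrameDocument(frame);
        if (doc) {
          resolve(doc);
        } else {
          reject(new Error('iframe document is not accessible (cross-origin?)'));
        }
      },
      { once: true }
    );
  });
}

// Example usage in the page that hosts the iframe ("insertHere" is the id from
// the question; pageUrl is a hypothetical variable holding the URL to scrape):
// const frame = document.getElementById('insertHere');
// whenFrameLoaded(frame)
//   .then((doc) => console.log(doc.body.textContent))
//   .catch((err) => console.error(err));
// frame.src = pageUrl;
```

Attach the listener before assigning src, otherwise the load event may fire before anything is listening. This also explains why `$("#insertHere").text()` in the question is empty: it reads the <iframe> element's own children in the parent page, not the framed document. With jQuery, the equivalent is `$('#insertHere').on('load', function () { $(this).contents().find('body'); })`; jQuery's `.contents()` returns an iframe's document only when the frame is same-origin with the main page.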

You could also try using a content script that is allowed on all pages with <all_urls>, so that it is also loaded into the iframe's page, and use it to send the content to the background script via message passing.
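A sketch of that approach, under stated assumptions: the file names `content.js` and `background.js`, the message type `'FRAME_HTML'`, and the helpers `buildFrameMessage` and `handleFrameMessage` are hypothetical. The manifest would declare something like `"content_scripts": [{ "matches": ["<all_urls>"], "js": ["content.js"], "all_frames": true }]` plus a background script entry; `all_frames` is what makes Chrome inject the script into iframes and not only the top frame (it defaults to false). Because content scripts are injected per frame by URL match, this sidesteps the parent page's same-origin restrictions, and with the default `run_at` of `document_idle` the script runs after DOMContentLoaded, so the framed page's parser-inserted scripts have already executed.

```javascript
// ---- content.js (hypothetical name) ----
// Runs in every frame matched by the manifest, including the iframe.

// Package the current frame's URL and markup into a message object.
// 'FRAME_HTML' is a hypothetical message type chosen for this sketch.
function buildFrameMessage(doc) {
  return {
    type: 'FRAME_HTML',
    url: doc.location.href,
    html: doc.documentElement.outerHTML,
  };
}

// Only report from child frames, and only when running inside an extension.
if (
  typeof chrome !== 'undefined' &&
  chrome.runtime &&
  typeof window !== 'undefined' &&
  window !== window.top
) {
  chrome.runtime.sendMessage(buildFrameMessage(document));
}

// ---- background.js (hypothetical name) ----

// Turn an incoming message into a small summary record; ignore unrelated messages.
function handleFrameMessage(message, sender) {
  if (!message || message.type !== 'FRAME_HTML') {
    return null;
  }
  return { frameId: sender.frameId, url: message.url, htmlLength: message.html.length };
}

if (typeof chrome !== 'undefined' && chrome.runtime && chrome.runtime.onMessage) {
  chrome.runtime.onMessage.addListener((message, sender) => {
    const record = handleFrameMessage(message, sender);
    if (record) {
      // Parse or store message.html here; this sketch only logs a summary.
      console.log('Received iframe HTML', record);
    }
  });
}
```

Sending the whole outerHTML keeps the content script trivial; if only a few values are needed, it may be simpler to extract them in the content script with querySelector and send just those.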