Can't scrape elements inside #document element
Problem description
For one of my Chrome extension projects, I load the HTML content of another webpage into an <iframe> tag on the current page by setting its src attribute dynamically. Now I want to scrape a few values from inside the <iframe>, but jQuery always shows the <iframe> tag as empty. The reason I am using an <iframe> is that the fetched page contains a few JavaScript files that I want executed before I start scraping. I also tried setting wait timers, but jQuery still shows the <iframe> tag as empty (even though the src attribute is set).
Upon investigation, I found that the <iframe> has a strange #document value inside it, followed by the normal HTML tags. I wonder if this is why jQuery is unable to recurse through the DOM hierarchy inside the <iframe> tag.
See the screenshot below of the "inspect" view of the desired <iframe> tag.
Also, the main webpage containing the <iframe> tag is on the same website as the URL of the newly fetched page (albeit on a different subdomain), and I'm not getting any access permission warnings in Chrome, so I do not suspect a cross-domain issue.
Edit
Even after waiting 10 seconds:
console.log($("#insertHere").text());
returns an empty string. And,
console.log($("#insertHere").parent().html());
returns: <iframe id="insertHere" src="/courses/intro..." style="width:0; height:0; border:0; border:none;"></iframe>
Recommended answer
The #document is the page document object for the iframe's DOM.
Try accessing the iframe's document, e.g.
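// getElementById takes the bare id, without a leading '#'
var frame = document.getElementById('hidden-frame');
// An iframe element exposes its inner document as contentDocument
console.log(frame.contentDocument.body);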
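Rather than relying on wait timers, it may help to read the iframe's document only after its "load" event has fired. The sketch below is one way to do that under some assumptions: the helper names getFrameDocument and whenFrameLoaded are hypothetical, and the id "insertHere" is taken from the question. Note that a page on a different subdomain counts as a different origin, in which case the browser blocks access and the helper returns null instead of a document.

```javascript
// Hypothetical helper: returns the document loaded inside an iframe,
// or null when the browser blocks access (e.g. a cross-origin frame).
function getFrameDocument(frame) {
  try {
    return frame.contentDocument ||
      (frame.contentWindow ? frame.contentWindow.document : null);
  } catch (err) {
    // Reading contentWindow.document of a cross-origin frame throws
    return null;
  }
}

// Hypothetical helper: resolves with the frame's document once the
// frame fires its "load" event.
function whenFrameLoaded(frame) {
  return new Promise(function (resolve) {
    frame.addEventListener("load", function () {
      resolve(getFrameDocument(frame));
    }, { once: true });
  });
}

// Browser-only usage; "insertHere" is the iframe id from the question.
if (typeof document !== "undefined") {
  var targetFrame = document.getElementById("insertHere");
  if (targetFrame) {
    whenFrameLoaded(targetFrame).then(function (doc) {
      console.log(doc ? doc.body.textContent : "frame document is not accessible");
    });
  }
}
```

The listener should be attached before the src attribute is assigned; if the frame has already finished loading, the "load" event will not fire again and the promise stays pending. With jQuery, the same document can be reached through $("#insertHere").contents(), subject to the same origin restriction.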
You could also try using a content script and allowing it on all pages with <all_urls>, so that it is loaded along with the iframe content, and use it to send that content to the background script using messaging.
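The content script approach could look roughly like the sketch below. It assumes the extension's manifest registers this file under content_scripts with "matches": ["<all_urls>"] and "all_frames": true, so that it also runs inside iframes. The function name collectFrameContent and the message type "frame-content" are hypothetical and chosen only for illustration.

```javascript
// content-script.js (hypothetical file name)

// Gathers what the background script needs from the current frame.
function collectFrameContent(doc, win) {
  return {
    isTopFrame: win === win.top,          // false when running inside an iframe
    url: win.location.href,               // address of the framed page
    html: doc.documentElement.outerHTML   // serialized DOM of the framed page
  };
}

// Only runs inside an extension context, where the chrome API is available.
if (typeof chrome !== "undefined" && chrome.runtime && chrome.runtime.sendMessage) {
  // Report only from nested frames, not from the top-level page
  if (window !== window.top) {
    chrome.runtime.sendMessage({
      type: "frame-content",              // hypothetical message type
      payload: collectFrameContent(document, window)
    });
  }
}
```

On the receiving side, the background script would register a handler with chrome.runtime.onMessage.addListener and check for the same message type. Because the content script runs inside the framed page itself, it reads that page's DOM directly and does not depend on the parent page being allowed to access the iframe.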