How to scroll down with PhantomJS to load dynamic content
Problem description
I am trying to scrape links from a page that generates content dynamically as the user scrolls down to the bottom (infinite scrolling). I have tried different things with PhantomJS but cannot gather links beyond the first page. Let's say the element at the bottom that loads content has class .has-more-items. It is present until the final content has been loaded by scrolling, and then it becomes unavailable (display:none). Here are the things I have tried:
- Setting viewportSize to a large height right after creating the page:
    var page = require('webpage').create();
    page.viewportSize = { width: 1600, height: 10000 };
- Using page.scrollPosition = { top: 10000, left: 0 } inside page.open, but it has no effect:
    page.open('http://example.com/?q=houston', function(status) {
        if (status == "success") {
            page.scrollPosition = { top: 10000, left: 0 };
        }
    });
- Also tried putting it inside the page.evaluate function, but that gives:
    ReferenceError: Can't find variable: page
- Tried using jQuery and plain JS code inside page.evaluate and page.open, but to no avail:
    $("html, body").animate({ scrollTop: $(document).height() }, 10, function() {
        //console.log('check for execution');
    });
  both as is and inside document.ready. Similarly for plain JS:
    window.scrollBy(0, 10000)
  both as is and inside window.onload.
I have been stuck on this for two days now and can't find a way. Any help or hints would be appreciated.
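The ReferenceError in the third attempt is how page.evaluate is meant to behave: the function passed to it is serialized and run inside the loaded page's own JavaScript context, so it cannot see variables such as page from the PhantomJS script. Only extra JSON-serializable arguments to evaluate and its return value cross that boundary. The reverse also holds: window.scrollBy or $ called directly in the PhantomJS script (for example in the page.open callback) act on PhantomJS's own context, not on the loaded page, and jQuery is only usable inside page.evaluate if the page itself loads it. A minimal sketch of scrolling from the correct side, assuming the page scrolls the window itself rather than an inner scrollable container; scrollToBottom is a hypothetical helper name and the URL is the question's placeholder:

```javascript
// Sketch: scroll the *loaded page* to its bottom from a PhantomJS script.
// The function passed to page.evaluate() is serialized and run inside the
// page, so it may only use the page's globals (window, document), never
// variables such as `page` from this script.
function scrollToBottom(page) {
  return page.evaluate(function () {
    window.scrollTo(0, document.body.scrollHeight);
    // Only JSON-serializable values come back across the boundary.
    return document.body.scrollHeight;
  });
}

// PhantomJS-only wiring; skipped when this file runs elsewhere (e.g. Node).
if (typeof phantom !== 'undefined') {
  var page = require('webpage').create();
  page.open('http://example.com/?q=houston', function (status) {
    if (status !== 'success') {
      phantom.exit(1);
      return;
    }
    console.log('scrollHeight after scrolling: ' + scrollToBottom(page));
    phantom.exit();
  });
}
```

This only moves the viewport once; newly requested items still need time to arrive, which is why the answer below polls with a timer.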
Update
I have found a helpful piece of code at https://groups.google.com/forum/?fromgroups=#!topic/phantomjs/8LrWRW8ZrA0
var hitRockBottom = false;
while (!hitRockBottom) {
    // Scroll the page (not sure if this is the best way to do so...)
    page.scrollPosition = { top: page.scrollPosition + 1000, left: 0 };

    // Check if we've hit the bottom
    hitRockBottom = page.evaluate(function() {
        return document.querySelector(".has-more-items") === null;
    });
}
Here .has-more-items is the class of the element I want to access. It is initially at the bottom of the page; as we scroll down it moves further down until all the data is loaded, and then it becomes unavailable.
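According to the question, .has-more-items is hidden with display:none at the end rather than removed, so document.querySelector(".has-more-items") === null would never become true in that case and the loop above would have no way to end. A sketch of a check that also looks at the element's computed display value; markerStateInPage and markerState are hypothetical names, and the check only inspects the marker's own display property (a marker hidden through a hidden ancestor would still read as visible):

```javascript
// Sketch: classify the infinite-scroll marker from inside the page.
// Assumes, as in the question, that the marker is hidden with
// display:none (not removed) once everything has loaded.
function markerStateInPage() {
  var el = document.querySelector('.has-more-items');
  if (el === null) {
    return 'missing'; // removed from the DOM
  }
  if (window.getComputedStyle(el).display === 'none') {
    return 'hidden'; // still in the DOM, but no longer shown
  }
  return 'visible'; // more items may still be loaded
}

// Runs the check in the page's context; `page` is a PhantomJS WebPage.
// markerStateInPage uses no outer variables, so it survives serialization.
function markerState(page) {
  return page.evaluate(markerStateInPage);
}
```

A loop would then stop on 'missing' or 'hidden' instead of testing only for null.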
However, my tests show that it runs into an infinite loop without ever scrolling down (I render screenshots to check). I have also tried replacing page.scrollPosition = { top: page.scrollPosition + 1000, left: 0 }; with each of the following (one at a time):
window.document.body.scrollTop = '1000';
location.href = ".has-more-items";
page.scrollPosition = { top: page.scrollPosition + 1000, left: 0 };
document.location.href=".has-more-items";
But nothing seems to work.
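Two further problems keep that loop from scrolling. page.scrollPosition is an object ({ top, left }), so page.scrollPosition + 1000 evaluates to the string "[object Object]1000" rather than a pixel offset; the new position has to be built from page.scrollPosition.top. And a synchronous while loop never hands control back to PhantomJS's event loop, so the page cannot run its scroll handlers or finish its requests between iterations. (Separately, location.href expects a URL, so assigning ".has-more-items" to it is treated as a relative URL, not as a selector to scroll to.) A sketch that steps the position with setTimeout instead; scrollStep, scrollUntil and their parameters are hypothetical, and it assumes, without having verified it, that changing page.scrollPosition fires the site's own scroll listeners. If it does not, scroll from inside page.evaluate instead.

```javascript
// Sketch: step page.scrollPosition down without blocking PhantomJS.
// page.scrollPosition is an object such as { top: 0, left: 0 }, so the new
// offset is computed from its .top field instead of the object itself.
function scrollStep(page, stepPx) {
  var current = page.scrollPosition;
  page.scrollPosition = { top: current.top + stepPx, left: current.left };
  return page.scrollPosition.top;
}

// Repeats scrollStep via setTimeout so the page's own JavaScript and network
// requests can run between steps. isDone(page) decides when to stop; onFinish
// receives the number of steps taken. All names here are hypothetical.
function scrollUntil(page, isDone, onFinish, stepPx, delayMs, maxSteps) {
  var steps = 0;
  function tick() {
    if (isDone(page) || steps >= maxSteps) {
      onFinish(steps);
      return;
    }
    scrollStep(page, stepPx);
    steps += 1;
    setTimeout(tick, delayMs);
  }
  tick();
}

// PhantomJS-only wiring; skipped when this file runs elsewhere (e.g. Node).
if (typeof phantom !== 'undefined') {
  var page = require('webpage').create();
  page.open('http://example.com/?q=houston', function (status) {
    if (status !== 'success') {
      phantom.exit(1);
      return;
    }
    scrollUntil(page, function (p) {
      // Runs in the page: done once the marker is gone or hidden.
      return p.evaluate(function () {
        var el = document.querySelector('.has-more-items');
        return el === null || window.getComputedStyle(el).display === 'none';
      });
    }, function (steps) {
      console.log('Stopped after ' + steps + ' steps');
      phantom.exit();
    }, 1000, 500, 200); // 1000px per step, 500ms apart, at most 200 steps
  });
}
```

The step cap is there so a marker that never disappears (for example after a failed request) cannot keep the script running forever.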
Answer
I found a way to do it and tried to adapt it to your situation. I didn't test the best way of finding the bottom of the page because I had a different context, but check the solution below. The key point is that you have to wait a little for the page to load, and JavaScript works asynchronously, so you have to use setInterval or setTimeout (see) to achieve this.
page.open('http://example.com/?q=houston', function () {
    // Check for the bottom div and scroll down from time to time
    window.setInterval(function() {
        // Check if there is a div with class="has-more-items"
        // (not sure if there's a better way of doing this)
        var count = page.content.match(/class="has-more-items"/g);

        if (count === null) { // Didn't find
            page.evaluate(function() {
                // Scroll to the bottom of page
                window.document.body.scrollTop = document.body.scrollHeight;
            });
        }
        else { // Found
            // Do what you want
            // ...
            phantom.exit();
        }
    }, 500); // Number of milliseconds to wait between scrolls
});
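In the question's setup the condition runs the other way round: .has-more-items stays visible while more items remain and gets display:none at the end, so the script should keep scrolling while the marker is visible and stop once it is hidden or gone. Matching page.content cannot tell those states apart, because if the marker is hidden with display:none the serialized HTML still contains its class attribute. A sketch of the answer's setInterval approach with the condition flipped, the visibility check moved into page.evaluate, a cap on the number of ticks, and the links collected at the end, since that was the original goal. makeTick and maxTicks are hypothetical names, the 200-tick cap is an arbitrary example value, and this has not been run against the real site.

```javascript
// Sketch: the answer's setInterval approach, adapted to the question's
// marker semantics. Returns a function to call on every interval tick.
// onDone(links) runs once, when the marker is gone/hidden or maxTicks is hit.
function makeTick(page, maxTicks, onDone) {
  var ticks = 0;
  var finished = false;
  return function tick() {
    if (finished) {
      return;
    }
    ticks += 1;
    // Runs inside the page: scroll only while the marker is still visible.
    var more = page.evaluate(function () {
      var el = document.querySelector('.has-more-items');
      if (el === null || window.getComputedStyle(el).display === 'none') {
        return false;
      }
      window.scrollTo(0, document.body.scrollHeight);
      return true;
    });
    if (more && ticks < maxTicks) {
      return; // wait for the next tick so new items can load
    }
    finished = true;
    // Also runs inside the page: return plain strings, not DOM nodes.
    var links = page.evaluate(function () {
      var anchors = document.querySelectorAll('a[href]');
      var out = [];
      for (var i = 0; i < anchors.length; i++) {
        out.push(anchors[i].href);
      }
      return out;
    });
    onDone(links);
  };
}

// PhantomJS-only wiring; skipped when this file runs elsewhere (e.g. Node).
if (typeof phantom !== 'undefined') {
  var page = require('webpage').create();
  page.open('http://example.com/?q=houston', function (status) {
    if (status !== 'success') {
      phantom.exit(1);
      return;
    }
    var timer = setInterval(makeTick(page, 200, function (links) {
      clearInterval(timer);
      console.log(links.join('\n'));
      phantom.exit();
    }), 500); // milliseconds between scrolls, as in the answer
  });
}
```

Returning an array of href strings keeps the evaluate result JSON-serializable; DOM nodes themselves cannot be passed back to the PhantomJS script.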