How to scroll down with PhantomJS to load dynamic content


Problem description


I am trying to scrape links from a page that generates content dynamically as the user scrolls down to the bottom (infinite scrolling). I have tried several things with PhantomJS but cannot gather links beyond the first page. Let's say the element at the bottom that loads content has the class .has-more-items. It is available until the final content has loaded while scrolling, and then becomes unavailable in the DOM (display:none). Here are the things I have tried:

  • Setting viewportSize to a large height right after var page = require('webpage').create();

page.viewportSize = { width: 1600, height: 10000, };

  • Using page.scrollPosition = { top: 10000, left: 0 } inside page.open, which had no effect:

page.open('http://example.com/?q=houston', function(status) {
   if (status == "success") {
      page.scrollPosition = { top: 10000, left: 0 };  
   }
});

  • Also tried putting it inside the page.evaluate function, but that gives:

Reference error: Can't find variable page

  • Tried using jQuery and plain JS code inside page.evaluate and page.open, but to no avail:

$("html, body").animate({ scrollTop: $(document).height() }, 10, function() {
    //console.log('check for execution');
});

both as is and inside document.ready. Similarly for plain JS code:

window.scrollBy(0,10000)

both as is and inside window.onload.

I have been really stuck on this for 2 days now and cannot find a way. Any help or hint would be appreciated.

Update

I have found a helpful piece of code at https://groups.google.com/forum/?fromgroups=#!topic/phantomjs/8LrWRW8ZrA0

var hitRockBottom = false;
while (!hitRockBottom) {
    // Scroll the page (not sure if this is the best way to do so...)
    page.scrollPosition = { top: page.scrollPosition + 1000, left: 0 };

    // Check if we've hit the bottom
    hitRockBottom = page.evaluate(function() {
        return document.querySelector(".has-more-items") === null;
    });
}

Here .has-more-items is the class of the element I want to access. It is initially available at the bottom of the page; as we scroll down, it moves further down until all the data is loaded, and then it becomes unavailable.

However, when I tested it, it clearly ran into an infinite loop without scrolling down (I rendered pictures to check). I also tried replacing page.scrollPosition = { top: page.scrollPosition + 1000, left: 0 }; with each of the lines below (one at a time):

window.document.body.scrollTop = '1000';
location.href = ".has-more-items";
page.scrollPosition = { top: page.scrollPosition + 1000, left: 0 };
document.location.href=".has-more-items";

But nothing seems to work.

Solution

I found a way to do it and tried to adapt it to your situation. I didn't test the best way of finding the bottom of the page because I had a different context, but check it out. The problem is that you have to wait a little for the page to load, and JavaScript works asynchronously, so you have to use setInterval or setTimeout.

var page = require('webpage').create();

page.open('http://example.com/?q=houston', function () {

  // Checks for bottom div and scrolls down from time to time
  window.setInterval(function() {
      // Checks if there is a div with class=".has-more-items" 
      // (not sure if this is the best way of doing it)
      var count = page.content.match(/class=".has-more-items"/g);

      if(count === null) { // Didn't find
        page.evaluate(function() {
          // Scrolls to the bottom of page
          window.document.body.scrollTop = document.body.scrollHeight;
        });
      }
      else { // Found
        // Do what you want
        // ...
        phantom.exit();
      }
  }, 500); // Number of milliseconds to wait between scrolls

});
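
The question says the .has-more-items element stays in the page until the last batch has loaded and is then hidden with display:none, so a variation on the code above could poll for that condition instead of searching page.content. The following is only a minimal sketch under that assumption, not a tested solution for the site in question. The helper names (isMarkerVisible, scrollToBottom, pollOnce, scrollUntilLoaded) and the option names (selector, intervalMs, maxTicks) are made up for illustration; only page.open, page.evaluate, phantom.exit and require('webpage') come from PhantomJS. The maxTicks limit is an added safety stop so the script cannot poll forever.

```javascript
// Sketch only. isMarkerVisible, scrollToBottom, pollOnce, scrollUntilLoaded
// and the option names are hypothetical helpers, not PhantomJS APIs.

// Runs inside the web page (through page.evaluate), so it can only use the DOM.
// True while the "load more" marker exists and is not hidden with display:none.
function isMarkerVisible(selector) {
  var el = document.querySelector(selector);
  return el !== null && window.getComputedStyle(el).display !== 'none';
}

// Also runs inside the web page: same scroll statement as in the answer above.
function scrollToBottom() {
  window.document.body.scrollTop = document.body.scrollHeight;
}

// One polling step: scroll again only while the marker is still visible.
// Returns true if more content may still be loading.
function pollOnce(page, selector) {
  var visible = page.evaluate(isMarkerVisible, selector);
  if (visible) {
    page.evaluate(scrollToBottom);
  }
  return visible;
}

// Repeats pollOnce on a timer so that PhantomJS can handle network and
// script events between scrolls. Calls done(finished) exactly once, where
// finished is false if maxTicks ran out first.
function scrollUntilLoaded(page, options, done) {
  var ticks = 0;
  var timer = setInterval(function () {
    var more = pollOnce(page, options.selector);
    ticks += 1;
    if (!more || ticks >= options.maxTicks) {
      clearInterval(timer);
      done(!more);
    }
  }, options.intervalMs);
}

// Only runs under PhantomJS, where the global `phantom` object exists.
if (typeof phantom !== 'undefined') {
  var page = require('webpage').create();
  page.open('http://example.com/?q=houston', function (status) {
    if (status !== 'success') {
      phantom.exit(1);
      return;
    }
    var options = { selector: '.has-more-items', intervalMs: 500, maxTicks: 120 };
    scrollUntilLoaded(page, options, function (finished) {
      console.log(finished ? 'all content loaded' : 'gave up after maxTicks');
      // Collect the links here, for example with another page.evaluate call.
      phantom.exit();
    });
  });
}
```

The functions passed to page.evaluate run inside the web page, which is why they never refer to the page variable. The selector is passed in as an extra argument to page.evaluate instead. Whether setting scrollTop actually triggers the site's loader depends on how that page listens for scrolling, so the interval and the stop condition may need adjusting.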
