How to scroll down with PhantomJS to load dynamic content

Problem description

I am trying to scrape links from a page that generates content dynamically as the user scrolls down to the bottom (infinite scrolling). I have tried several approaches with PhantomJS but have not been able to gather links beyond the first page. Let's say the element at the bottom that loads content has the class .has-more-items. It is available until the final content is loaded while scrolling and then becomes unavailable in the DOM (display:none). Here is what I have tried:

  • Setting viewportSize to a large height right after var page = require('webpage').create();

page.viewportSize = { width: 1600, height: 10000, };

  • Using page.scrollPosition = { top: 10000, left: 0 } inside page.open, which had no effect:

page.open('http://example.com/?q=houston', function(status) {
   if (status == "success") {
      page.scrollPosition = { top: 10000, left: 0 };  
   }
});

  • Putting it inside the page.evaluate function, which gives:

Reference error: Can't find variable page

  • Using jQuery and plain JS code inside page.evaluate and page.open, to no avail:

$("html, body").animate({ scrollTop: $(document).height() }, 10, function() { //console.log('check for execution'); });

both as is and inside document.ready. Similarly for plain JS code:

window.scrollBy(0,10000)

both as is and inside window.onload.

I have been stuck on this for 2 days now and cannot find a way. Any help or hint would be appreciated.

Update

I found a helpful piece of code at https://groups.google.com/forum/?fromgroups=#!topic/phantomjs/8LrWRW8ZrA0:

var hitRockBottom = false;
while (!hitRockBottom) {
    // Scroll the page (not sure if this is the best way to do so...)
    page.scrollPosition = { top: page.scrollPosition + 1000, left: 0 };

    // Check if we've hit the bottom
    hitRockBottom = page.evaluate(function() {
        return document.querySelector(".has-more-items") === null;
    });
}

Here .has-more-items is the class of the element I want to access. It is initially at the bottom of the page; as we scroll down, it moves further down until all data is loaded and then becomes unavailable.

However, when I tested it, it clearly ran into an infinite loop without scrolling down (I rendered images to check). I also tried replacing page.scrollPosition = { top: page.scrollPosition + 1000, left: 0 }; with each of the lines below (one at a time):

window.document.body.scrollTop = '1000';
location.href = ".has-more-items";
page.scrollPosition = { top: page.scrollPosition + 1000, left: 0 };
document.location.href=".has-more-items";

But nothing seems to work.

Solution

I found a way to do it and tried to adapt it to your situation. I didn't test the best way of finding the bottom of the page because I had a different context, but check the solution below. The point is that you have to wait a little for the page to load, and JavaScript works asynchronously, so you have to use setInterval or setTimeout to achieve this.

page.open('http://example.com/?q=houston', function () {

  // Check for the bottom div and scroll down from time to time
  window.setInterval(function() {
      // Check if there is a div with class=".has-more-items" 
      // (not sure if there's a better way of doing this)
      var count = page.content.match(/class=".has-more-items"/g);

      if(count === null) { // Didn't find
        page.evaluate(function() {
          // Scroll to the bottom of page
          window.document.body.scrollTop = document.body.scrollHeight;
        });
      }
      else { // Found
        // Do what you want
        ...
        phantom.exit();
      }
  }, 500); // Number of milliseconds to wait between scrolls

});
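
For the case described in the question, where .has-more-items stays visible while more content remains and is hidden (display:none) once everything is loaded, the same polling idea could be sketched roughly as below. This is only a sketch under assumptions and has not been run against the real site: makePoller and scrollAndCheck are hypothetical helper names, the 500 ms interval and the limit of 60 attempts are arbitrary, and the a[href] selector is a guess at which links should be collected. Note that it keeps scrolling while the marker is visible, which is the opposite of the condition in the snippet above.

```javascript
// Sketch for PhantomJS (ES5 syntax). Helper names, the attempt limit
// and the interval are assumptions made for illustration.

// Runs inside the page context via page.evaluate: scrolls to the bottom
// and reports whether the "more items" marker is still present and shown.
function scrollAndCheck(selector) {
  window.document.body.scrollTop = document.body.scrollHeight;
  var el = document.querySelector(selector);
  return el !== null && window.getComputedStyle(el).display !== 'none';
}

// Returns a step function. Each call performs one scroll-and-check and
// returns 'more' (keep polling), 'done' (marker gone or hidden) or
// 'gave-up' (marker still visible after maxTries steps).
function makePoller(page, selector, maxTries) {
  var tries = 0;
  return function step() {
    tries += 1;
    if (!page.evaluate(scrollAndCheck, selector)) {
      return 'done';
    }
    return tries >= maxTries ? 'gave-up' : 'more';
  };
}

// Wiring, executed only when the script runs under PhantomJS.
if (typeof phantom !== 'undefined') {
  var page = require('webpage').create();
  page.open('http://example.com/?q=houston', function (status) {
    if (status !== 'success') {
      phantom.exit(1);
      return;
    }
    var step = makePoller(page, '.has-more-items', 60);
    // Poll with a timer so the page gets time to load between scrolls.
    var timer = setInterval(function () {
      var state = step();
      if (state === 'more') {
        return;
      }
      clearInterval(timer);
      // Collect the links that are now in the DOM.
      var links = page.evaluate(function () {
        return [].map.call(document.querySelectorAll('a[href]'), function (a) {
          return a.href;
        });
      });
      console.log(links.join('\n'));
      phantom.exit(state === 'done' ? 0 : 1);
    }, 500);
  });
}
```

Keeping the per-step decision in makePoller separate from the timer makes it possible to try the stop condition with a stand-in page object before pointing it at a real site. The attempt limit is there so the script cannot poll forever if the marker never disappears.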
