Casperjs scraping dynamic content

Question

I'm trying to scrape this page using Casperjs. The main part of my code works just fine, but the content is loaded dynamically and I can't figure out how to trigger that.

This is what I'm doing right now:

casper.waitFor(function() {
    // Jump straight to the bottom of the page.
    this.scrollToBottom();

    // Count the placeholders of rows whose images are still missing.
    var count = this.evaluate(function() {
        return document.querySelectorAll('.loading-msg').length;
    });

    return count <= 1;
}, function() {
    // do stuff
});

The wait timeout just expires, even though I've increased it to 20 s, and the new content never gets loaded. I've tried adapting this function to my case:

function tryAndScroll(casper) {
  casper.waitFor(function() {
    this.page.scrollPosition = { top: this.page.scrollPosition["top"] + 4000, left: 0 };
    return true;
  }, function() {
    var info = this.getElementInfo('p[loading-spinner="!loading"]');
    if (info["visible"] == true) {
      this.waitWhileVisible('p[loading-spinner="!loading"]', function () {
        this.emit('results.loaded');
      }, function () {
        this.echo('next results not loaded');
      }, 5000);
    }
  }, function() {
    this.echo("Scrolling failed. Sorry.").exit();
  }, 500);
}

But I couldn't figure it out, and I'm not even sure it's relevant here. Any ideas?

Recommended answer

I've looked at the page. It behaves in such a way that the images in the middle are not loaded when you jump to the end.

When the page is loaded, the first couple of rows are completely loaded and some more are only partially loaded (a missing image is denoted by a '.loading-msg' element). When you jump to the end with this.scrollToBottom(); there is no continuous scroll. The viewport jumps to the end, and the page's JavaScript doesn't detect that the middle images were ever in the viewport, however briefly. The page goes on to load the next rows, but not the missing images of the rows that were jumped over.

You have to reduce the distance of the jump in both of your snippets.
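To make the idea of smaller jumps concrete, here is a minimal sketch that lists the vertical offsets to visit when scrolling one viewport height at a time, so that every row passes through the viewport. The helper name `viewportSteps` and its parameters are hypothetical and not part of CasperJS; the page and viewport heights are assumed to be known in pixels.

```javascript
// Hypothetical helper (not a CasperJS API): compute the scroll offsets needed
// to cover a page of `pageHeight` pixels in jumps of one viewport height.
function viewportSteps(pageHeight, viewportHeight) {
    if (viewportHeight <= 0) {
        throw new RangeError('viewportHeight must be positive');
    }
    var steps = [];
    // Advance by exactly one viewport so no region of the page is skipped.
    for (var top = 0; top < pageHeight; top += viewportHeight) {
        steps.push(top);
    }
    return steps;
}
```

Each offset could then be passed to casper.scrollTo(0, offset), with a pause between steps. On a page that keeps appending rows the total height grows, so the list would have to be recomputed as new content arrives.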

The first one can be changed like this:

var pos = 0,
    height = casper.page.viewportSize.height;
casper.waitFor(function() {
    // Advance by one viewport height on every check instead of jumping to the end.
    this.scrollTo(0, pos++ * height);
    return !this.exists('.loading-msg');
}, function() {
    // do stuff
}, null, 20000);

In the second one, you can change

this.page.scrollPosition = { top: this.page.scrollPosition["top"] + 4000, left: 0 };

to

var height = casper.page.viewportSize.height;
this.page.scrollPosition = { top: this.page.scrollPosition.top + height, left: 0 };
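As an alternative to polling inside waitFor, the same small-step scrolling could be written as an explicit loop. The following is only a sketch under assumptions: the function name `scrollStepwise`, the 500 ms pause, and the `maxSteps` safety limit are illustrative choices, not taken from the question or the answer. It relies only on the `exists`, `scrollTo` and `wait` methods of the object passed in, and has not been run against the page in question.

```javascript
// Sketch: scroll down one step at a time until no element matching `selector`
// remains, or until `maxSteps` scrolls have been made (hypothetical safety cap).
function scrollStepwise(casper, selector, stepHeight, maxSteps) {
    var step = 0;
    function next() {
        // Stop once every placeholder is gone or the step limit is reached.
        if (!casper.exists(selector) || step >= maxSteps) {
            return;
        }
        step += 1;
        casper.scrollTo(0, step * stepHeight);
        // Give the page some time to react before checking again
        // (500 ms is an arbitrary assumption).
        casper.wait(500, next);
    }
    next();
}
```

It would be called from within a step once the page has loaded, for example casper.then(function() { scrollStepwise(this, '.loading-msg', this.page.viewportSize.height, 50); });. The cap keeps the loop from running forever if a placeholder never disappears.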
