CasperJS: scraping dynamic content


Question

I'm trying to scrape this page using CasperJS. The main part of my code works fine, but the content is loaded dynamically and I can't figure out how to trigger the loading.

This is what I'm doing right now:

casper.waitFor(function() {

    this.scrollToBottom();

    var count = this.evaluate(function() {
        var match = document.querySelectorAll('.loading-msg');
        return match.length;
    });

    // Done once at most one loading placeholder remains.
    return count <= 1;

}, function() {
    // do stuff
});

The wait simply times out, even after I increased the timeout to 20 s, and the new content never loads. I've also tried adapting this function to my case:

function tryAndScroll(casper) {
  casper.waitFor(function() {
    this.page.scrollPosition = { top: this.page.scrollPosition["top"] + 4000, left: 0 };
    return true;
  }, function() {
    var info = this.getElementInfo('p[loading-spinner="!loading"]');
    if (info["visible"] == true) {
      this.waitWhileVisible('p[loading-spinner="!loading"]', function () {
        this.emit('results.loaded');
      }, function () {
        this.echo('next results not loaded');
      }, 5000);
    }
  }, function() {
    this.echo("Scrolling failed. Sorry.").exit();
  }, 500);
}

But I couldn't get it to work, and I'm not even sure it's relevant here. Any ideas?

Recommended answer

I've looked at the page. It behaves in such a way that it doesn't load the images in the middle when you jump straight to the end.

When the page is loaded, the first couple of rows are completely loaded and some more are only partially loaded (a missing image is denoted by a '.loading-msg' element). When you jump to the end with this.scrollToBottom(); there is no continuous scroll. The viewport moves straight to the end, so the page's JavaScript never detects that the middle images were in the viewport, however briefly. The page goes on to load the next rows, but not the missing images of the rows that were jumped over.
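The effect described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the page's real loading logic: it assumes an image loads only when its row overlaps the viewport at a scroll position where scrolling actually stops. The function name `loadedRows` and all the numbers are hypothetical.

```javascript
// Toy model (assumption): row i occupies [i * rowHeight, (i + 1) * rowHeight)
// and its image loads only if that range overlaps the viewport at one of
// the scroll positions where scrolling stops.
function loadedRows(scrollPositions, viewportHeight, rowHeight, rowCount) {
    var loaded = [];
    for (var row = 0; row < rowCount; row++) {
        var top = row * rowHeight;
        var bottom = top + rowHeight;
        var seen = scrollPositions.some(function (y) {
            return top < y + viewportHeight && bottom > y;
        });
        if (seen) {
            loaded.push(row);
        }
    }
    return loaded;
}

var viewport = 600, rowHeight = 300, rows = 10; // page is 3000px tall

// One jump from the top straight to the bottom (3000 - 600 = 2400).
var jump = loadedRows([0, 2400], viewport, rowHeight, rows);

// Same distance covered one viewport height at a time.
var stepwise = loadedRows([0, 600, 1200, 1800, 2400], viewport, rowHeight, rows);

console.log(jump);     // [ 0, 1, 8, 9 ]
console.log(stepwise); // [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 ]
```

In this model the single jump leaves rows 2 to 7 unloaded, which matches the '.loading-msg' placeholders that never go away.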

You have to reduce the distance of the jump in both of your snippets.

The first one can be changed like this:

var pos = 0,
    height = casper.page.viewportSize.height;
casper.waitFor(function() {
    // Advance one viewport height on every check instead of jumping to the end.
    this.scrollTo(0, pos++ * height);
    return !this.exists('.loading-msg');
}, function() {
    // do stuff
}, null, 20000);

The second one might work by changing

this.page.scrollPosition = { top: this.page.scrollPosition["top"] + 4000, left: 0 };

to

var height = casper.page.viewportSize.height;
this.page.scrollPosition = { top: this.page.scrollPosition.top + height, left: 0 };
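As a further illustration of the same idea, here is a minimal sketch of a helper that scrolls one viewport height at a time and pauses between steps so the page can load each row. It is not part of the original answer. The helper name `scrollStepwise` and its parameters `maxSteps`, `delayMs` and `done` are hypothetical; it relies only on the CasperJS methods `exists`, `scrollTo` and `wait`, and assumes a CasperJS version that provides `scrollTo`.

```javascript
// Scroll down one viewport height per step until no '.loading-msg'
// placeholder remains or maxSteps is reached, then call done(stepsTaken).
// `casper` is expected to provide exists(), scrollTo() and wait().
function scrollStepwise(casper, viewportHeight, maxSteps, delayMs, done) {
    var step = 0;

    function next() {
        // Stop once nothing is left to load or the step budget is used up.
        if (!casper.exists('.loading-msg') || step >= maxSteps) {
            done(step);
            return;
        }
        step++;
        casper.scrollTo(0, step * viewportHeight);
        // Give the page time to react before checking again.
        casper.wait(delayMs, next);
    }

    next();
}

// Possible usage inside a CasperJS script (values are illustrative):
// casper.then(function () {
//     var height = this.page.viewportSize.height;
//     scrollStepwise(this, height, 50, 500, function (steps) {
//         casper.echo('stopped after ' + steps + ' scroll steps');
//     });
// });
```

The step limit guards against pages that keep appending rows indefinitely, and the delay may need tuning to how quickly the target page loads its images.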
