How does scrapy-splash handle infinite scrolling?

Question

I want to reverse-engineer the content that is generated when scrolling down a web page. The problem is the URL https://www.crowdfunder.com/user/following_page/80159?user_id=80159&limit=0&per_page=20&screwrand=933: the screwrand parameter doesn't seem to follow any pattern, so reconstructing the URLs doesn't work. I'm considering rendering the page automatically with Splash instead. How can I make Splash scroll the way a browser does? Thanks a lot! Here is the code for the two requests:

request1 = scrapy_splash.SplashRequest(
    'https://www.crowdfunder.com/user/following/{}'.format(user_id),
    self.parse_follow_relationship,
    args={'wait': 2},
    meta={'user_id': user_id, 'action': 'following'},
    endpoint='http://192.168.99.100:8050/render.html')

yield request1

request2 = scrapy_splash.SplashRequest(
    'https://www.crowdfunder.com/user/following_user/80159?user_id=80159&limit=0&per_page=20&screwrand=76',
    self.parse_tmp,
    meta={'user_id': user_id, 'action': 'following'},
    endpoint='http://192.168.99.100:8050/render.html')

yield request2

(Screenshot: the AJAX requests shown in the browser console)

Recommended answer

To scroll a page you can write a custom rendering script (see http://splash.readthedocs.io/en/stable/scripting-tutorial.html), something like this:

function main(splash)
    local num_scrolls = 10
    local scroll_delay = 1.0

    local scroll_to = splash:jsfunc("window.scrollTo")
    local get_body_height = splash:jsfunc(
        "function() {return document.body.scrollHeight;}"
    )
    assert(splash:go(splash.args.url))
    splash:wait(splash.args.wait)

    for _ = 1, num_scrolls do
        scroll_to(0, get_body_height())
        splash:wait(scroll_delay)
    end        
    return splash:html()
end
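
The script above always scrolls a fixed number of times. A common refinement, which is not part of the original answer, is to stop as soon as the page height stops growing. The control flow is sketched below in Python; `get_height`, `scroll_to` and `wait` are hypothetical callables standing in for the Splash functions of the same purpose, and `FakePage` is a toy object used only to exercise the loop. The same logic could be written in Lua inside `main`.

```python
from typing import Callable


def scroll_until_stable(
    get_height: Callable[[], int],
    scroll_to: Callable[[int, int], None],
    wait: Callable[[float], None],
    max_scrolls: int = 10,
    scroll_delay: float = 1.0,
) -> int:
    """Scroll to the bottom repeatedly; stop when the height stops growing.

    Returns the number of scrolls that were performed.
    """
    scrolls = 0
    last_height = get_height()
    while scrolls < max_scrolls:
        scroll_to(0, last_height)
        wait(scroll_delay)  # give the page time to load the next batch
        scrolls += 1
        new_height = get_height()
        if new_height == last_height:
            break  # nothing new was loaded, assume the end of the list
        last_height = new_height
    return scrolls


class FakePage:
    """Toy stand-in for a page that loads `batches` extra chunks of content."""

    def __init__(self, batches: int, chunk: int = 1000) -> None:
        self.height = chunk
        self.chunk = chunk
        self.remaining = batches

    def get_height(self) -> int:
        return self.height

    def scroll_to(self, x: int, y: int) -> None:
        # Reaching the bottom triggers one more batch while any remain.
        if y >= self.height and self.remaining > 0:
            self.height += self.chunk
            self.remaining -= 1
```

With a fake page that loads three extra batches, the loop performs four scrolls: three that load content and one that detects no change. On a real site a slow response can look like "no change", so `scroll_delay` may need to be generous.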

To run this script, use the 'execute' endpoint instead of the 'render.html' endpoint:

script = """<Lua script> """
scrapy_splash.SplashRequest(url, self.parse,
                            endpoint='execute', 
                            args={'wait':2, 'lua_source': script}, ...)
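
Before wiring the script into a spider, it can help to try it against Splash directly. The sketch below uses only the Python standard library to POST the Lua script from the answer to Splash's `execute` endpoint as JSON, which is roughly what `SplashRequest` sends with `endpoint='execute'`. The helper names `build_execute_request` and `fetch_scrolled_html` are made up for illustration, and the Splash address is whatever your instance listens on (the question uses http://192.168.99.100:8050).

```python
import json
import urllib.request

# The scrolling script from the answer, unchanged.
LUA_SCROLL_SCRIPT = """
function main(splash)
    local num_scrolls = 10
    local scroll_delay = 1.0

    local scroll_to = splash:jsfunc("window.scrollTo")
    local get_body_height = splash:jsfunc(
        "function() {return document.body.scrollHeight;}"
    )
    assert(splash:go(splash.args.url))
    splash:wait(splash.args.wait)

    for _ = 1, num_scrolls do
        scroll_to(0, get_body_height())
        splash:wait(scroll_delay)
    end
    return splash:html()
end
"""


def build_execute_request(splash_base_url, page_url, wait=2.0):
    """Build (but do not send) a POST request for Splash's execute endpoint."""
    # Keys in this JSON body become splash.args.<key> inside the Lua script.
    payload = {"lua_source": LUA_SCROLL_SCRIPT, "url": page_url, "wait": wait}
    return urllib.request.Request(
        splash_base_url.rstrip("/") + "/execute",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_scrolled_html(splash_base_url, page_url, wait=2.0, timeout=90):
    """Send the request and return the HTML produced after scrolling."""
    request = build_execute_request(splash_base_url, page_url, wait)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Example (needs a running Splash instance):
# html = fetch_scrolled_html("http://192.168.99.100:8050",
#                            "https://www.crowdfunder.com/user/following/80159")
```

Splitting request construction from sending keeps the payload easy to inspect without a running Splash instance.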
