Read cookies from Splash request

Question

I'm trying to access cookies after making a request with Splash. Below is how I've built the request.

script = """
function main(splash)
  splash:init_cookies(splash.args.cookies)
  assert(splash:go{
    splash.args.url,
    headers=splash.args.headers,
    http_method=splash.args.http_method,
    body=splash.args.body,
    })
  assert(splash:wait(0.5))

  local entries = splash:history()
  local last_response = entries[#entries].response
  return {
    url = splash:url(),
    headers = last_response.headers,
    http_status = last_response.status,
    cookies = splash:get_cookies(),
    html = splash:html(),
  }
end
"""
req = SplashRequest(
    url,
    self.parse_page,
    args={
        'wait': 0.5,
        'lua_source': script,
        'endpoint': 'execute'
    }
)

The script is an exact copy from the Splash documentation.

I'm trying to access the cookies that are set on the web page. Without Splash, the code below works as I expect, but with Splash it does not.

self.logger.debug('Cookies: %s', response.headers.get('Set-Cookie'))

With Splash, this logs:

2017-01-03 12:12:37 [spider] DEBUG: Cookies: None

Without Splash, this code works and logs the cookies provided by the web page.

The Splash documentation shows this code as an example:

def parse_result(self, response):
    # here response.body contains result HTML;
    # response.headers are filled with headers from last
    # web page loaded to Splash;
    # cookies from all responses and from JavaScript are collected
    # and put into Set-Cookie response header, so that Scrapy
    # can remember them.

I'm not sure I understand this correctly, but I'd expect to be able to access the cookies the same way as when I'm not using Splash.

Middleware settings:

# Download middlewares 
DOWNLOADER_MIDDLEWARES = {
    # Use a random user agent on each request
    'crawling.middlewares.RandomUserAgentDownloaderMiddleware': 400,
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    'scrapy.downloadermiddlewares.cookies.CookiesMiddleware': 700,
    # Enable crawlera proxy
    'scrapy_crawlera.CrawleraMiddleware': 600,
    # Enable Splash to render javascript
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810, 
}

So my question is: how do I access cookies when using a Splash request?

Recommended answer

You can set the SPLASH_COOKIES_DEBUG=True option to see all cookies that are being set. The current cookiejar, with all cookies merged, is available as response.cookiejar when scrapy-splash is configured correctly.
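As a minimal sketch of reading the merged cookies, the helper below flattens a cookiejar into a plain dict. It assumes that iterating response.cookiejar yields standard http.cookiejar.Cookie objects (as Scrapy's CookieJar wrapper does); the names cookies_to_dict, make_cookie, and demo_jar are hypothetical, and a stdlib CookieJar stands in for response.cookiejar so the example runs without Scrapy.

```python
from http.cookiejar import Cookie, CookieJar


def cookies_to_dict(jar):
    """Flatten an iterable of http.cookiejar.Cookie objects into {name: value}."""
    return {cookie.name: cookie.value for cookie in jar}


def make_cookie(name, value, domain="example.com"):
    # Hypothetical helper, used only to build demo data for this sketch.
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


# Stand-in for response.cookiejar (assumption: it is iterable in the same way).
demo_jar = CookieJar()
demo_jar.set_cookie(make_cookie("sessionid", "abc123"))

print(cookies_to_dict(demo_jar))

# In a spider callback this would look roughly like:
#     def parse_page(self, response):
#         self.logger.debug('Cookies: %s', cookies_to_dict(response.cookiejar))
```

Setting SPLASH_COOKIES_DEBUG = True in settings.py alongside this should make it easier to compare what the middleware logs with what the callback sees.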

Using response.headers.get('Set-Cookie') is not robust, because with redirects (e.g. JS redirects) there can be several responses, and a cookie may be set in the first one while the script returns headers only for the last response.
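The redirect case can be illustrated with a small stdlib-only sketch. It assumes a HAR-like history in which each entry has a response with a list of {name, value} headers, similar to what splash:history() returns; the sample data and the function name cookies_from_history are made up for illustration.

```python
from http.cookies import SimpleCookie


def cookies_from_history(entries):
    """Merge Set-Cookie headers from every response in a HAR-like history."""
    merged = {}
    for entry in entries:
        for header in entry["response"]["headers"]:
            if header["name"].lower() == "set-cookie":
                parsed = SimpleCookie()
                parsed.load(header["value"])
                for name, morsel in parsed.items():
                    merged[name] = morsel.value
    return merged


# Hypothetical history: the redirect sets a cookie, the final page does not.
history = [
    {"response": {"status": 302, "headers": [
        {"name": "Set-Cookie", "value": "sessionid=abc123; Path=/"},
        {"name": "Location", "value": "/home"},
    ]}},
    {"response": {"status": 200, "headers": [
        {"name": "Content-Type", "value": "text/html"},
    ]}},
]

last_only = cookies_from_history(history[-1:])   # headers of the last response only
all_responses = cookies_from_history(history)    # headers of every response

print(last_only)
print(all_responses)
```

Looking only at the last response misses the cookie set during the redirect, which is why a merged cookiejar is the more dependable source.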

I'm not sure this is the problem you're having, though; the code is not an exact copy from the Splash docs. Here:

req = SplashRequest(
    url,
    self.parse_page,
    args={
        'wait': 0.5,
        'lua_source': script
    }
) 

you're sending the request to the /render.json endpoint, which doesn't execute Lua scripts; use endpoint='execute' to fix that.
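A rough sketch of that fix: endpoint is a keyword argument of SplashRequest itself, while args carries the values handed to the Splash script, so endpoint belongs next to args rather than inside it. The helper name splash_request_kwargs and the shortened Lua script are illustrative assumptions; the snippet only builds the keyword arguments so that it runs without scrapy-splash installed.

```python
# Shortened, illustrative Lua script; the full script from the question works too.
LUA_SCRIPT = """
function main(splash)
  splash:init_cookies(splash.args.cookies)
  assert(splash:go(splash.args.url))
  assert(splash:wait(0.5))
  return {
    cookies = splash:get_cookies(),
    html = splash:html(),
  }
end
"""


def splash_request_kwargs(lua_source, wait=0.5):
    """Build keyword arguments for SplashRequest that target /execute."""
    return {
        # Top-level keyword: selects the Splash HTTP endpoint.
        "endpoint": "execute",
        # Values forwarded to Splash and visible to the script as splash.args.
        "args": {"lua_source": lua_source, "wait": wait},
    }


kwargs = splash_request_kwargs(LUA_SCRIPT)
print(kwargs["endpoint"])

# In a spider (requires scrapy-splash):
#     yield SplashRequest(url, self.parse_page, **kwargs)
```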
