Scrapy Splash Screenshots?

Published: 2021/12/30 20:03:12. Tags: python, lua, scrapy, splash-screen

Question

I'm trying to scrape a site whilst taking a screenshot of every page. So far, I have managed to piece together the following code:

import json
import base64
import scrapy
from scrapy_splash import SplashRequest


class ExtractSpider(scrapy.Spider):
    name = 'extract'

    def start_requests(self):
        url = 'https://stackoverflow.com/'
        splash_args = {
            'html': 1,
            'png': 1
        }
        yield SplashRequest(url, self.parse_result, endpoint='render.json', args=splash_args)

    def parse_result(self, response):
        png_bytes = base64.b64decode(response.data['png'])

        imgdata = base64.b64decode(png_bytes)
        filename = 'some_image.png'
        with open(filename, 'wb') as f:
            f.write(imgdata)

It reaches the site fine (for example, Stack Overflow) and returns data for png_bytes, but the file written to disk is a broken image (it doesn't load).

Is there a way to fix this, or alternatively a more efficient solution? I have read that Splash Lua scripts can do this, but I haven't been able to work out how to implement one. Thanks.
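Since the question mentions Splash Lua scripts, here is a minimal sketch of that route, using only the standard library to talk to Splash's HTTP API directly. It assumes a Splash instance at `http://localhost:8050` (Splash's default port) and a 0.5 s render wait; `LUA_SCRIPT`, `build_execute_request` and `save_result` are hypothetical names chosen for illustration. The script returns both the HTML and the screenshot from the `execute` endpoint, and Splash base64-encodes the PNG in its JSON reply, so it must be decoded exactly once.

```python
import base64
import json
import urllib.request

# Lua script for Splash's /execute endpoint: load the page, give it a moment
# to render (0.5 s is an assumption), then return the HTML and a PNG
# screenshot. Splash base64-encodes the binary PNG in its JSON response.
LUA_SCRIPT = """
function main(splash, args)
  assert(splash:go(args.url))
  assert(splash:wait(0.5))
  return {
    html = splash:html(),
    png = splash:png(),
  }
end
"""


def build_execute_request(page_url, splash_url="http://localhost:8050"):
    """Build a POST to Splash's /execute endpoint (Splash address is assumed)."""
    body = json.dumps({"lua_source": LUA_SCRIPT, "url": page_url}).encode("utf-8")
    return urllib.request.Request(
        splash_url.rstrip("/") + "/execute",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def save_result(result, png_path, html_path):
    """Save the decoded JSON result: PNG decoded once, HTML written as text."""
    with open(png_path, "wb") as f:
        f.write(base64.b64decode(result["png"]))
    with open(html_path, "w", encoding="utf-8") as f:
        f.write(result["html"])
```

To use it, start Splash (for example with the `scrapinghub/splash` Docker image), then pass `json.load(urllib.request.urlopen(build_execute_request(url), timeout=60))` to `save_result`. Inside Scrapy, the same script can be sent with `SplashRequest(url, self.parse_result, endpoint='execute', args={'lua_source': LUA_SCRIPT})`; `response.data` then holds the returned table, so a `parse_result` that decodes `response.data['png']` once works unchanged.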

Recommended answer

You are decoding from base64 twice:

        png_bytes = base64.b64decode(response.data['png'])
        imgdata = base64.b64decode(png_bytes)
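Why the second decode produces a broken file rather than an obvious error can be shown with a small standard-library sketch. The fake "PNG" below (the real 8-byte PNG signature plus filler bytes) is a hypothetical stand-in for `response.data['png']`, and `decode_png_field` is an illustrative helper, not part of scrapy-splash. Decoding once restores the original bytes; decoding again treats raw PNG bytes as base64 text, and with the default `validate=False` Python silently drops non-alphabet bytes, so the result is either an error or garbage that no longer starts with the PNG signature.

```python
import base64
import binascii

# The 8-byte signature every PNG file starts with.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_png_field(b64_text):
    """Decode a base64-encoded PNG exactly once and check its signature."""
    raw = base64.b64decode(b64_text)
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data does not start with the PNG signature")
    return raw


# Hypothetical stand-in for response.data['png']: fake PNG bytes,
# base64-encoded the way Splash returns them in JSON.
fake_png = PNG_SIGNATURE + b"not a real image body"
payload = base64.b64encode(fake_png).decode("ascii")

once = decode_png_field(payload)
print(once == fake_png)  # True

# A second decode reinterprets the raw bytes as base64 text: invalid bytes
# are silently discarded, so this either raises or returns non-PNG garbage.
try:
    twice = base64.b64decode(once)
    print("still a PNG after decoding twice:", twice.startswith(PNG_SIGNATURE))
except binascii.Error as exc:
    print("second decode failed:", exc)
```

Checking the signature after decoding is a cheap way to catch this kind of mistake before a corrupt file is written.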

Decode it only once:

    def parse_result(self, response):
        # response.data['png'] is a base64 string; decoding it exactly once
        # yields the raw PNG bytes.
        imgdata = base64.b64decode(response.data['png'])
        filename = 'some_image.png'
        with open(filename, 'wb') as f:
            f.write(imgdata)
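Both versions write every screenshot to the fixed name `some_image.png`, so when the spider screenshots many pages each one overwrites the previous file. A minimal sketch of one way around that, assuming a per-page filename derived from the URL is acceptable: `screenshot_filename` is a hypothetical helper that builds a readable slug from the host and path and appends a short SHA-1 of the full URL so names stay unique.

```python
import hashlib
import re
from urllib.parse import urlsplit


def screenshot_filename(url):
    """Hypothetical helper: stable, filesystem-safe PNG name for a page URL."""
    parts = urlsplit(url)
    # Readable prefix from host + path, unsafe characters collapsed to "_".
    slug = re.sub(r"[^A-Za-z0-9]+", "_", parts.netloc + parts.path)[:60].strip("_")
    # Short hash of the full URL (including the query) keeps names unique.
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()[:10]
    return f"{slug or 'page'}_{digest}.png"


print(screenshot_filename("https://stackoverflow.com/questions"))
```

In `parse_result`, the fixed name could then be replaced with something like `filename = screenshot_filename(response.url)`.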
