JSON file gets damaged while putting it into a zip archive with Python


Question

After crawling a site with Scrapy, I create a zip archive inside the closing method and pull the pictures into it. Then I add a valid JSON file to the archive.

After unzipping (on Mac OS X or Ubuntu), the JSON file shows up damaged: the last item is missing.

End of the unzipped file:

..a46.jpg"]},

End of the original file:

a46.jpg"]}]

Code:

import datetime
import os
import shutil
import zipfile

# create zip archive with all images inside
# (name is defined elsewhere in the spider code)
filename = '../zip/' + datetime.datetime.now().strftime("%Y%m%d-%H%M") + '_' + name
imagefolder = 'full'
imagepath = '/Users/user/test_crawl/bid/images'
shutil.make_archive(
    filename,
    'zip',
    imagepath,
    imagefolder
)

# add json file to zip archive
filename_zip = filename + '.zip'
# parentheses keep the concatenation valid across the line break
path_to_file = ('/Users/user/test_crawl/bid/data/'
                + datetime.datetime.now().strftime("%Y%m%d") + '_' + name + '.json')
# "archive" avoids shadowing the built-in zip(); the with block closes the archive
with zipfile.ZipFile(filename_zip, 'a') as archive:
    archive.write(path_to_file, os.path.basename(path_to_file))

I could reproduce this error several times, and everything else looks OK.

Recommended answer

The solution is to use Scrapy's JsonItemExporter instead of the feed exporter. The feed exporter is still writing to the file during close_spider(), which is too late: the zip code runs at that point and copies a JSON file that has not been finished yet.
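The timing problem can be reproduced without Scrapy. The following is a minimal sketch using only the standard library. The helper names `archive_json` and `is_valid_json`, the file names, and the sample items are all hypothetical; a plain file handle stands in for an exporter that has not finished yet.

```python
import json
import os
import tempfile
import zipfile


def archive_json(json_path, zip_path):
    """Append json_path to zip_path and return the bytes stored in the archive."""
    arcname = os.path.basename(json_path)
    with zipfile.ZipFile(zip_path, "a") as archive:
        archive.write(json_path, arcname)
    with zipfile.ZipFile(zip_path) as archive:
        return archive.read(arcname)


def is_valid_json(raw):
    """Return True if raw parses as JSON, False otherwise."""
    try:
        json.loads(raw)
        return True
    except ValueError:
        return False


workdir = tempfile.mkdtemp()
json_path = os.path.join(workdir, "items.json")

# Simulate an export that is still in progress: the first item is on disk,
# but the last item and the closing bracket have not been written yet.
out = open(json_path, "w")
out.write('[{"images": ["a45.jpg"]},')
out.flush()
too_early = archive_json(json_path, os.path.join(workdir, "early.zip"))

# Finish the export and close the file before zipping again.
out.write('{"images": ["a46.jpg"]}]')
out.close()
after_close = archive_json(json_path, os.path.join(workdir, "late.zip"))

print(is_valid_json(too_early))
print(is_valid_json(after_close))
```

The archive stores whatever is on disk at the moment `ZipFile.write()` runs, so the copy taken before the export is finished stays truncated even though the file on disk is completed afterwards.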

This is easy to do.

Load JsonItemExporter inside the file pipelines.py:

from scrapy.exporters import JsonItemExporter

Change your pipeline like this:

class MyPipeline(object):

    file = None

    def open_spider(self, spider):
        # JsonItemExporter writes bytes, so open the file in binary mode
        self.file = open('data/test.json', 'wb')
        self.exporter = JsonItemExporter(self.file)
        self.exporter.start_exporting()

    def close_spider(self, spider):
        # finish and close the JSON file first, then zip it
        self.exporter.finish_exporting()
        self.file.close()
        # cleanup is the author's own helper (not shown here)
        cleanup('zip_method')

    def process_item(self, item, spider):
        self.exporter.export_item(item)
        return item

The zip_method contains the zip-archive code shown in the question.
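The answer does not show zip_method itself. A rough sketch of what such a function might look like, based on the question's code, is given below. The signature, the parameter names, and the throwaway demo data are assumptions made for illustration, not the author's implementation.

```python
import os
import shutil
import tempfile
import zipfile


def zip_method(archive_base, image_root, image_folder, json_path):
    """Zip image_root/image_folder, then append json_path at the archive root.

    Hypothetical helper: the signature is an assumption, not the author's code.
    """
    zip_path = shutil.make_archive(archive_base, "zip", image_root, image_folder)
    with zipfile.ZipFile(zip_path, "a") as archive:
        archive.write(json_path, os.path.basename(json_path))
    return zip_path


# Throwaway demo data standing in for the crawl output.
workdir = tempfile.mkdtemp()
os.makedirs(os.path.join(workdir, "images", "full"))
with open(os.path.join(workdir, "images", "full", "a46.jpg"), "wb") as fh:
    fh.write(b"not a real image")
json_path = os.path.join(workdir, "items.json")
with open(json_path, "w") as fh:
    fh.write('[{"images": ["full/a46.jpg"]}]')

zip_path = zip_method(
    os.path.join(workdir, "export"),
    os.path.join(workdir, "images"),
    "full",
    json_path,
)

# Inspect the result: the image folder and the JSON file should both be present.
with zipfile.ZipFile(zip_path) as archive:
    names = archive.namelist()
    stored = archive.read("items.json")
print(names)
```

Calling a function like this only after `finish_exporting()` and `file.close()` have run is what keeps the archived JSON complete.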
