Python Compressing A Series of JSON Objects While Maintaining Serial Reading?

Problem description

I have a bunch of JSON objects that I need to compress because they are eating too much disk space: approximately 20 GB for a few million of them.

Ideally, what I'd like to do is compress each one individually and then, when I need to read them, just iteratively load and decompress each one. I tried doing this by creating a text file with each line being a JSON object compressed via zlib, but this is failing with a decompress error due to a truncated stream, which I believe is due to the compressed strings containing newlines.
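The failure described above can be reproduced in a short sketch. Raw zlib output is arbitrary binary, so some blobs contain the byte `0x0A` and a line-based reader splits them in two. One way to keep the one-record-per-line layout is to base64-encode each compressed blob, since base64 output contains no newline bytes. This is an illustrative alternative, not something the original post used; `objects`, `encode_record`, and `decode_record` are hypothetical names introduced for the example.

```python
import base64
import json
import zlib

# Hypothetical sample data standing in for the real objects.
objects = [{"id": i, "name": f"item-{i}", "tags": ["a", "b"]} for i in range(1000)]


def encode_record(obj):
    """Compress one object and base64-encode it so the result is newline-free."""
    raw = json.dumps(obj).encode("utf-8")
    return base64.b64encode(zlib.compress(raw))


def decode_record(line):
    """Reverse encode_record for a single line (a trailing newline is allowed)."""
    return json.loads(zlib.decompress(base64.b64decode(line.strip())))


# Count how many raw compressed blobs contain a newline byte; each of these
# would be cut in two by a reader that splits the file on newlines.
blobs_with_newline = sum(
    b"\n" in zlib.compress(json.dumps(obj).encode("utf-8")) for obj in objects
)
print("raw blobs containing a newline byte:", blobs_with_newline)

# base64 output never contains b"\n", so one record per line is safe.
lines = [encode_record(obj) + b"\n" for obj in objects]
decoded = [decode_record(line) for line in lines]
```

The trade-off is that base64 inflates each compressed record by about one third, and compressing objects one at a time cannot exploit repetition between them.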

Anyone know of a good method to do this?

Solution

Just use a gzip.GzipFile() object and treat it like a regular file; write JSON objects line by line, and read them line by line.

The object takes care of compression transparently, and will buffer reads, decompressing chunks as needed.

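import gzip
import json

# jsonfilename (a file path) and objects (an iterable of JSON-serializable
# values) are assumed to be defined elsewhere.

# writing: one JSON document per line; GzipFile is a binary file,
# so the text must be encoded to bytes on Python 3
with gzip.GzipFile(jsonfilename, 'w') as outfile:
    for obj in objects:
        outfile.write((json.dumps(obj) + '\n').encode('utf-8'))

# reading: iterate line by line; json.loads accepts bytes on Python 3.6+
with gzip.GzipFile(jsonfilename, 'r') as infile:
    for line in infile:
        obj = json.loads(line)
        # process obj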

This has the added advantage that the compression algorithm can make use of repetition across objects, which gives better compression ratios than compressing each object on its own.
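To get a rough feel for that advantage, the sketch below compresses the same set of lines twice: once object by object with zlib, and once as a single gzip stream written to an in-memory buffer. The sample `objects` are made up for illustration, so the sizes it prints say nothing about the real data; only the direction of the difference is the point.

```python
import gzip
import io
import json
import zlib

# Hypothetical sample data: many objects sharing the same keys and values.
objects = [{"id": i, "status": "active", "tags": ["alpha", "beta"]} for i in range(1000)]
lines = [(json.dumps(obj) + "\n").encode("utf-8") for obj in objects]

# Each object compressed on its own: no sharing between objects.
individual_total = sum(len(zlib.compress(line)) for line in lines)

# All objects written to one gzip stream: repeated keys are encoded once
# and then referenced, so later objects cost far fewer bytes.
buffer = io.BytesIO()
with gzip.GzipFile(fileobj=buffer, mode="wb") as outfile:
    for line in lines:
        outfile.write(line)
stream_total = len(buffer.getvalue())

print("compressed individually:", individual_total, "bytes")
print("compressed as one stream:", stream_total, "bytes")
```

Objects with very little shared structure would narrow the gap, so it is worth measuring on a sample of the actual data.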
