How to snappy compress a file using a Python script


Question

I am trying to compress a csv file into snappy format using a Python script and the python-snappy module. This is my code so far:

import snappy
d = snappy.compress("C:\\Users\\my_user\\Desktop\\Test\\Test_file.csv")
with open("compressed_file.snappy", 'w') as snappy_data:
     snappy_data.write(d)
snappy_data.close()

This code does create a snappy file, but the file contains only the string "C:\Users\my_user\Desktop\Test\Test_file.csv" rather than the contents of the csv.

So I am a bit lost on how to get my csv compressed. I did get it working from the Windows cmd prompt with this command:

python -m snappy -c Test_file.csv compressed_file.snappy

But I need this to happen as part of a Python script, so running it from cmd does not work for me.

Thank you very much, Álvaro

Recommended answer

You are compressing the path string itself, because the compress function takes raw data, not a file name.

There are two ways to compress data with snappy: as a single block, or as streaming (framed) data.

This function compresses a file using the framed method:

import snappy

def snappy_compress(path):
    path_to_store = path + '.snappy'

    # Open both files in binary mode: snappy reads and writes bytes.
    with open(path, 'rb') as in_file:
        with open(path_to_store, 'wb') as out_file:
            snappy.stream_compress(in_file, out_file)
    # The with blocks close both files, so no explicit close() is needed.

    return path_to_store

snappy_compress('testfile.csv')

You can decompress from the command line using:

python -m snappy -d testfile.csv.snappy testfile_decompressed.csv

Note that the framing currently used by python-snappy is not compatible with the framing used by Hadoop.
