How to Snappy-compress a file using a Python script
Question
I am trying to compress a CSV file into Snappy format using a Python script and the python-snappy module. This is my code so far:
import snappy

d = snappy.compress("C:\\Users\\my_user\\Desktop\\Test\\Test_file.csv")
with open("compressed_file.snappy", 'w') as snappy_data:
    snappy_data.write(d)
    snappy_data.close()
This code does create a snappy file, but the file only contains a string: "C:\Users\my_user\Desktop\Test\Test_file.csv"
So I am a bit lost on how to get my CSV compressed. I did get it working from the Windows command prompt with this command:
python -m snappy -c Test_file.csv compressed_file.snappy
But I need this to happen as part of a Python script, so running it from the command prompt does not work for me.
Thank you very much, Álvaro
Recommended answer
You are compressing the path string itself, because the compress function takes raw data rather than a file name.
There are two ways to compress data with Snappy: as a single block, or as streaming (framed) data.
This function compresses a file using the framed method:
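import snappy

def snappy_compress(path):
    path_to_store = path + '.snappy'
    # Both files are opened in binary mode: the input is read as raw bytes
    # and the compressed output is bytes as well.
    with open(path, 'rb') as in_file:
        with open(path_to_store, 'wb') as out_file:
            snappy.stream_compress(in_file, out_file)
    # The with blocks close both files, so no explicit close() is needed.
    return path_to_store

snappy_compress('testfile.csv')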
You can decompress from the command line using:
python -m snappy -d testfile.csv.snappy testfile_decompressed.csv
Note that the framing currently used by python-snappy is not compatible with the framing used by Hadoop.