将GZIP压缩应用于Python Pandas中的CSV [英] Apply GZIP compression to a CSV in Python Pandas

查看:277
本文介绍了将GZIP压缩应用于Python Pandas中的CSV的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试使用以下命令将数据帧写入python大熊猫中gzip压缩的csv中:

I am trying to write a dataframe to a gzipped csv in python pandas, using the following:

import pandas as pd
import datetime
import csv
import gzip

# Get data (with previous connection and script variables)
df = pd.read_sql_query(script, conn)

# Create today's date, to append to file
todaysdatestring = str(datetime.datetime.today().strftime('%Y%m%d'))
print todaysdatestring

# Create csv with gzip compression
df.to_csv('foo-%s.csv.gz' % todaysdatestring,
      sep='|',
      header=True,
      index=False,
      quoting=csv.QUOTE_ALL,
      compression='gzip',
      quotechar='"',
      doublequote=True,
      line_terminator='\n')

这只会创建一个名为"foo-YYYYMMDD.csv.gz"的csv,而不是实际的gzip存档.

This just creates a csv called 'foo-YYYYMMDD.csv.gz', not an actual gzip archive.

我也尝试添加以下内容:

I've also tried adding this:

#Turn to_csv statement into a variable
d = df.to_csv('foo-%s.csv.gz' % todaysdatestring,
      sep='|',
      header=True,
      index=False,
      quoting=csv.QUOTE_ALL,
      compression='gzip',
      quotechar='"',
      doublequote=True,
      line_terminator='\n')

# Write above variable to gzip
 with gzip.open('foo-%s.csv.gz' % todaysdatestring, 'wb') as output:
   output.write(d)

哪个也失败了.有任何想法吗?

Which fails as well. Any ideas?

推荐答案

df.to_csv()与关键字参数compression='gzip'一起使用应产生一个gzip存档.我使用与您相同的关键字参数对其进行了测试,并且可以正常工作.

Using df.to_csv() with the keyword argument compression='gzip' should produce a gzip archive. I tested it using same keyword arguments as you, and it worked.

您可能需要升级熊猫,因为gzip直到版本0.17.1才实现,但是尝试在以前的版本中使用它不会引发错误,而只是生成常规的csv.您可以通过查看pd.__version__的输出来确定当前的熊猫版本.

You may need to upgrade pandas, as gzip was not implemented until version 0.17.1, but trying to use it on prior versions will not raise an error, and just produce a regular csv. You can determine your current version of pandas by looking at the output of pd.__version__.

这篇关于将GZIP压缩应用于Python Pandas中的CSV的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆