How can I read a tar.gz file using pandas read_csv with the gzip compression option?
Problem description
I have a very simple CSV file with the following data, packed inside a tar.gz archive. I need to read it into a DataFrame using pandas.read_csv.
A B
0 1 4
1 2 5
2 3 6
import pandas as pd
pd.read_csv("sample.tar.gz",compression='gzip')
However, I get this error:
CParserError: Error tokenizing data. C error: Expected 1 fields in line 440, saw 2
Below are the other read_csv calls I tried, each followed by the error it produces:
pd.read_csv("sample.tar.gz",compression='gzip', engine='python')
Error: line contains NULL byte
pd.read_csv("sample.tar.gz",compression='gzip', header=0)
CParserError: Error tokenizing data. C error: Expected 1 fields in line 440, saw 2
pd.read_csv("sample.tar.gz",compression='gzip', header=0, sep=" ")
CParserError: Error tokenizing data. C error: Expected 2 fields in line 94, saw 14
pd.read_csv("sample.tar.gz",compression='gzip', header=0, sep=" ", engine='python')
Error: line contains NULL byte
How can I fix this?
Recommended answer
df = pd.read_csv('sample.tar.gz', compression='gzip', header=0, sep=' ', quotechar='"', error_bad_lines=False)
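Note: error_bad_lines=False will ignore the offending rows instead of raising an error. The error_bad_lines parameter was deprecated in pandas 1.3 and removed in pandas 2.0; on those versions, on_bad_lines='skip' has the same effect.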
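The errors in the question are consistent with how a tar.gz file is built: gzip decompression yields the raw tar stream, which begins with a 512-byte header block padded with NUL bytes, so read_csv ends up parsing archive metadata along with the CSV text. As an alternative to skipping bad lines, the CSV member can be pulled out with the standard-library tarfile module and handed to read_csv on its own. The following is a minimal sketch, not part of the original answer; the helper name read_csv_from_tar_gz and its member_name parameter are hypothetical, and it assumes the archive contains at least one regular file.

```python
import io
import tarfile

import pandas as pd


def read_csv_from_tar_gz(archive_path, member_name=None, **read_csv_kwargs):
    """Read one CSV member of a .tar.gz archive into a DataFrame.

    Hypothetical helper: member_name selects a file inside the archive;
    when omitted, the first regular file found is used.
    """
    with tarfile.open(archive_path, mode="r:gz") as archive:
        if member_name is None:
            # Take the first regular file; directories and links are skipped.
            member = next(m for m in archive.getmembers() if m.isfile())
        else:
            member = archive.getmember(member_name)
        # extractfile returns only the member's bytes, without the tar
        # header that read_csv would otherwise try to parse.
        with archive.extractfile(member) as handle:
            return pd.read_csv(io.BytesIO(handle.read()), **read_csv_kwargs)
```

Assuming the file is space-separated, as the answer above implies, it could be called as read_csv_from_tar_gz("sample.tar.gz", sep=" "). Any other keyword arguments are passed through to read_csv unchanged.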