pandas read_csv with final column containing commas


Question

I have a CSV dataset that, by my book, is well formed, and I'm trying to get the pandas package to load it correctly. The header consists of 5 column names, but the final column consists of JSON objects which contain unescaped commas, e.g.

A,B,C,D,E
1,2,3,4,{K1:V1,K2:V2}

I'm loading my data with a simple training = pd.read_csv('data/training.dat')

However, pandas is clearly misinterpreting the additional commas as new unlabeled columns, and I'm getting an error like this:

CParserError: Error tokenizing data. C error: Expected 75 fields in line 3, saw 84



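I tried looking through the documentation but clearly failed to find the answer. Does anyone know how to configure the pd.read_csv call so that it parses this file correctly?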

I guess the alternative is I could hack together a script that flattens the JSON objects using a union of their keys as columns.

Recommended answer

If it is feasible for you to replace { with "{ and } with }", so that each JSON object becomes a single quoted field, the file can be read correctly with: pd.read_csv('data/training.dat', quotechar='"', skipinitialspace=True)
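The replacement described above can be sketched as follows. This is a minimal illustration rather than a general solution: the `raw` string is a hypothetical stand-in for the contents of data/training.dat, and it assumes the JSON-like objects contain no double quotes and no nested braces (either would need extra handling before quoting).

```python
import io

import pandas as pd

# Hypothetical sample mirroring the layout shown in the question.
raw = "A,B,C,D,E\n1,2,3,4,{K1:V1,K2:V2}\n"

# Wrap every brace-delimited object in double quotes so the CSV parser
# treats it as one field. Assumes no inner double quotes or nested braces.
quoted = raw.replace("{", '"{').replace("}", '}"')

df = pd.read_csv(io.StringIO(quoted), quotechar='"', skipinitialspace=True)
print(df)
```

For a real file, the same replacement could be applied to the text read from disk before handing it to read_csv.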

Solution with a regular-expression separator, which leaves the file unchanged:

In [205]:
print(pd.read_csv('a.data', sep=r",(?![^{]*\})", header=None, engine='python'))
   0  1  2  3              4
0  A  B  C  D              E
1  1  2  3  4  {K1:V1,K2:V2}

[2 rows x 5 columns]
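To see why this separator works, it may help to apply the same pattern with Python's re module to single lines. The pattern splits on a comma unless the text after it reaches a closing brace without first passing an opening brace, that is, unless the comma sits inside a flat {...} block. The sample lines below are illustrative assumptions, and the last one shows a limitation: a comma followed by a nested object is still treated as a separator.

```python
import re

# Same pattern as the sep argument above: a comma NOT followed by
# "any run of non-'{' characters and then '}'".
SEP = re.compile(r",(?![^{]*\})")

header = SEP.split("A,B,C,D,E")
row = SEP.split("1,2,3,4,{K1:V1,K2:V2}")
print(header)  # ['A', 'B', 'C', 'D', 'E']
print(row)     # ['1', '2', '3', '4', '{K1:V1,K2:V2}']

# Limitation: a nested object after a comma defeats the lookahead,
# so the outer object is split apart.
nested = SEP.split("1,{a:1,b:{c:2}}")
print(nested)  # ['1', '{a:1', 'b:{c:2}}']
```

If the real data contains nested objects, a dedicated parsing step is likely safer than a regex separator.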
