pandas read_csv with final column containing commas

Published: 2017/2/24 18:45:17. Tags: python, json, csv, pandas


Problem description

So I have a CSV dataset that, by my book, is well formed, and I'm trying to get the pandas package to load it correctly. The header consists of 5 column names, but the final column consists of JSON objects that contain unescaped commas, e.g.

A,B,C,D,E
1,2,3,4,{K1:V1,K2:V2}

I'm loading my data with a simple call: training = pd.read_csv('data/training.dat')

However, pandas is clearly misinterpreting the additional commas as new, unlabeled columns, and I'm getting an error like this:

CParserError: Error tokenizing data. C error: Expected 75 fields in line 3, saw 84
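To see why the tokenizer complains, here is a minimal sketch using only the standard library. It hard-codes the two-line sample from the question (an assumption; the real file is larger) and counts the fields per line when every comma is treated as a delimiter, as a naive CSV tokenizer does. The names SAMPLE and field_counts are hypothetical.

```python
# The sample from the question, hard-coded for illustration (hypothetical name).
SAMPLE = """A,B,C,D,E
1,2,3,4,{K1:V1,K2:V2}"""


def field_counts(text: str) -> list[int]:
    """Count fields per line, treating every comma as a delimiter."""
    return [len(line.split(",")) for line in text.splitlines()]


counts = field_counts(SAMPLE)
print(counts)  # [5, 6]
```

The header yields 5 fields but the data row yields 6, because the comma inside the braces is counted as a delimiter. The error from the real file reports the same kind of mismatch on a larger scale.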

I've tried looking through the documentation, but obviously failed. Does anyone know how to configure the pd.read_csv call to parse this correctly?

I guess the alternative is to hack together a script that flattens the JSON objects, using the union of their keys as columns.
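The flattening alternative can be sketched with the standard library alone. The sketch rests on two assumptions: only the last column contains commas, so each line can be split at most n - 1 times for n header names; and the last column holds valid JSON with quoted keys, unlike the simplified {K1:V1,K2:V2} shown above. The helper name flatten_last_column is hypothetical.

```python
from __future__ import annotations

import json


def flatten_last_column(text: str) -> tuple[list[str], list[dict]]:
    """Hypothetical helper: split each row into its fixed columns plus a JSON
    object, then expand the object into one column per key (the union of keys
    across all rows). Assumes only the last column contains commas."""
    lines = text.strip().splitlines()
    header = lines[0].split(",")
    fixed_names = header[:-1]
    parsed = []
    key_union: list[str] = []  # keys in first-seen order
    for line in lines[1:]:
        # maxsplit = len(header) - 1 leaves every comma of the last field intact
        *fixed_values, blob = line.split(",", len(header) - 1)
        obj = json.loads(blob)  # assumes valid JSON (quoted keys)
        for key in obj:
            if key not in key_union:
                key_union.append(key)
        parsed.append((fixed_values, obj))
    columns = fixed_names + key_union
    rows = [
        {**dict(zip(fixed_names, fixed_values)),
         **{key: obj.get(key) for key in key_union}}  # missing keys -> None
        for fixed_values, obj in parsed
    ]
    return columns, rows


data = """A,B,C,D,E
1,2,3,4,{"K1": "V1", "K2": "V2"}
5,6,7,8,{"K1": "V3", "K3": "V4"}"""
columns, rows = flatten_last_column(data)
print(columns)  # ['A', 'B', 'C', 'D', 'K1', 'K2', 'K3']
print(rows[1])  # {'A': '5', 'B': '6', 'C': '7', 'D': '8', 'K1': 'V3', 'K2': None, 'K3': 'V4'}
```

The rows can then be handed to pandas as pd.DataFrame(rows, columns=columns). The fixed columns stay strings here; converting them to numbers is left to the caller.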

Recommended answer

If it is feasible for you to replace { with "{ and } with }", it can be read correctly by: pd.read_csv('data/training.dat', quotechar='"', skipinitialspace=True)
Wrapping each object in double quotes turns it into a quoted field, so the commas inside it are no longer treated as delimiters.
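The replacement step can be sketched with the standard library's csv module, which honours the same quotechar and skipinitialspace settings. This assumes the objects are flat and contain no double quotes of their own (true for the sample); nested braces or a JSON object with quoted keys would need different handling, for example doubling the inner quotes first. The helper name quote_braces is hypothetical.

```python
import csv
import io


def quote_braces(text: str) -> str:
    """Hypothetical helper: wrap every {...} object in double quotes,
    i.e. replace { with "{ and } with }" as the answer suggests."""
    return text.replace("{", '"{').replace("}", '}"')


raw = "A,B,C,D,E\n1,2,3,4,{K1:V1,K2:V2}\n"
fixed = quote_braces(raw)
rows = list(csv.reader(io.StringIO(fixed), quotechar='"', skipinitialspace=True))
print(rows[1])  # ['1', '2', '3', '4', '{K1:V1,K2:V2}']
```

The same preprocessed text can be passed straight to pandas, e.g. pd.read_csv(io.StringIO(fixed), quotechar='"', skipinitialspace=True), so the file on disk does not need to be rewritten.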
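Solution without editing the file: use a regular-expression separator that splits only on commas that are not inside braces.
In [205]:
import pandas as pd
print(pd.read_csv('a.data', sep=r",(?![^{]*\})", header=None, engine='python'))
   0  1  2  3              4
0  A  B  C  D              E
1  1  2  3  4  {K1:V1,K2:V2}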
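The separator ,(?![^{]*\}) is a comma followed by a negative lookahead: a comma is rejected as a delimiter when the text after it reaches a } before any {, which holds for commas inside a flat {...} object. Because a multi-character separator is treated as a regular expression, pandas parses it with the Python engine rather than the C engine. A quick standard-library check of the pattern (the names SEP, row and nested are hypothetical) also shows its limit with nested objects:

```python
import re

# A comma NOT followed by "any non-{ characters, then }", i.e. not inside {...}.
SEP = re.compile(r",(?![^{]*\})")

row = "1,2,3,4,{K1:V1,K2:V2}"
fields = SEP.split(row)
print(fields)  # ['1', '2', '3', '4', '{K1:V1,K2:V2}']

# Limitation: a comma that precedes an inner { is still treated as a delimiter.
nested = "1,{x:1,y:{z:2}}"
print(SEP.split(nested))  # ['1', '{x:1', 'y:{z:2}}']
```

For flat objects like the ones in the question the pattern is sufficient; deeply nested JSON is safer to handle with the split-at-most-n-1-times approach, since it does not need to look inside the last field at all.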

