numpy genfromtxt/pandas read_csv; ignore commas within quote marks

Views: 656 | Posted: 2017/11/4 22:16:35 | Tags: python file-io numpy pandas genfromtxt


Problem description

Consider a file, a.dat, with contents:

address 1, address 2, address 3, num1, num2, num3
address 1, address 2, address 3, 1.0, 2.0, 3
address 1, address 2, "address 3, address4", 1.0, 2.0, 3

I am trying to import with numpy.genfromtxt. However the function sees an additional column in row 3. I get a similar error with pandas.read_csv:

np.genfromtxt('a.dat', delimiter=',', dtype=None, skip_header=1)

ValueError: Some errors were detected !
    Line #3 (got 7 columns instead of 6)

and

pandas read_csv sort of works - but it gives me an unaligned data structure:

pd.read_csv('a.dat')

pandas.parser.CParserError: Error tokenizing data. C error: Expected 6 fields in line 3, saw 7

I'm trying to find an input parameter to compensate for this. I don't mind if I end up with a numpy ndarray or pandas dataframe.

Is there a parameter that I can set within genfromtxt and/or read_csv that will let me ignore the comma within the speech marks?

I note that read_csv includes a quotechar='"' parameter, defined thus:

quotechar : string (length 1) The character used to denote the start and end of a quoted item. Quoted items can include the delimiter and it will be ignored.

This reads to me like read_csv should work for my case by default - yet it doesn't.

I can see that I could pre-process the file to strip out the commas - I'd like to avoid that if possible but would welcome suggestions if this is the only way.

Solution

Just managed to find this:

The key parameter that I was missing is skipinitialspace=True - this "deals with the spaces after the comma-delimiter"
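For what it's worth, the reason seems to be that a quote character only opens a quoted field when it is the very first character of that field. In a.dat every comma is followed by a space, so the third field of the last row starts with a space rather than with the quote. Here is a minimal sketch of that behaviour using the standard-library csv module instead of pandas (an assumption made here so the demo has no dependencies; csv.reader accepts an option with the same name, skipinitialspace), with the file contents held in a string:

```python
import csv
import io

# Same contents as a.dat in the question, kept in memory for the demo.
DATA = (
    'address 1, address 2, address 3, num1, num2, num3\n'
    'address 1, address 2, address 3, 1.0, 2.0, 3\n'
    'address 1, address 2, "address 3, address4", 1.0, 2.0, 3\n'
)


def parse(text, skipinitialspace):
    """Return every row of `text` as a list of string fields."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=skipinitialspace)
    return list(reader)


default_rows = parse(DATA, skipinitialspace=False)
fixed_rows = parse(DATA, skipinitialspace=True)

# Field count of the quoted row, without and with the option.
print(len(default_rows[2]))
print(len(fixed_rows[2]))

# The quoted field, comma included, once leading spaces are skipped.
print(fixed_rows[2][2])
```

With the default settings the quoted row splits into 7 fields, which matches the count in both error messages above. Once the space after each comma is skipped, the quote becomes the first character of its field and the row parses as 6 fields.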

a=pd.read_csv('a.dat',quotechar='"',skipinitialspace=True)

   address 1  address 2            address 3  num1  num2  num3
0  address 1  address 2            address 3     1     2     3
1  address 1  address 2  address 3, address4     1     2     3

This works :-)
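
A self-contained version of the call above, for anyone who wants to try it without creating a.dat. The file contents are fed in through io.StringIO (an addition for the demo, not part of the original answer). quotechar='"' is already the read_csv default, so it is left out here. Since the question allowed for either a DataFrame or an ndarray, the last step pulls the numeric columns out with DataFrame.to_numpy(), assuming a pandas version recent enough to have that method (0.24 or later):

```python
import io

import pandas as pd

# Same contents as a.dat in the question, kept in memory for the demo.
DATA = (
    'address 1, address 2, address 3, num1, num2, num3\n'
    'address 1, address 2, address 3, 1.0, 2.0, 3\n'
    'address 1, address 2, "address 3, address4", 1.0, 2.0, 3\n'
)

# skipinitialspace=True drops the space after each comma, so the opening
# quote is the first character of its field and is honoured as a quote.
df = pd.read_csv(io.StringIO(DATA), skipinitialspace=True)
print(df)

# Numeric columns as a plain NumPy array, if an ndarray is preferred.
numbers = df[["num1", "num2", "num3"]].to_numpy()
print(numbers.shape)
```

As far as I can tell genfromtxt has no comparable quoting option, so going through read_csv and converting afterwards looks like the simpler route to an ndarray.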
