Using numpy.genfromtxt to read a csv file with strings containing commas


Problem description

I am trying to read in a csv file with numpy.genfromtxt, but some of the fields are strings that contain commas. The strings are in quotes, but numpy does not recognize the quotes as delimiting a single string. For example, with the data in 't.csv':

2012, "Louisville KY", 3.5
2011, "Lexington, KY", 4.0

the code

np.genfromtxt('t.csv', delimiter=',')

produces the error:

ValueError: Some errors were detected ! Line #2 (got 4 columns instead of 3)
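
The column count in that message can be reproduced without numpy. The sketch below uses plain str.split as a stand-in for quote-unaware splitting; it is only an illustration and not genfromtxt's actual implementation.

```python
# Stand-in for quote-unaware splitting; not genfromtxt's internal code.
lines = [
    '2012, "Louisville KY", 3.5',
    '2011, "Lexington, KY", 4.0',
]

# Count the pieces each line yields when every comma is treated as a separator.
column_counts = [len(line.split(',')) for line in lines]
print(column_counts)  # [3, 4]
```

The second line yields four pieces because the comma inside the quoted city name is also treated as a separator.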

The data structure I am looking for is:

array([['2012', 'Louisville KY', '3.5'],
       ['2011', 'Lexington, KY', '4.0']], 
      dtype='|S13')

Looking over the documentation, I don't see any options to deal with this. Is there a way to do it with numpy, or do I just need to read in the data with the csv module and then convert it to a numpy array?

Recommended answer

You can use pandas (which is becoming the default library for working with dataframes (heterogeneous data) in scientific Python) for this. Its read_csv function can handle quoted fields. From the docs:

quotechar : string

The character used to denote the start and end of a quoted item. Quoted items 
can include the delimiter and it will be ignored.

The default value is ". Example:

In [1]: import pandas as pd

In [2]: from io import StringIO  # on Python 2: from StringIO import StringIO

In [3]: s="""year, city, value
   ...: 2012, "Louisville KY", 3.5
   ...: 2011, "Lexington, KY", 4.0"""

In [4]: pd.read_csv(StringIO(s), quotechar='"', skipinitialspace=True)
Out[4]:
   year           city  value
0  2012  Louisville KY    3.5
1  2011  Lexington, KY    4.0

The trick here is that you also have to use skipinitialspace=True to deal with the spaces after the comma delimiter. Without it, each quoted field begins with a space rather than the quote character, so the quotes are not recognized as enclosing the field.
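
The effect of that option can be seen with the standard-library csv module, which has a skipinitialspace option of the same name. This is a minimal sketch that uses csv.reader as a stand-in rather than calling pandas, so it only illustrates the quoting rule and says nothing about pandas internals.

```python
import csv
from io import StringIO

line = '2011, "Lexington, KY", 4.0\n'

# Default: the field starts with a space, so the quote is an ordinary character
# and the comma inside the city name splits the field.
without_skip = next(csv.reader(StringIO(line)))

# skipinitialspace=True drops the space after each delimiter, so the quote
# opens the field and the embedded comma is kept.
with_skip = next(csv.reader(StringIO(line), skipinitialspace=True))

print(len(without_skip))  # 4
print(with_skip)          # ['2011', 'Lexington, KY', '4.0']
```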

Apart from its powerful csv reader, I also strongly advise using pandas for the heterogeneous data you have (the example numpy output you give is all strings, although you could use structured arrays).
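
For comparison, the route raised in the question (read with the csv module, then convert to numpy) could look roughly like the sketch below. It is one possible approach rather than part of the original answer; the field names year, city and value and the chosen dtypes are illustrative assumptions.

```python
import csv
from io import StringIO

import numpy as np

raw = '2012, "Louisville KY", 3.5\n2011, "Lexington, KY", 4.0\n'

# csv.reader honours the quotes once the space after each comma is skipped.
rows = list(csv.reader(StringIO(raw), skipinitialspace=True))

# Plain conversion: a 2-D array in which every field is a string.
as_strings = np.array(rows)

# Structured array: one typed field per column (names and dtypes are assumed).
dtype = [('year', 'i4'), ('city', 'U13'), ('value', 'f8')]
records = np.array([(int(y), c, float(v)) for y, c, v in rows], dtype=dtype)

print(as_strings[1, 1])        # Lexington, KY
print(records['value'].sum())  # 7.5
```

The all-string array matches the shape asked for in the question, while the structured array keeps the year and value columns numeric.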
