Using numpy.genfromtxt to read a csv file with strings containing commas
Problem description
I am trying to read in a csv file with numpy.genfromtxt, but some of the fields are strings which contain commas. The strings are in quotes, but numpy does not treat the quotes as delimiting a single string. For example, with the data in 't.csv':
2012, "Louisville KY", 3.5
2011, "Lexington, KY", 4.0
the code
np.genfromtxt('t.csv', delimiter=',')
produces the error:
ValueError: Some errors were detected ! Line #2 (got 4 columns instead of 3)
The data structure I am looking for is:
array([['2012', 'Louisville KY', '3.5'],
['2011', 'Lexington, KY', '4.0']],
dtype='|S13')
Looking over the documentation, I don't see any options to deal with this. Is there a way to do it with numpy, or do I just need to read in the data with the csv module and then convert it to a numpy array?
Recommended answer
You can use pandas (which is becoming the default library for working with dataframes, i.e. heterogeneous data, in scientific Python) for this. Its read_csv function can handle this. From the docs:
quotechar : string
The character used to denote the start and end of a quoted item. Quoted items
can include the delimiter and it will be ignored.
The default value is ". Example:
In [1]: import pandas as pd
In [2]: from io import StringIO  # on Python 2: from StringIO import StringIO
In [3]: s = """year, city, value
   ...: 2012, "Louisville KY", 3.5
   ...: 2011, "Lexington, KY", 4.0"""
In [4]: pd.read_csv(StringIO(s), quotechar='"', skipinitialspace=True)
Out[4]:
   year           city  value
0  2012  Louisville KY    3.5
1  2011  Lexington, KY    4.0
The trick here is that you also have to use skipinitialspace=True to deal with the spaces after the comma delimiter.
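For the route the question itself mentions (reading with the csv module and then converting), the standard-library csv reader accepts the same skipinitialspace option. The following is only a minimal sketch: the in-memory StringIO stands in for the real 't.csv', and the variable names are illustrative.

```python
import csv
from io import StringIO

import numpy as np

# Same sample rows as in the question; StringIO stands in for open('t.csv').
data = StringIO('2012, "Louisville KY", 3.5\n2011, "Lexington, KY", 4.0\n')

# skipinitialspace=True drops the blank after each comma, so the opening
# quote sits at the start of the field and the quoted comma is preserved.
reader = csv.reader(data, quotechar='"', skipinitialspace=True)
rows = list(reader)

# Every field is still a string, so this is a 2-D array of strings.
arr = np.array(rows)
print(arr)
```

This gives an all-string array of the kind asked for in the question; numeric columns would still need to be converted separately.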
Apart from its powerful csv reader, I can also strongly recommend pandas for the heterogeneous data you have (the example output you give in numpy is all strings, although you could use structured arrays).
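To illustrate the point about heterogeneous data, here is a hedged sketch that reads the same sample, inspects the per-column dtypes pandas inferred, and converts the frame to a NumPy record array with DataFrame.to_records. The variable names are illustrative, and the exact dtype of the string column depends on the pandas version.

```python
from io import StringIO

import pandas as pd

s = '''year, city, value
2012, "Louisville KY", 3.5
2011, "Lexington, KY", 4.0'''

df = pd.read_csv(StringIO(s), quotechar='"', skipinitialspace=True)

# Each column keeps its own type: integer year, string city, float value.
print(df.dtypes)

# A record array is NumPy's structured-array counterpart of the frame;
# index=False leaves out the row index.
records = df.to_records(index=False)
print(records.dtype.names)
print(records["city"][1])
```

Unlike the all-string array, the numeric columns here can be used in arithmetic directly, for example df["value"].mean().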