Reading a CSV with pandas when a column value contains both the quotechar and the delimiter
Problem description
Here is the content of a CSV file, 'test.csv', which I am trying to read with pandas read_csv():
"col1", "col2", "col3", "col4"
"v1", "v2", "v3", "v4"
"v21", "v22", "v23", "this, "creating, what to do? " problems"
This is the command I am using:
messages = pd.read_csv('test.csv', sep=',', skipinitialspace=True)
But I get the following error:
CParserError: Error tokenizing data. C error: Expected 4 fields in line 3, saw 5
I want the content of column 4 in line 3 to be 'this, "creating, what to do? " problems'.
How can I read the file when a column value can contain both the quotechar and the delimiter?
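To see where the extra field comes from, the problem line can be passed through Python's standard csv module. This is only an illustration, assuming the csv module's lenient quoting rules are close enough to what the pandas tokenizer does here; it is not the pandas C parser itself.

```python
import csv

# Third line of test.csv, copied from the question.
line = '"v21", "v22", "v23", "this, "creating, what to do? " problems"'

# csv.reader accepts any iterable of lines; skipinitialspace mirrors the read_csv call.
reader = csv.reader([line], delimiter=',', quotechar='"', skipinitialspace=True)
fields = next(reader)

# Show how many fields the line was split into, and what each one contains.
print(len(fields))
for field in fields:
    print(repr(field))
```

The quote after `this, ` is treated as the end of the quoted field, so the comma after `creating` counts as a real delimiter and the line is split into five fields instead of four.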
Recommended answer
pandas does not let you keep malformed rows, and to be honest I don't see a way to ignore some "
characters but not others in your example. I think your intuition of using '", "'
as the delimiter and then cleaning up afterwards is the best approach. If you really want to do it in one line:
message = pd.read_csv('test.csv', sep='", "', names=['col1', 'col2', 'col3', 'col4'], skiprows=1, engine='python').apply(lambda x: x.str.strip('"'))
This also takes care of the quotes in the column names (the header row is skipped and the names are supplied explicitly) and gives you:
>>> message
   col1  col2  col3                                     col4
0    v1    v2    v3                                       v4
1   v21   v22   v23  this, "creating, what to do? " problems
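The same split-then-strip idea can be sketched in plain Python, which makes each step explicit. This is a minimal sketch, assuming the literal sequence `", "` never occurs inside a field value; the helper name `split_quoted_line` and the inline `SAMPLE` string are hypothetical and stand in for reading 'test.csv'.

```python
# Sample data copied from the question; in practice the lines would come from open('test.csv').
SAMPLE = '''"col1", "col2", "col3", "col4"
"v1", "v2", "v3", "v4"
"v21", "v22", "v23", "this, "creating, what to do? " problems"
'''


def split_quoted_line(line, sep='", "'):
    """Split one record on the literal separator and drop its outer quotes."""
    fields = line.strip().split(sep)
    # After splitting, only the first field still has a leading quote
    # and only the last field still has a trailing quote.
    if fields[0].startswith('"'):
        fields[0] = fields[0][1:]
    if fields[-1].endswith('"'):
        fields[-1] = fields[-1][:-1]
    return fields


# Parse every non-empty line, then separate the header from the data rows.
rows = [split_quoted_line(line) for line in SAMPLE.splitlines() if line.strip()]
header, data = rows[0], rows[1:]

print(header)
for row in data:
    print(row)
```

Unlike `.str.strip('"')`, this removes only the single outer quote at each end of a record, so quotes inside a value are left untouched. The result can then be handed to pandas with `pd.DataFrame(data, columns=header)`.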