Reading a CSV with pandas when a column value contains both the quotechar and the delimiter


Question

Here is the content of a CSV file, 'test.csv', which I am trying to read with pandas read_csv():

"col1", "col2", "col3", "col4"
"v1", "v2", "v3", "v4"
"v21", "v22", "v23", "this, "creating, what to do? " problems"

This is the command I am using:

messages = pd.read_csv('test.csv', sep=',', skipinitialspace=True)

But I get the following error:

CParserError: Error tokenizing data. C error: Expected 4 fields in line 3, saw 5

I want the content of col4 on line 3 to be 'this, "creating, what to do? " problems'.

How can I read the file when a column value can contain both the quotechar and the delimiter?
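
To see where the fifth field comes from, here is a rough illustration using Python's standard csv module. It is an analogy only, since pandas uses its own C tokenizer, but the mechanism is similar: the quote after 'this, ' ends the quoted field early, so the next comma is read as a delimiter.

```python
import csv

# The problematic third line of test.csv, copied from the question.
line = '"v21", "v22", "v23", "this, "creating, what to do? " problems"'

# skipinitialspace=True mirrors the read_csv call in the question.
reader = csv.reader([line], delimiter=',', quotechar='"', skipinitialspace=True)
fields = next(reader)

# The header declares 4 columns, but this line tokenizes into more.
print(len(fields))
for field in fields:
    print(repr(field))
```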

Recommended answer

pandas does not let you keep malformed rows, and to be honest I don't see a way of ignoring some " characters but not others in your example. I think your intuition of using '", "' as the delimiter and then cleaning up afterwards is the best approach. Note that a separator longer than one character is interpreted as a regular expression and is handled by the Python parsing engine, which ignores quoting. If you really want to do this in one line:

message = pd.read_csv('test.csv', sep='", "', names=['col1', 'col2', 'col3', 'col4'], skiprows=1, engine='python').apply(lambda x: x.str.strip('"'))

This also takes care of the quotes in the column names (the header row is skipped and the names are supplied explicitly) and gives you:

>>> message
  col1 col2 col3                                     col4
0   v1   v2   v3                                       v4
1  v21  v22  v23  this, "creating, what to do? " problems
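
If the regex separator feels too opaque, the same idea can be spelled out by hand with the standard library alone. This is a minimal sketch and not part of the original answer: split_loose_row is a hypothetical helper name, and it assumes that every field is wrapped in double quotes and that the exact sequence '", "' never occurs inside a value.

```python
import io

SEPARATOR = '", "'  # closing quote, comma, space, opening quote


def split_loose_row(line):
    """Split one row on the literal '", "' and drop the row's outer quotes."""
    parts = line.rstrip('\r\n').split(SEPARATOR)
    # After the split, only the first and last pieces still carry a quote:
    # the opening quote of the row and the closing quote of the row.
    if parts[0].startswith('"'):
        parts[0] = parts[0][1:]
    if parts[-1].endswith('"'):
        parts[-1] = parts[-1][:-1]
    return parts


# Stand-in for open('test.csv'); the content is copied from the question.
sample = io.StringIO(
    '"col1", "col2", "col3", "col4"\n'
    '"v1", "v2", "v3", "v4"\n'
    '"v21", "v22", "v23", "this, "creating, what to do? " problems"\n'
)

rows = [split_loose_row(line) for line in sample if line.strip()]
header, *records = rows

print(header)
for record in records:
    print(record)
```

The resulting header and records can be passed to pd.DataFrame(records, columns=header) if a DataFrame is needed. Unlike str.strip('"'), which removes every leading and trailing quote, this version removes at most one quote from each end of the row, so a value that legitimately ends in a quote keeps it.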
