Read CSV with extra commas and no quotechar with Pandas?


Question

Data:

from io import StringIO
import pandas as pd

s = '''ID,Level,QID,Text,ResponseID,responseText,date_key
375280046,S,D3M,Which is your favorite?,D5M0,option 1,2012-08-08 00:00:00
375280046,S,D3M,How often? (at home, at work, other),D3M0,Work,2010-03-31 00:00:00
375280046,M,A78,Do you prefer a, b, or c?,A78C,a,2010-03-31 00:00:00'''

df = pd.read_csv(StringIO(s))

Error received:

pandas.io.common.CParserError: Error tokenizing data. C error: Expected 7 fields in line 3, saw 9

It's obvious why I'm receiving this error: the Text column is unquoted and contains commas of its own, as in "How often? (at home, at work, other)" and "Do you prefer a, b, or c?", so the parser sees more than 7 fields on those lines.

How does one read this type of data into a pandas DataFrame?

Recommended answer

Of course, as I was writing the question, I figured it out. Rather than delete it, I'll share it with my future self for when I forget how to do this.

Apparently, the sep argument of read_csv, which defaults to ',', can also be a regular expression.

The solution was to pass sep=r',(?!\s)' to read_csv like so:

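# engine='python' is set explicitly: regex separators are not supported by the C parser,
# and pandas otherwise falls back to the Python engine with a ParserWarning
df = pd.read_csv(StringIO(s), sep=r',(?!\s)', engine='python')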

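The (?!\s) part is a negative lookahead, so the pattern matches only commas that are not immediately followed by whitespace. This works here because the delimiter commas in this data are never followed by a space, while the commas inside the question text always are.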
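
To see what the lookahead does in isolation, here is a minimal sketch using only the standard re module on one of the problem lines from the question. It is an illustration of the pattern, not of how pandas applies it internally.

```python
import re

# Same pattern as the sep argument: a comma NOT followed by whitespace.
pattern = re.compile(r',(?!\s)')

line = "375280046,S,D3M,How often? (at home, at work, other),D3M0,Work,2010-03-31 00:00:00"

naive_fields = line.split(',')      # splits on every comma
fields = pattern.split(line)        # splits only on delimiter commas

print(len(naive_fields))  # 9, which is what the tokenizer complained about
print(len(fields))        # 7
print(fields[3])          # How often? (at home, at work, other)
```

The commas after "at home" and "at work" are each followed by a space, so the lookahead rejects them and the question text stays in one field.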

Result:

          ID Level  QID                                  Text ResponseID  \
0  375280046     S  D3M               Which is your favorite?       D5M0   
1  375280046     S  D3M  How often? (at home, at work, other)       D3M0   
2  375280046     M  A78             Do you prefer a, b, or c?       A78C   

  responseText             date_key  
0     option 1  2012-08-08 00:00:00  
1         Work  2010-03-31 00:00:00  
2            a  2010-03-31 00:00:00  
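
One caveat: the regex only helps while every embedded comma is followed by a space and no delimiter comma is. If that assumption breaks, a possible fallback (not part of the original answer) is to split each line by position instead. The sketch below assumes that only the fourth column, Text, can contain extra commas, so the first three and last three fields can be peeled off from either end; split_row and its parameters are hypothetical names used for illustration.

```python
def split_row(line, n_left=3, n_right=3):
    """Split a line into n_left + 1 + n_right fields.

    Assumes only the single middle field may contain extra commas.
    """
    left = line.split(',', n_left)         # first n_left fields, then the remainder
    head, rest = left[:n_left], left[n_left]
    tail = rest.rsplit(',', n_right)       # middle field, then the last n_right fields
    return head + tail


s = '''ID,Level,QID,Text,ResponseID,responseText,date_key
375280046,S,D3M,Which is your favorite?,D5M0,option 1,2012-08-08 00:00:00
375280046,S,D3M,How often? (at home, at work, other),D3M0,Work,2010-03-31 00:00:00
375280046,M,A78,Do you prefer a, b, or c?,A78C,a,2010-03-31 00:00:00'''

rows = [split_row(line) for line in s.splitlines()]
header, records = rows[0], rows[1:]
```

The resulting header and records can then be handed to pd.DataFrame(records, columns=header). Unlike read_csv, this leaves every column as a string, so any type conversion would have to be done afterwards.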
