Pandas read_csv not obeying a regex sep

Question

Data:

from io import StringIO
import pandas as pd

s = '''ID,Level,QID,Text,ResponseID,responseText,date_key,last
375280046,S,D3M,Which is your favorite?,D5M0,option 1,2012-08-08 00:00:00,ynot
375280046,S,D3M,How often? (at home, at work, other),D3M0,Work,2010-03-31 00:00:00,okkk
375280046,M,A78,Do you prefer a, b, or c?,A78C,a,2010-03-31 00:00:00,abc
376918925,M,A78,Which ONE (select only one),A78E,Milk,2004-02-02 00:00:00,launch Wed., '''

df = pd.read_csv(StringIO(s), sep=r',(?!\s)')

Problem: I asked a related question earlier, but I have run into a new problem. Notice that the last line ends with a comma followed by a space. The regex in sep=r',(?!\s)' is supposed to ignore commas that are followed by a space.

Question: Using pd.read_csv only, is there a way to read the last column literally as launch Wed., so that the trailing comma is treated as part of the text in the last column and not as a separator/delimiter?

Error:

ValueError: Expected 8 fields in line 5, saw 9. Error could possibly be due to quotes being ignored when a multi-char delimiter is used.
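
The ninth field can be reproduced outside pandas by applying the same pattern to the last line with re.split. This is only a sketch: it assumes the parser strips surrounding whitespace from each line before splitting on a regex separator, which is consistent with the error above but is an assumption about pandas internals, not documented behavior.

```python
import re

last_line = ("376918925,M,A78,Which ONE (select only one),A78E,Milk,"
             "2004-02-02 00:00:00,launch Wed., ")

pattern = re.compile(r',(?!\s)')

# On the raw line the final comma is followed by a space, so it is not a split point.
raw_fields = pattern.split(last_line)

# Assumption: the parser strips the line first. The line then ends in a bare comma,
# and at end of string there is no whitespace to reject, so (?!\s) succeeds.
stripped_fields = pattern.split(last_line.strip())

print(len(raw_fields))        # 8
print(len(stripped_fields))   # 9
print(stripped_fields[-2:])   # ['launch Wed.', '']
```

With the trailing space gone, the negative lookahead has nothing to reject, which would explain why line 5 is reported as having 9 fields.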

Expected/desired output:

          ID Level  QID                                  Text ResponseID  \
0  375280046     S  D3M               Which is your favorite?       D5M0   
1  375280046     S  D3M  How often? (at home, at work, other)       D3M0   
2  375280046     M  A78             Do you prefer a, b, or c?       A78C   
3  376918925     M  A78           Which ONE (select only one)       A78E   

  responseText             date_key           last  
0     option 1  2012-08-08 00:00:00           ynot  
1         Work  2010-03-31 00:00:00           okkk  
2            a  2010-03-31 00:00:00            abc  
3         Milk  2004-02-02 00:00:00  launch Wed.,   

Recommended answer

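Use the regular expression r',(?=\S)' instead. The negative lookahead (?!\s) only requires that the next character is not whitespace, so it also matches a comma at the very end of a line. The positive lookahead (?=\S) requires an actual non-whitespace character after the comma, so a trailing comma is left alone.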
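
As a rough check that does not depend on pandas, the pattern can be applied to every line of the sample with re.split. The sketch assumes each line is stripped before splitting (an assumption about how the parser treats regex separators); the names lines and field_counts are only illustrative.

```python
import re

lines = [
    "ID,Level,QID,Text,ResponseID,responseText,date_key,last",
    "375280046,S,D3M,Which is your favorite?,D5M0,option 1,2012-08-08 00:00:00,ynot",
    "375280046,S,D3M,How often? (at home, at work, other),D3M0,Work,2010-03-31 00:00:00,okkk",
    "375280046,M,A78,Do you prefer a, b, or c?,A78C,a,2010-03-31 00:00:00,abc",
    "376918925,M,A78,Which ONE (select only one),A78E,Milk,2004-02-02 00:00:00,launch Wed., ",
]

sep = re.compile(r',(?=\S)')

# Split only on commas that are followed by a non-whitespace character.
rows = [sep.split(line.strip()) for line in lines]
field_counts = [len(row) for row in rows]

print(field_counts)   # [8, 8, 8, 8, 8]
print(rows[-1][-1])   # launch Wed.,
```

Every line yields the same number of fields as the header, and the trailing comma stays inside the last field.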

from io import StringIO
import pandas as pd

s = '''ID,Level,QID,Text,ResponseID,responseText,date_key,last
375280046,S,D3M,Which is your favorite?,D5M0,option 1,2012-08-08 00:00:00,ynot
375280046,S,D3M,How often? (at home, at work, other),D3M0,Work,2010-03-31 00:00:00,okkk
375280046,M,A78,Do you prefer a, b, or c?,A78C,a,2010-03-31 00:00:00,abc
376918925,M,A78,Which ONE (select only one),A78E,Milk,2004-02-02 00:00:00,launch Wed., '''

df = pd.read_csv(StringIO(s), sep=r',(?=\S)')

Output:

          ID Level  QID                                  Text ResponseID  \
0  375280046     S  D3M               Which is your favorite?       D5M0   
1  375280046     S  D3M  How often? (at home, at work, other)       D3M0   
2  375280046     M  A78             Do you prefer a, b, or c?       A78C   
3  376918925     M  A78           Which ONE (select only one)       A78E   

  responseText             date_key          last  
0     option 1  2012-08-08 00:00:00          ynot  
1         Work  2010-03-31 00:00:00          okkk  
2            a  2010-03-31 00:00:00           abc  
3         Milk  2004-02-02 00:00:00  launch Wed.,  
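
To confirm the parse programmatically, the shape and the last cell can be inspected directly. This sketch passes engine='python' explicitly, since regex separators are handled by the Python engine and pandas otherwise emits a ParserWarning when it falls back to it. The .strip() on the last value is a precaution in case surrounding whitespace is kept.

```python
from io import StringIO
import pandas as pd

s = '''ID,Level,QID,Text,ResponseID,responseText,date_key,last
375280046,S,D3M,Which is your favorite?,D5M0,option 1,2012-08-08 00:00:00,ynot
375280046,S,D3M,How often? (at home, at work, other),D3M0,Work,2010-03-31 00:00:00,okkk
375280046,M,A78,Do you prefer a, b, or c?,A78C,a,2010-03-31 00:00:00,abc
376918925,M,A78,Which ONE (select only one),A78E,Milk,2004-02-02 00:00:00,launch Wed., '''

# Split only on commas followed by a non-whitespace character.
df = pd.read_csv(StringIO(s), sep=r',(?=\S)', engine='python')

print(df.shape)                   # (4, 8)
print(list(df.columns))
print(repr(df['last'].iloc[-1]))  # the trailing comma is part of the value
```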
