Pandas read_csv not obeying a regex sep
Question
Data:
from io import StringIO
import pandas as pd
s = '''ID,Level,QID,Text,ResponseID,responseText,date_key,last
375280046,S,D3M,Which is your favorite?,D5M0,option 1,2012-08-08 00:00:00,ynot
375280046,S,D3M,How often? (at home, at work, other),D3M0,Work,2010-03-31 00:00:00,okkk
375280046,M,A78,Do you prefer a, b, or c?,A78C,a,2010-03-31 00:00:00,abc
376918925,M,A78,Which ONE (select only one),A78E,Milk,2004-02-02 00:00:00,launch Wed., '''
df = pd.read_csv(StringIO(s), sep=r',(?!\s)')
Problem: This follows up on an earlier question of mine, but I have run into a new problem. Notice that the last line ends with a comma followed by a space. The regex in sep=r',(?!\s)' is supposed to ignore commas that are followed by a space.
Question: Is there a way, using only pd.read_csv, to read the last column literally as launch Wed., so that the trailing comma is treated as part of the text in the last column rather than as a separator/delimiter?
Error:
ValueError: Expected 8 fields in line 5, saw 9. Error could possibly be due to quotes being ignored when a multi-char delimiter is used.
Expected/desired output:
          ID  Level  QID                                  Text  ResponseID  \
0  375280046      S  D3M               Which is your favorite?        D5M0
1  375280046      S  D3M  How often? (at home, at work, other)        D3M0
2  375280046      M  A78             Do you prefer a, b, or c?        A78C
3  376918925      M  A78           Which ONE (select only one)        A78E

   responseText             date_key          last
0      option 1  2012-08-08 00:00:00          ynot
1          Work  2010-03-31 00:00:00          okkk
2             a  2010-03-31 00:00:00           abc
3          Milk  2004-02-02 00:00:00  launch Wed.,
Recommended answer
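Use this regular expression instead: r',(?=\S)'. It splits only on commas that are immediately followed by a non-whitespace character, so a comma with nothing after it at the end of a line is not treated as a separator.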
from io import StringIO
import pandas as pd
s = '''ID,Level,QID,Text,ResponseID,responseText,date_key,last
375280046,S,D3M,Which is your favorite?,D5M0,option 1,2012-08-08 00:00:00,ynot
375280046,S,D3M,How often? (at home, at work, other),D3M0,Work,2010-03-31 00:00:00,okkk
375280046,M,A78,Do you prefer a, b, or c?,A78C,a,2010-03-31 00:00:00,abc
376918925,M,A78,Which ONE (select only one),A78E,Milk,2004-02-02 00:00:00,launch Wed., '''
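# A regex sep is only supported by the Python parsing engine;
# naming it explicitly avoids the ParserWarning about falling back to it.
df = pd.read_csv(StringIO(s), sep=r',(?=\S)', engine='python')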
Output:
              ID                                 Level   QID      Text  \
375280046 S  D3M               Which is your favorite?  D5M0  option 1
          S  D3M  How often? (at home, at work, other)  D3M0      Work
          M  A78             Do you prefer a, b, or c?  A78C         a
376918925 M  A78           Which ONE (select only one)  A78E      Milk

                ResponseID  responseText  date_key          last
375280046 S  2012-08-08 00             0         0          ynot
          S  2010-03-31 00             0         0          okkk
          M  2010-03-31 00             0         0           abc
376918925 M  2004-02-02 00             0         0  launch Wed.,
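The difference between the two lookaheads can be checked with the standard re module alone, without pandas. The sketch below rests on one assumption: that the Python parsing engine strips surrounding whitespace from each line before splitting on a regex sep, which would explain why the trailing ", " ends up as a bare comma at the end of the line. That is an assumption about pandas internals, not documented behavior, and the helper name split_like_pandas is hypothetical.

```python
import re

# Last data row from the question, including the trailing ", ".
line = ("376918925,M,A78,Which ONE (select only one),A78E,Milk,"
        "2004-02-02 00:00:00,launch Wed., ")


def split_like_pandas(pattern, raw_line):
    """Hypothetical helper: strip the line, then split it on the regex.

    Assumption: this mirrors what the Python engine does for a regex sep.
    """
    return re.split(pattern, raw_line.strip())


# Negative lookahead: "a comma NOT followed by whitespace".
# After stripping, nothing follows the final comma, so it still matches.
negative = split_like_pandas(r',(?!\s)', line)

# Positive lookahead: "a comma followed by a non-whitespace character".
# After stripping, no character follows the final comma, so it does not match.
positive = split_like_pandas(r',(?=\S)', line)

print(len(negative), repr(negative[-1]))  # 9 ''
print(len(positive), repr(positive[-1]))  # 8 'launch Wed.,'
```

Under that assumption, the nine fields produced by the first pattern line up with the "Expected 8 fields in line 5, saw 9" error, while the second pattern keeps the comma inside the last field.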