python pandas read_csv delimiter in column data
Problem description
I have a CSV file of this type:
12012;My Name is Mike. What is your's?;3;0
1522;In my opinion: It's cool; or at least not bad;4;0
21427;Hello. I like this feature!;5;1
I want to load this data into a pandas.DataFrame, but read_csv(sep=";") throws exceptions because of the semicolon in the user-generated message column in line 2 (In my opinion: It's cool; or at least not bad). All remaining columns always have numeric dtypes.
What is the most convenient way to handle this?
Recommended answer
Dealing with unquoted delimiters is always a nuisance. In this case, since the broken text is known to be surrounded by three correctly encoded columns (one before it, two after it), we can recover. To be honest, I'd just use the standard Python csv reader and build a DataFrame from the result in one go:
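import csv
import pandas as pd
with open("semi.dat", "r", newline="") as fp:
    reader = csv.reader(fp, delimiter=";")
    # Keep the first field and the last two fields as they are;
    # rejoin everything in between into the single message field.
    rows = [x[:1] + [";".join(x[1:-2])] + x[-2:] for x in reader]
df = pd.DataFrame(rows)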
which produces
       0                                              1  2  3
0  12012               My Name is Mike. What is your's?  3  0
1   1522  In my opinion: It's cool; or at least not bad  4  0
2  21427                    Hello. I like this feature!  5  1
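The same idea can be tried without a file on disk. The following is only a minimal sketch: the helper name repair_line, its parameters, and the column names id, message, rating and flag are made up for illustration and do not come from the question. It assumes exactly one free-text field, with one clean column before it and two after it. Since every field comes back as a string, it also converts the numeric columns explicitly.

```python
import io
import pandas as pd

# The three sample lines from the question, held in memory.
RAW = """12012;My Name is Mike. What is your's?;3;0
1522;In my opinion: It's cool; or at least not bad;4;0
21427;Hello. I like this feature!;5;1
"""

def repair_line(line, sep=";", n_leading=1, n_trailing=2):
    """Split one line and glue surplus separators back into the text field.

    Hypothetical helper: assumes n_leading clean fields before the text
    and n_trailing clean fields after it.
    """
    parts = line.rstrip("\n").split(sep)
    cut = len(parts) - n_trailing
    return parts[:n_leading] + [sep.join(parts[n_leading:cut])] + parts[cut:]

rows = [repair_line(line) for line in io.StringIO(RAW) if line.strip()]

# Column names are assumptions chosen for readability.
df = pd.DataFrame(rows, columns=["id", "message", "rating", "flag"])

# Every field is still a string here, so convert the numeric columns.
num_cols = ["id", "rating", "flag"]
df[num_cols] = df[num_cols].apply(pd.to_numeric)
```

Unlike csv.reader, plain str.split ignores quote characters entirely, which may or may not be what you want if some messages are already quoted.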
Then we can immediately save it and get a correctly quoted file:
In [67]: df.to_csv("fixedsemi.dat", sep=";", header=None, index=False)
In [68]: more fixedsemi.dat
12012;My Name is Mike. What is your's?;3;0
1522;"In my opinion: It's cool; or at least not bad";4;0
21427;Hello. I like this feature!;5;1
In [69]: df2 = pd.read_csv("fixedsemi.dat", sep=";", header=None)
In [70]: df2
Out[70]:
       0                                              1  2  3
0  12012               My Name is Mike. What is your's?  3  0
1   1522  In my opinion: It's cool; or at least not bad  4  0
2  21427                    Hello. I like this feature!  5  1
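The round trip can also be checked in memory. This is a rough sketch rather than part of the original answer: it uses io.StringIO instead of fixedsemi.dat and rebuilds the repaired frame by hand. It relies on to_csv quoting a field only when the field contains the separator, which is the default behaviour seen in the saved file above.

```python
import io
import pandas as pd

# The repaired rows, written out by hand for a self-contained example.
df = pd.DataFrame([
    [12012, "My Name is Mike. What is your's?", 3, 0],
    [1522, "In my opinion: It's cool; or at least not bad", 4, 0],
    [21427, "Hello. I like this feature!", 5, 1],
])

# Write to an in-memory buffer instead of a file on disk.
buf = io.StringIO()
df.to_csv(buf, sep=";", header=False, index=False)
text = buf.getvalue()

# Reading the quoted text back now works with a plain read_csv call.
df2 = pd.read_csv(io.StringIO(text), sep=";", header=None)
```

Only the message containing a semicolon is wrapped in double quotes in text; the other two lines are written unchanged.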