Problems reading a German CSV file in Python

Problem description

I have a German CSV file that I want to read with pd.read_csv.

Data:

The original file looks like this:

So it has two columns (A, B), and the separator should be ';'.

Problem: When I run the command:

dataset = pd.read_csv('C:/Users/.../GermanNews/articles.csv',
                      encoding='utf-8', header=None, sep=';')

I get the error: ParserError: Error tokenizing data. C error: Expected 2 fields in line 3, saw 3

Half-solution: I understand this could have several causes, but when I ran the command:

dataset = pd.read_csv('C:/Users/.../GermanNews/articles.csv',
                      encoding='utf-8', header=None, sep='delimiter')

I got the following dataset:

    0
0   Etat;Die ARD-Tochter Degeto hat sich verpflich...
1   Etat;App sei nicht so angenommen worden wie ge...
2   Etat;'Zum Welttag der Suizidprävention ist es ...
3   Etat;Mitarbeiter überreichten Eigentümervertre...
4   Etat;Service: Jobwechsel in der Kommunikations...

So I only get one column instead of the two desired columns.

Target: Any idea how to load my dataset correctly, so that it looks like this:

    0       1
0   Etat    Die ARD-Tochter Degeto hat sich verpflich...
1   Etat    App sei nicht so angenommen worden wie ge...

Hints/attempts:

When I run the search function over my data in Excel, I also do not find any ';' in it.

It seems that some lines have more than two columns (as you can see, for example, in lines 3 and 13 of my example).
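
The hint about extra columns can be checked by counting separators per line. The following is a minimal sketch and not part of the original question: the `sample` text and the helper name `lines_with_extra_separators` are hypothetical, and the check deliberately ignores CSV quoting rules.

```python
import io

# Hypothetical stand-in for articles.csv: the third line contains an
# extra ';' inside the article text.
sample = (
    "Etat;Die ARD-Tochter Degeto hat sich verpflichtet\n"
    "Etat;App sei nicht so angenommen worden\n"
    "Etat;Zum Welttag; der Suizidprävention\n"
)


def lines_with_extra_separators(handle, sep=";", expected_fields=2):
    """Return (line_number, field_count) for lines with too many fields."""
    offenders = []
    for line_number, line in enumerate(handle, start=1):
        # Naive count: one more field than there are separators.
        field_count = line.rstrip("\n").count(sep) + 1
        if field_count > expected_fields:
            offenders.append((line_number, field_count))
    return offenders


offenders = lines_with_extra_separators(io.StringIO(sample))
print(offenders)  # [(3, 3)]
```

For the real file, the same helper could be called on a handle opened with `open(path, encoding='utf-8')`, where `path` is the location of articles.csv.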

Recommended answer

One possible solution is to read the file into a one-column DataFrame by using a separator that does not occur in the data, such as delimiter, and then use Series.str.split with the n parameter and expand=True to build a new DataFrame:

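import pandas as pd

# Read every line into a single column: the string 'delimiter' is assumed
# not to occur in the data. Multi-character separators are treated as regular
# expressions and need the Python engine.
dataset = pd.read_csv('C:/Users/.../GermanNews/articles.csv',
                      encoding='utf-8', header=None, sep='delimiter',
                      engine='python')

# A more general option is a character that does NOT exist in the data, such as yen ¥
#dataset = pd.read_csv('C:/Users/.../GermanNews/articles.csv',
#                      encoding='utf-8', header=None, sep='¥')

# Split each line at the first ';' only, so later semicolons stay in column B
df = dataset[0].str.split(';', n=1, expand=True)
df.columns = ['A', 'B']
print(df)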
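
The approach can be tried without the original file. The following is a self-contained sketch under stated assumptions: the `sample` text is hypothetical and only imitates the structure described in the question, and `engine='python'` is passed explicitly because a multi-character separator is interpreted as a regular expression.

```python
import io

import pandas as pd

# Hypothetical stand-in for articles.csv: the third line contains an
# extra ';' inside the article text.
sample = (
    "Etat;Die ARD-Tochter Degeto hat sich verpflichtet\n"
    "Etat;App sei nicht so angenommen worden\n"
    "Etat;Zum Welttag; der Suizidprävention\n"
)

# Step 1: read each line as one column by using a separator that
# does not occur anywhere in the text.
raw = pd.read_csv(io.StringIO(sample), header=None,
                  sep="delimiter", engine="python")

# Step 2: split at the first ';' only (n=1), so any further
# semicolons remain part of the second column.
df = raw[0].str.split(";", n=1, expand=True)
df.columns = ["A", "B"]
print(df)
```

Because of `n=1`, the third row keeps its second semicolon inside column B instead of producing a third field, which is what caused the ParserError with `sep=';'`.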
