pandas read_csv() for multiple delimiters

Question

I have a file with data as follows:

1000000 183:0.6673;2:0.3535;359:0.304;363:0.1835
1000001 92:1.0
1000002 112:1.0
1000003 154435:0.746;30:0.3902;220:0.2803;238:0.2781;232:0.2717
1000004 118:1.0
1000005 157:0.484;25:0.4383;198:0.3033
1000006 277:0.7815;1980:0.4825;146:0.175
1000007 4069:0.6678;2557:0.6104;137:0.4261
1000009 2:1.0

I want to read the file into a pandas DataFrame, splitting on multiple delimiters: \t, :, and ;

I tried:

df_user_key_word_org = pd.read_csv(filepath+"user_key_word.txt", sep='\t|:|;', header=None, engine='python')

It gave me the following error:

pandas.errors.ParserError: Error could be due to quotes being ignored when a multi-char delimiter is used.

Why does this error occur?

So I thought I would try a regex string, but I am not sure how to write a split regex. r'\t|:|;' doesn't work.

What is the best way to read a file into a pandas DataFrame with multiple delimiters?

Recommended answer

From this question, Handling Variable Number of Columns with Pandas - Python, one workaround for pandas.errors.ParserError: Expected 29 fields in line 11, saw 45. is to let read_csv know in advance how many columns to expect.

import pandas as pd

my_cols = [str(i) for i in range(45)]  # create some column names
df_user_key_word_org = pd.read_csv(filepath + "user_key_word.txt",
                                   sep=r"\s+|;|:",  # raw-string regex: whitespace, ';' or ':'
                                   names=my_cols,
                                   header=None,
                                   engine="python")
# I tested with s = StringIO(text_from_OP) on my computer

Hope this works.
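
For anyone who wants to try this without the original file, here is a minimal self-contained sketch of the same workaround. It assumes the sample rows from the question are held in an in-memory string, and instead of hard-coding 45 columns it counts the widest row first with re.split. The names SEP, sample_lines, text and n_cols are made up for this illustration and are not part of the original answer.

```python
import re
from io import StringIO

import pandas as pd

# Same regex as in the answer: split on whitespace, ';' or ':'
SEP = r"\s+|;|:"

# Sample rows copied from the question (assumed whitespace between id and pairs)
sample_lines = [
    "1000000 183:0.6673;2:0.3535;359:0.304;363:0.1835",
    "1000001 92:1.0",
    "1000002 112:1.0",
    "1000003 154435:0.746;30:0.3902;220:0.2803;238:0.2781;232:0.2717",
    "1000004 118:1.0",
    "1000005 157:0.484;25:0.4383;198:0.3033",
    "1000006 277:0.7815;1980:0.4825;146:0.175",
    "1000007 4069:0.6678;2557:0.6104;137:0.4261",
    "1000009 2:1.0",
]
text = "\n".join(sample_lines)

# Rows have different numbers of fields, so find the widest one up front
n_cols = max(len(re.split(SEP, line)) for line in sample_lines)

# Passing explicit names tells read_csv how many columns to expect;
# shorter rows are padded with NaN
df = pd.read_csv(StringIO(text),
                 sep=SEP,
                 names=[str(i) for i in range(n_cols)],
                 header=None,
                 engine="python")

print(n_cols)
print(df.shape)
```

Counting the fields first avoids guessing an upper bound, at the cost of reading the data twice, which may not be acceptable for very large files.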
