Pandas: read_csv indicating 'space-delimited'

Question

I have the following file.txt (abridged):

SICcode        Catcode        Category                              SICname        MultSIC
0111        A1500        Wheat, corn, soybeans and cash grain        Wheat        X
0112        A1600        Other commodities (incl rice, peanuts)      Rice        X
0115        A1500        Wheat, corn, soybeans and cash grain        Corn        X
0116        A1500        Wheat, corn, soybeans and cash grain        Soybeans        X
0119        A1500        Wheat, corn, soybeans and cash grain        Cash grains, NEC        X
0131        A1100        Cotton        Cotton        X
0132        A1300        Tobacco & Tobacco products                  Tobacco        X

I'm having some problems reading it into a pandas DataFrame. I tried pd.read_csv with the options engine='python', sep='Tab', but it returned the file in one column:

    SICcode Catcode Category SICname MultSIC
0   0111 A1500 Wheat, corn, soybeans...
1   0112 A1600 Other commodities (in...
2   0115 A1500 Wheat, corn, soybeans...
3   0116 A1500 Wheat, corn, soybeans...

Then I tried to put it into a gnumeric file using 'tab' as a delimiter, but it read the file as one column. Does anyone have an idea on this?

Recommended answer

If df = pd.read_csv('file.txt', sep='\t') returns a DataFrame with one column, then apparently file.txt is not using tabs as separators. Your data might simply have spaces as separators. In that case you could try

df = pd.read_csv('file.txt', sep=r'\s{2,}', engine='python')

which uses the regex pattern \s{2,} as the separator. This regex matches 2-or-more whitespace characters.

In [8]: df
Out[8]: 
   SICcode Catcode                                Category           SICname  \
0      111   A1500    Wheat, corn, soybeans and cash grain             Wheat   
1      112   A1600  Other commodities (incl rice, peanuts)              Rice   
2      115   A1500    Wheat, corn, soybeans and cash grain              Corn   
3      116   A1500    Wheat, corn, soybeans and cash grain          Soybeans   
4      119   A1500    Wheat, corn, soybeans and cash grain  Cash grains, NEC   
5      131   A1100                                  Cotton            Cotton   
6      132   A1300              Tobacco & Tobacco products           Tobacco   

  MultSIC  
0       X  
1       X  
2       X  
3       X  
4       X  
5       X  
6       X  
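
A minimal, self-contained sketch of the approach above, assuming the columns in file.txt are separated by runs of two or more spaces, as in the excerpt from the question. The in-memory `raw` string stands in for the real file and is only an illustration. Two additions are not part of the original answer: `engine="python"`, because regex separators are handled by the Python parsing engine, and `dtype={"SICcode": str}`, which keeps the leading zeros that the output above loses (0111 is shown as 111).

```python
import io

import pandas as pd

# Rows copied from the question; fields are separated by runs of spaces,
# while single spaces occur inside fields such as "Cash grains, NEC".
raw = (
    "SICcode        Catcode        Category                              SICname        MultSIC\n"
    "0111        A1500        Wheat, corn, soybeans and cash grain        Wheat        X\n"
    "0112        A1600        Other commodities (incl rice, peanuts)      Rice        X\n"
    "0115        A1500        Wheat, corn, soybeans and cash grain        Corn        X\n"
    "0116        A1500        Wheat, corn, soybeans and cash grain        Soybeans        X\n"
    "0119        A1500        Wheat, corn, soybeans and cash grain        Cash grains, NEC        X\n"
    "0131        A1100        Cotton        Cotton        X\n"
    "0132        A1300        Tobacco & Tobacco products                  Tobacco        X\n"
)

df = pd.read_csv(
    io.StringIO(raw),         # stand-in for 'file.txt'
    sep=r"\s{2,}",            # split only on two or more whitespace characters
    engine="python",          # regex separators need the Python engine
    dtype={"SICcode": str},   # assumption: SIC codes are labels, so keep "0111" as text
)

print(df.shape)
print(df["SICcode"].tolist())
```

Splitting on two or more whitespace characters only works while no field contains a double space and no two columns are separated by a single space.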

If this does not work, please post the output of print(repr(open('file.txt', 'rb').read(100))). This will show us an unambiguous representation of the first 100 bytes of file.txt.
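
The byte-level check can be sketched as follows. To stay runnable on its own, the example first writes a small temporary file; its contents and the `has_tabs` variable are hypothetical stand-ins, and with real data you would open file.txt directly.

```python
import os
import tempfile

# Hypothetical stand-in for file.txt: space-separated, no tab characters.
sample = (
    b"SICcode        Catcode        Category\n"
    b"0111        A1500        Wheat, corn, soybeans and cash grain\n"
)

with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    # Binary mode returns the raw bytes, so tabs appear as \t in the repr.
    with open(path, "rb") as fh:
        head = fh.read(100)
finally:
    os.remove(path)

print(repr(head))

# If no tab byte appears, sep='\t' cannot split the columns.
has_tabs = b"\t" in head
print("contains tabs:", has_tabs)
```

In the repr, a tab shows up as `\t` and a space as a literal space, which makes the actual delimiter easy to identify.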
