Trying to understand pandas dataframes

Question

I have a space-delimited .dat file, for which the first few lines look like this:

1 SDSSJ000005.95+145310.1 2.49900 * 0.000e+00 0.00 NA -999.000 -999.000 -999.000 -999.000 -999.000 -999.000 -999.000 -999.000 -999.000 0.000 0.000 NONE 
4 SDSSJ000009.27+020621.9 1.43200 UvS 0.000e+00 0.00 NA -999.000 -999.000 -999.000 -999.000 -999.000 -999.000 -999.000 -999.000 -999.000 0.000 0.000 NONE 
5 SDSSJ000009.38+135618.4 2.23900 QSO 0.000e+00 0.00 NA -999.000 -999.000 -999.000 -999.000 -999.000 -999.000 -999.000 -999.000 -999.000 0.000 0.000 NONE 
6 SDSSJ000011.37+150335.7 2.18000 * 0.000e+00 0.00 NA -999.000 -999.000 -999.000 -999.000 -999.000 -999.000 -999.000 -999.000 -999.000 0.000 0.000 NONE 
11 SDSSJ000030.64-064100.0 2.60600 QSO 0.000e+00 0.00 NA -999.000 -999.000 -999.000 -999.000 15.460 -999.000 -999.000 -999.000 -999.000 23.342 56.211 UV 
15 SDSSJ000033.05+114049.6 0.73000 UvS 0.000e+00 0.00 NA -999.000 -999.000 -999.000 -999.000 -999.000 -999.000 -999.000 -999.000 -999.000 0.000 0.000 NONE 
27 LBQS2358+0038 0.95000 QSO 0.000e+00 0.00 NA 17.342 18.483 18.203 17.825 -999.000 -999.000 -999.000 -999.000 -999.000 23.301 56.572 UV 

They're astronomical measurements, and there are 29008 lines in the file. When I read the file with

import pandas as pd
data = pd.read_csv('todo.dat', sep = ' ',
                   names = ['no', 'NED', 'z', 'obj_type','S_21', 'power',
                            'SI_flag','U_mag', 'B_mag', 'V_mag', 'R_mag',
                            'K_mag', 'W1_mag', 'W2_mag', 'W3_mag', 'W4_mag',
                            'L_UV', 'Q', 'flag_uv'])

the dataframe shows [29008 rows x 19 columns]. I want to organise the data based on the column headed z (which is the third column -- index 2). Adding index_col='z' to the read_csv call gives me a KeyError: 'z' error, but using index_col = 2 doesn't give me an error. I thought pandas labelled the headers like a dictionary, so 'z' should be the key in the dictionary for that column. So why do I get an error when I refer to index 2 as 'z'?

Recommended answer

To me this seems buggy, and possibly related to this issue. An easy workaround is to call set_index afterwards:

data = pd.read_csv('todo.dat', sep = ' ',
                   names = ['no', 'NED', 'z', 'obj_type','S_21', 'power',
                            'SI_flag','U_mag', 'B_mag', 'V_mag', 'R_mag',
                            'K_mag', 'W1_mag', 'W2_mag', 'W3_mag', 'W4_mag',
                            'L_UV', 'Q', 'flag_uv']).set_index('z')
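
To try the workaround without the full file, here is a minimal sketch that reads two rows copied from the question out of an in-memory string instead of todo.dat. As an assumption that goes beyond the original code, it uses sep=r'\s+' rather than sep=' ', so that repeated or trailing spaces do not create extra empty fields. It also shows that, once the frame is loaded, 'z' does work as a dictionary-style key for the column.

```python
import io
import pandas as pd

# Two rows copied from the question; trailing whitespace removed.
sample = (
    "1 SDSSJ000005.95+145310.1 2.49900 * 0.000e+00 0.00 NA "
    "-999.000 -999.000 -999.000 -999.000 -999.000 -999.000 "
    "-999.000 -999.000 -999.000 0.000 0.000 NONE\n"
    "27 LBQS2358+0038 0.95000 QSO 0.000e+00 0.00 NA "
    "17.342 18.483 18.203 17.825 -999.000 -999.000 "
    "-999.000 -999.000 -999.000 23.301 56.572 UV\n"
)

names = ['no', 'NED', 'z', 'obj_type', 'S_21', 'power',
         'SI_flag', 'U_mag', 'B_mag', 'V_mag', 'R_mag',
         'K_mag', 'W1_mag', 'W2_mag', 'W3_mag', 'W4_mag',
         'L_UV', 'Q', 'flag_uv']

# Assumption: split on any run of whitespace instead of a single space.
data = pd.read_csv(io.StringIO(sample), sep=r'\s+', names=names)

# Column labels behave like dictionary keys on the loaded frame.
z_values = data['z']

# The workaround from the answer: move 'z' into the index after reading.
indexed = data.set_index('z')

print(indexed.index.name)
print(indexed.shape)
```

After set_index, 'z' is no longer an ordinary column; rows can be selected by redshift through indexed.loc[...], and indexed.reset_index() turns it back into a column.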
