KeyError and MultiIndex lexsort depth


Question

I have a set of tab-delimited files. I have to read them into pandas DataFrames, perform a whole series of operations on them, and then merge them back into one Excel file. The code is too long to post in full, so I will only go through the problematic part.

The tab files that I am parsing all contain the same number of rows: 2177.

When I read these files, I index them by the first two columns, which are of type (string, int):

from collections import OrderedDict

df = df.set_index(['id', 'coord'])
# data will contain all the information I am writing to Excel
data = OrderedDict()
data[filename_id] = df

One of the procedures I run needs access to each row of data[sample_id], which contains a DataFrame of mixed types indexed by the columns 'id' and 'coord', like this:

sample_row = data[sample].ix[index]

with my index being ('id', 'coord').

If I process a subset of the file, everything works fine, but if I read the entire files with 2177 lines, I end up with this error message:

KeyError: 'Key length (2) was greater than MultiIndex lexsort depth (0)'

I searched SO and elsewhere, and it seems that this is an issue with sorting the index, but I don't understand why using an unsorted subset does not cause the problem.

Any idea how I can get this sorted out?

Thanks

Recommended answer

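The docs are quite good. If you work with multi-indexes, it pays to read them through (several times!); see the MultiIndex section of the pandas documentation. The session below shows how the sort order of the index determines its lexsort depth: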

In [9]: df = DataFrame(np.arange(9).reshape(-1,1),columns=['value'],index=pd.MultiIndex.from_product([[1,2,3],['a','b','c']],names=['one','two']))

In [10]: df
Out[10]: 
         value
one two       
1   a        0
    b        1
    c        2
2   a        3
    b        4
    c        5
3   a        6
    b        7
    c        8

In [11]: df.index.lexsort_depth
Out[11]: 2

In [12]: df.sortlevel(level=1)
Out[12]: 
         value
one two       
1   a        0
2   a        3
3   a        6
1   b        1
2   b        4
3   b        7
1   c        2
2   c        5
3   c        8

In [13]: df.sortlevel(level=1).index.lexsort_depth
Out[13]: 0
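
On newer pandas releases, sortlevel is no longer available, so the same check can be sketched with sort_index and the is_monotonic_increasing property of the index. This is a minimal sketch that assumes a recent pandas version; the variable names are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Same frame as in the session above: a two-level index built with from_product,
# which is already in lexicographic order.
index = pd.MultiIndex.from_product(
    [[1, 2, 3], ["a", "b", "c"]], names=["one", "two"]
)
df = pd.DataFrame(np.arange(9).reshape(-1, 1), columns=["value"], index=index)

# True when the full index tuples are in lexicographic order.
sorted_ok = df.index.is_monotonic_increasing

# Sorting by the second level first breaks the lexicographic order of the tuples.
by_second = df.sort_index(level=1)
by_second_ok = by_second.index.is_monotonic_increasing

# Sorting by all levels (the default) restores it.
restored = by_second.sort_index()
restored_ok = restored.index.is_monotonic_increasing

print(sorted_ok, by_second_ok, restored_ok)
```

An index that is not monotonic increasing is the situation the lexsort depth of 0 describes in the session above.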

Update:

sortlevel is deprecated, so use sort_index instead, i.e.

df.sort_index(level=1)
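
Applied to the workflow in the question, the fix would be to sort the index on all levels right after setting it. The following is a minimal sketch under stated assumptions: the column names 'id' and 'coord' come from the question, while the sample values and the 'value' column are made up for illustration. It uses .loc rather than the .ix accessor from the question, since .ix has been removed from current pandas.

```python
import pandas as pd

# Hypothetical stand-in for one parsed tab file, with rows deliberately out of order.
raw = pd.DataFrame(
    {
        "id": ["g2", "g1", "g2", "g1"],
        "coord": [20, 10, 10, 20],
        "value": [0.4, 0.1, 0.3, 0.2],
    }
)

# Index by the first two columns, then sort on all levels so the
# MultiIndex is fully lexsorted before any lookups.
df = raw.set_index(["id", "coord"]).sort_index()

# Full-key lookup of a single row, as in the question.
sample_row = df.loc[("g1", 20)]

# Partial-key lookup: all rows for one id, indexed by coord.
g1_rows = df.loc["g1"]

print(sample_row["value"])
print(g1_rows)
```

Calling sort_index() without a level argument sorts by every level in order, which is what tuple lookups on the index rely on.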
