Pandas: Need a speedier way of index slicing


Problem description

Anyone care to take a stab at speeding up this dataframe index slicing scheme? I'm trying to slice and dice some huge dataframes, so every bit counts. I need to somehow find a faster way of index slicing the dataframe, other than the following technique:

v = initFrame.xs(x,level=('ifoo2','ifoo3'), drop_level=False) 

Also, the pd.unique call that feeds the loop is impacting performance pretty significantly:

uniqueList = list(pd.unique(initFrame[['bar1','bar4']].values))
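As one possible alternative for building the key list, the unique (bar1, bar4) pairs can be collected with `DataFrame.drop_duplicates`. This is only a minimal sketch, not a benchmarked replacement. The names `frame` and `unique_pairs` are illustrative and do not come from the original post. Since `pd.unique` is documented as taking 1-D input, this approach also avoids handing it a 2-D array.

```python
import pandas as pd

# Illustrative frame holding only the two columns used to build the keys.
frame = pd.DataFrame({'bar1': [5, 6, 5, 6],
                      'bar4': [1, 2, 1, 3]})

# drop_duplicates keeps the first occurrence of each (bar1, bar4) row;
# itertuples(name=None) then yields each remaining row as a plain tuple.
unique_pairs = list(
    frame[['bar1', 'bar4']].drop_duplicates().itertuples(index=False, name=None)
)
print(unique_pairs)
```

Whether this is faster than the original call depends on the data, so it is worth timing both on a realistic frame.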

Copy and paste the below snippet to avoid setup.

import pandas as pd

foo1 = (['LABEL1','LABEL1','LABEL2','LABEL2'])
foo2 = ([5,5,6,6])
foo3 = ([1,1,2,3])

index = pd.MultiIndex.from_arrays([foo1,foo2,foo3], names=['ifoo1','ifoo2','ifoo3'])

initFrame = pd.DataFrame({'bar1': [ 5,6,5,6],
                          'bar2': ['a','b','c','d'],
                          'bar3': [11,22,33,44],
                          'bar4': [1,2,1,3]}, index=index)

finDict = {}
#start timer1
uniqueList = list(pd.unique(initFrame[['bar1','bar4']].values))
#end timer1
for x in uniqueList:
    #start timer2
    v = initFrame.xs(x,level=('ifoo2','ifoo3'), drop_level=False)
    #stop timer2
    k = int(x[0]), int(x[1])  
    finDict.update({k:v})

UPDATE 2016-04-04

For those interested, I ended up using the following:

finDict = {}
grouper = initFrame.groupby(level=('ifoo2', 'ifoo3'))
for name, group in grouper:
    finDict.update({name:group})
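
The groupby loop can also be written as a dictionary comprehension, since iterating over a groupby object yields (name, group) pairs. The sketch below reuses the sample data from the question. Passing the levels as a list is a choice made here for illustration, and no timing is claimed.

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [['LABEL1', 'LABEL1', 'LABEL2', 'LABEL2'], [5, 5, 6, 6], [1, 1, 2, 3]],
    names=['ifoo1', 'ifoo2', 'ifoo3'])

initFrame = pd.DataFrame({'bar1': [5, 6, 5, 6],
                          'bar2': ['a', 'b', 'c', 'd'],
                          'bar3': [11, 22, 33, 44],
                          'bar4': [1, 2, 1, 3]}, index=index)

# Each group is the sub-frame for one (ifoo2, ifoo3) combination,
# with all three index levels kept on the rows.
grouper = initFrame.groupby(level=['ifoo2', 'ifoo3'])
finDict = {name: group for name, group in grouper}

for name, group in finDict.items():
    print(name, group.shape)
```

Unlike the original loop, this takes its keys from the index levels themselves, so no separate list of unique pairs is needed.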

Solution

You can use a dictionary comprehension together with loc to do the dataframe indexing:

finDict = {pair: initFrame.loc[pd.IndexSlice[:, pair[0], pair[1]], :]
           for pair in pd.unique(initFrame[['bar1', 'bar4']].values).tolist()}

>>> finDict
{(5, 1):                     bar1 bar2  bar3  bar4
 ifoo1  ifoo2 ifoo3                       
 LABEL1 5     1         5    a    11     1
              1         6    b    22     2,
 (6, 2):                     bar1 bar2  bar3  bar4
 ifoo1  ifoo2 ifoo3                       
 LABEL2 6     2         5    c    33     1,
 (6, 3):                     bar1 bar2  bar3  bar4
 ifoo1  ifoo2 ifoo3                       
 LABEL2 6     3         6    d    44     3}
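
For a self-contained run of the loc approach, the sketch below derives the (ifoo2, ifoo3) pairs from the index itself rather than from the bar1 and bar4 columns. It also sorts the index first, because label slicing on a MultiIndex generally expects a lexsorted index. Both choices, and the names `frame` and `pairs`, are assumptions made for illustration and are not part of the answer above.

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [['LABEL1', 'LABEL1', 'LABEL2', 'LABEL2'], [5, 5, 6, 6], [1, 1, 2, 3]],
    names=['ifoo1', 'ifoo2', 'ifoo3'])

initFrame = pd.DataFrame({'bar1': [5, 6, 5, 6],
                          'bar2': ['a', 'b', 'c', 'd'],
                          'bar3': [11, 22, 33, 44],
                          'bar4': [1, 2, 1, 3]}, index=index)

# Sort once up front so that slicing with loc works on a lexsorted index.
frame = initFrame.sort_index()

# Unique (ifoo2, ifoo3) combinations, taken from the index levels.
pairs = frame.index.droplevel('ifoo1').unique()

finDict = {}
for a, b in pairs:
    key = (int(a), int(b))  # plain Python ints as dictionary keys
    # Select every ifoo1 label where ifoo2 == a and ifoo3 == b.
    finDict[key] = frame.loc[pd.IndexSlice[:, key[0], key[1]], :]

for key, sub in finDict.items():
    print(key, sub.shape)
```

Each loc call is a separate lookup, so on large frames it may be worth timing this against the single-pass groupby version from the question's update.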
