Pandas: Need a speedier way of index slicing

Views: 188 | Posted: 2017/3/26 3:42:50 | Tags: python, pandas, dataframe, slice

Problem description

Anyone care to take a stab at speeding up this dataframe index slicing scheme? I'm trying to slice and dice some huge dataframes, so every bit counts. I need to somehow find a faster way of index slicing the dataframe, other than the following technique:

v = initFrame.xs(x,level=('ifoo2','ifoo3'), drop_level=False)

Also, the pd.unique call (timer1 in the snippet below) is impacting performance pretty significantly.

uniqueList = list(pd.unique(initFrame[['bar1','bar4']].values))

Copy and paste the below snippet to avoid setup.

import pandas as pd

foo1 = (['LABEL1','LABEL1','LABEL2','LABEL2'])
foo2 = ([5,5,6,6])
foo3 = ([1,1,2,3])

index = pd.MultiIndex.from_arrays([foo1,foo2,foo3], names=['ifoo1','ifoo2','ifoo3'])

initFrame = pd.DataFrame({'bar1': [ 5,6,5,6],
                          'bar2': ['a','b','c','d'],
                          'bar3': [11,22,33,44],
                          'bar4': [1,2,1,3]}, index=index)

finDict = {}
#start timer1
uniqueList = list(pd.unique(initFrame[['bar1','bar4']].values))
#end timer1
for x in uniqueList:
    #start timer2
    v = initFrame.xs(x,level=('ifoo2','ifoo3'), drop_level=False)
    #stop timer2
    k = int(x[0]), int(x[1])  
    finDict.update({k:v})
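
On recent pandas releases, pd.unique may reject the 2-D array that .values returns here, so the snippet above might not run as posted. Below is a minimal sketch of the same xs loop, assuming drop_duplicates plus itertuples as a stand-in for the pd.unique step. That substitution and the names unique_keys and xs_dict are illustrative, not part of the original post.

```python
import pandas as pd

# Same toy frame as in the question.
index = pd.MultiIndex.from_arrays(
    [["LABEL1", "LABEL1", "LABEL2", "LABEL2"], [5, 5, 6, 6], [1, 1, 2, 3]],
    names=["ifoo1", "ifoo2", "ifoo3"],
)
initFrame = pd.DataFrame(
    {
        "bar1": [5, 6, 5, 6],
        "bar2": ["a", "b", "c", "d"],
        "bar3": [11, 22, 33, 44],
        "bar4": [1, 2, 1, 3],
    },
    index=index,
)

# Assumption: unique (bar1, bar4) rows as plain tuples, replacing pd.unique on .values.
unique_keys = list(
    initFrame[["bar1", "bar4"]].drop_duplicates().itertuples(index=False, name=None)
)

# One cross-section per key, matched against the ifoo2 and ifoo3 index levels.
xs_dict = {}
for key in unique_keys:
    xs_dict[key] = initFrame.xs(key, level=["ifoo2", "ifoo3"], drop_level=False)
```

Each value keeps all three index levels because drop_level=False is passed, which is the behaviour the question's loop asks for.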

UPDATE 2016-04-04

For those interested, I ended up using the following:

finDict = {}
grouper = initFrame.groupby(level=('ifoo2', 'ifoo3'))
for name, group in grouper:
    finDict.update({name:group})
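
The same groupby idea can be written as a single dictionary comprehension. This is only a sketch on the toy frame from the question; the name groups is illustrative, and whether it is faster on a large frame would need to be timed on the real data.

```python
import pandas as pd

# Same toy frame as in the question.
index = pd.MultiIndex.from_arrays(
    [["LABEL1", "LABEL1", "LABEL2", "LABEL2"], [5, 5, 6, 6], [1, 1, 2, 3]],
    names=["ifoo1", "ifoo2", "ifoo3"],
)
initFrame = pd.DataFrame(
    {
        "bar1": [5, 6, 5, 6],
        "bar2": ["a", "b", "c", "d"],
        "bar3": [11, 22, 33, 44],
        "bar4": [1, 2, 1, 3],
    },
    index=index,
)

# Grouping on two index levels yields (key_tuple, sub_frame) pairs in one pass,
# so no separate list of unique keys has to be built first.
groups = {
    name: group
    for name, group in initFrame.groupby(level=["ifoo2", "ifoo3"])
}
```

Note that the keys here come from the index levels themselves rather than from the bar1 and bar4 columns, so this only matches the loop version when those columns hold the same pairs as the index, as they do in the toy data.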

Solution

You can use a dictionary comprehension together with loc to do the dataframe indexing:

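# Map each unique (bar1, bar4) pair, as a hashable tuple, to the rows whose
# (ifoo2, ifoo3) index levels match that pair.
finDict = {pair: initFrame.loc[pd.IndexSlice[:, pair[0], pair[1]], :]
           for pair in initFrame[['bar1', 'bar4']].drop_duplicates().itertuples(index=False, name=None)}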

>>> finDict
{(5, 1):                     bar1 bar2  bar3  bar4
 ifoo1  ifoo2 ifoo3                       
 LABEL1 5     1         5    a    11     1
              1         6    b    22     2,
 (6, 2):                     bar1 bar2  bar3  bar4
 ifoo1  ifoo2 ifoo3                       
 LABEL2 6     2         5    c    33     1,
 (6, 3):                     bar1 bar2  bar3  bar4
 ifoo1  ifoo2 ifoo3                       
 LABEL2 6     3         6    d    44     3}
