The right way to select multiple cross-sections on a DataFrame

Views: 128 | Published: 2020/5/24 2:36:25 | Tag: pandas

Question

I have a MultiIndex DataFrame from which I am selecting interesting cross-sections. The code works, but it is slow on large datasets, which makes me think I'm doing something wrong. Essentially, I have been concatenating multiple cross-sections into a new DataFrame, and I am looking for a better way.

import pandas as pd
import numpy as np
import itertools

# setup dataset
event = ['event0', 'event1', 'event2']
node = ['n0', 'n1', 'n2', 'n3']
config = ['a', 'b']
data = []
for x in itertools.product(*[event, node, config]):
    data.append([x[0], x[1], x[2], np.random.randn()])
df = pd.DataFrame(data, columns=['event', 'node', 'config', 'value'])
dfi = df.set_index(['event', 'node'])
print(dfi.head(n=12))

Which looks like this:

            config     value
event  node
event0 n0        a  1.256259
       n0        b  0.612465
       n1        a  1.593518
       n1        b -0.747131
       n2        a  0.719973
       n2        b  1.063480
       n3        a -0.943120
       n3        b  2.021804
event1 n0        a -1.427104
       n0        b -0.440886
       n1        a  0.168212
       n1        b -1.084987

Some Analysis

I do some analysis which gives me a list of indexes that I care about:

# Find interesting (event,node) 
g = df.groupby(['event', 'node'])['value']
gmin = g.min()
idxs = gmin[gmin < -1.2].index
print(idxs)
#idxs = [(u'event1', u'n0'), (u'event1', u'n2'), (u'event2', u'n0')]

And the clumsy cross-sections

Now I just care about the interesting event, node combinations. This is the part which is slow on real data sets. Each .xs might take 100ms, but they add up:

df2 = pd.concat([dfi.xs(idx) for idx in idxs])
print(df2)

Which gives the value for every configuration of the interesting (event, node) cross section:

            config     value
event  node
event1 n0        a -1.427104
       n0        b -0.440886
       n2        a  0.273871
       n2        b -1.224801
event2 n0        a -1.297496
       n0        b -1.087568

References

A similar question recommends a Panel. I have not been able to figure out the right indexes to make this work.

Recommended answer

You'll be much better off using groupby's filter method (new in 0.12!), which was designed for exactly this purpose:

In [11]: g = df.groupby(['event', 'node'])

In [12]: g.filter(lambda x: x['value'].min() < -1.2)
Out[12]: 
     event node config     value
0   event0   n0      a -1.566442
1   event0   n0      b -1.652915
14  event1   n3      a  1.685070
15  event1   n3      b -3.205499
20  event2   n2      a -3.007079
21  event2   n2      b  0.159409

(My numbers are different, since they are randomly generated!)
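
For readers who want something reproducible, here is a minimal, self-contained sketch of the same filter call. The dataset is a smaller, hand-written stand-in for the question's random one (the values and the `threshold` name are assumptions made for illustration), so the result is deterministic. It also recomputes the question's `idxs` to show that both approaches pick out the same (event, node) pairs.

```python
import itertools

import pandas as pd

# Small hand-written dataset (values are made up for illustration).
events = ['event0', 'event1']
nodes = ['n0', 'n1']
configs = ['a', 'b']
values = [0.5, -1.5, 0.3, -0.7, -1.3, 2.0, 1.1, 0.2]
keys = list(itertools.product(events, nodes, configs))
df = pd.DataFrame(
    [(e, n, c, v) for (e, n, c), v in zip(keys, values)],
    columns=['event', 'node', 'config', 'value'],
)

threshold = -1.2  # same cutoff as in the question

# Keep all rows of every (event, node) group whose minimum is below the cutoff.
filtered = df.groupby(['event', 'node']).filter(
    lambda group: group['value'].min() < threshold
)

# The question's intermediate result: the interesting (event, node) keys.
gmin = df.groupby(['event', 'node'])['value'].min()
idxs = gmin[gmin < threshold].index

print(filtered)
print(list(idxs))
```

Note that filter keeps the original row labels and row order, so no concatenation step is needed.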

You can then set the index to event and node to get your desired result:

In [13]: g.filter(lambda x: x['value'].min() < -1.2).set_index(['event', 'node'])
Out[13]: 
            config     value
event  node                 
event0 n0        a -1.566442
       n0        b -1.652915
event1 n3        a  1.685070
       n3        b -3.205499
event2 n2        a -3.007079
       n2        b  0.159409
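
If the lambda itself becomes the bottleneck on very large frames, two vectorized variants may be worth timing. Neither is part of the original answer, and whether they are faster is an assumption to check on your own data. The first broadcasts each group's minimum back to its rows with transform and applies a boolean mask. The second reuses a precomputed list of keys (like the question's `idxs`) with `MultiIndex.isin`. The sample values below are made up.

```python
import pandas as pd

# Hand-written sample data (hypothetical values, for illustration only).
df = pd.DataFrame({
    'event':  ['event0', 'event0', 'event0', 'event0', 'event1', 'event1'],
    'node':   ['n0', 'n0', 'n1', 'n1', 'n0', 'n0'],
    'config': ['a', 'b', 'a', 'b', 'a', 'b'],
    'value':  [0.5, -1.5, 0.3, -0.7, -1.3, 2.0],
})
threshold = -1.2

# Variant 1: broadcast each group's minimum back to its rows, then mask.
group_min = df.groupby(['event', 'node'])['value'].transform('min')
by_mask = df[group_min < threshold].set_index(['event', 'node'])

# Variant 2: select a precomputed set of keys on the indexed frame.
dfi = df.set_index(['event', 'node'])
gmin = dfi.groupby(level=['event', 'node'])['value'].min()
idxs = gmin[gmin < threshold].index
by_isin = dfi[dfi.index.isin(idxs)]

# Reference: the filter call from the answer.
by_filter = (df.groupby(['event', 'node'])
               .filter(lambda g: g['value'].min() < threshold)
               .set_index(['event', 'node']))

print(by_mask)
```

All three frames contain the same rows in the same order, so the choice comes down to readability and measured speed.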
