Sort bins from pandas cut

Question

Using pandas cut, I can define bins by providing the edges, and pandas creates bins like (a, b].

My question is: how can I sort the bins from lowest to highest?

import numpy as np
import pandas as pd

y = pd.Series(np.random.randn(100))

x1 = pd.Series(np.sign(np.random.randn(100)))
x2 = pd.cut(pd.Series(np.random.randn(100)), bins = [-3, -0.5, 0, 0.5, 3])

model = pd.concat([y, x1, x2], axis = 1, keys = ['Y', 'X1', 'X2'])

I have an intermediate result in which the order of the bins is preserved:

int_output = model.groupby(['X1', 'X2']).mean().unstack()
int_output.columns = int_output.columns.get_level_values(1)

X2    (-3, -0.5]  (-0.5, 0]  (0, 0.5]  (0.5, 3]
X1                                             
-1.0    0.101475  -0.344419 -0.482992 -0.015179
 1.0    0.249961   0.484757 -0.066383 -0.249414

But then I perform other operations that arbitrarily change the order of the bins:

output = pd.concat(int_output.to_dict('series'), axis = 1)

      (-0.5, 0]  (-3, -0.5]  (0, 0.5]  (0.5, 3]
X1                                             
-1.0  -0.344419    0.101475 -0.482992 -0.015179
 1.0   0.484757    0.249961 -0.066383 -0.249414

Now I would like to plot the data in a bar chart, but I want the bins sorted from the lowest, (-3, -0.5], to the highest, (0.5, 3].

I think I can achieve this by manipulating the label strings, splitting on "," and then stripping the brackets, but I would like to know if there is a better way.
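For reference, the string manipulation mentioned above might look roughly like the sketch below. It assumes the bin labels are plain strings of the form "(a, b]"; the helper name left_edge and the sample labels list are made up for illustration.

```python
def left_edge(label: str) -> float:
    """Return the lower edge of a bin label such as '(-0.5, 0]'."""
    lower_text = label.split(",")[0]      # e.g. '(-0.5'
    return float(lower_text.strip("(["))  # e.g. -0.5


# Hypothetical labels in the scrambled order shown above.
labels = ["(-0.5, 0]", "(-3, -0.5]", "(0, 0.5]", "(0.5, 3]"]

# Sort the labels numerically by their lower edge.
sorted_labels = sorted(labels, key=left_edge)
print(sorted_labels)
```

Sorting on the parsed number rather than on the raw string matters here, because a plain string sort would compare the labels character by character.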

Recommended answer

One possible solution is to extract the first number from each label in output.columns, create a helper Series from those numbers, and sort it. Finally, reindex the original columns using the sorted helper's index:

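# Extract the left edge of each bin label (assumes the labels are strings like '(-3, -0.5]')
cat = output.columns.str.extract(r'\((.*),', expand=False).astype(float)
# Helper Series mapping each bin label to its left edge, sorted by that edge
a = pd.Series(cat, index=output.columns).sort_values()
print(a)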
(-3, -0.5]   -3.0
(-0.5, 0]    -0.5
(0, 0.5]      0.0
(0.5, 3]      0.5
dtype: float64

# Reorder the columns to follow the sorted bin labels
output = output.reindex(columns=a.index)
print(output)
      (-3, -0.5]  (-0.5, 0]  (0, 0.5]  (0.5, 3]
X1                                             
-1.0    0.230060  -0.079266 -0.079834 -0.064455
 1.0   -0.451351   0.268688  0.020091 -0.280218
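The approach above relies on the column labels being strings. In pandas versions where pd.cut produces Interval objects instead, the regular expression is not needed, because each interval exposes its left edge directly. The following is only a sketch under that assumption: the frame named shuffled and its values are hypothetical stand-ins for output.

```python
import numpy as np
import pandas as pd

# Same edges as in the question; from_breaks builds right-closed intervals by default.
intervals = pd.IntervalIndex.from_breaks([-3, -0.5, 0, 0.5, 3])

# Hypothetical frame whose interval columns are out of order.
shuffled = pd.DataFrame(
    [[-0.34, 0.10, -0.48, -0.02], [0.48, 0.25, -0.07, -0.25]],
    index=[-1.0, 1.0],
    columns=intervals[[1, 0, 2, 3]],
)

# Compute the column positions sorted by each interval's left edge,
# then reorder the columns by position.
order = np.argsort(shuffled.columns.left.to_numpy())
output_sorted = shuffled.iloc[:, order]
print(output_sorted)
```

Reordering by position with iloc avoids any label lookup, so it works the same way whether the columns are stored as an IntervalIndex or as another index type holding the same intervals in a different order.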
