How to change bin size when grouping data by ranges?
Question
My question is related to the solution of another question.
I wonder how I can change the bin size from 3 to 5, 10, or any other value. Changing step alone is not enough: I also have to change str(int(cat[1:3])) + "-" + str(int(cat[5:7])-1), and that is the part I cannot get to work. I get the error ValueError: invalid literal for int() with base 10: '18, '.
import numpy as np
import pandas as pd

step=3
kwargs = dict(include_lowest=True, right=False)
bins = pd.cut(df.AVG_PERCENT_EVAL_1, bins=np.arange(18,40+step,step), **kwargs)
# Parses each category's text form, e.g. '[18, 21)', at fixed character positions
labels = [(str(int(cat[1:3])) + "-" + str(int(cat[5:7])-1)) for cat in bins.cat.categories]
bins.cat.categories = labels
df = df.assign(AVG_PERCENT_RANGE=bins).drop("AVG_PERCENT_EVAL_1", axis=1)
df.groupby(['GROUP', 'AVG_PERCENT_RANGE'], as_index=False).agg('mean')
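Rather than slicing fixed character positions out of each category's text, the bounds can be read from the categories themselves. A minimal sketch, assuming a pandas version where pd.cut categories are pd.Interval objects (so .left and .right exist) and using made-up sample data because the question's df is not shown:

```python
import numpy as np
import pandas as pd

# Made-up sample data; the question's real df is not shown.
df = pd.DataFrame({
    "GROUP": ["AAAAA", "AAAAA", "BBBBB", "BBBBB"],
    "AVG_PERCENT_EVAL_1": [18.5, 24.0, 19.0, 26.5],
})

step = 5  # any positive integer width works the same way
edges = np.arange(18, 40 + step, step)
bins = pd.cut(df["AVG_PERCENT_EVAL_1"], bins=edges, include_lowest=True, right=False)

# Each category is a left-closed pd.Interval such as [18, 23);
# build "left-(right-1)" from its numeric bounds instead of slicing its text.
labels = [f"{int(iv.left)}-{int(iv.right) - 1}" for iv in bins.cat.categories]
bins = bins.cat.rename_categories(labels)

print(bins.tolist())  # ['18-22', '23-27', '18-22', '23-27']
```

Because the labels come from each interval's bounds, step is the only value that needs to change; the question's assign/drop/groupby lines can then use bins as before.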
Recommended answer
Is this what you want?
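import numpy as np
import pandas as pd

step=5
kwargs = dict(include_lowest=True, right=False)
bins=np.arange(18,40+step,step)
# Build one label per bin straight from the edges, e.g. 18-22, 23-27, ...
# (the last edge only closes the final bin, so its label is dropped)
labels = ['{}-{}'.format(i, i+step-1) for i in bins][:-1]
df['AVG_PERCENT_RANGE'] = pd.cut(df.pop('AVG_PERCENT_EVAL_1'),
                                 bins=bins, labels=labels, **kwargs)
df.groupby(['GROUP', 'AVG_PERCENT_RANGE'], as_index=False).agg('mean')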
Output:
   GROUP AVG_PERCENT_RANGE  AVG_PERCENT_NEGATIVE  AVG_TOTAL_WAIT_TIME  AVG_TOTAL_SERVICE_TIME
0  AAAAA             18-22              6.500000            85.682099              247.880659
1  AAAAA             23-27              0.833333           103.445112              314.336474
2  AAAAA             28-32                   NaN                  NaN                     NaN
3  AAAAA             33-37                   NaN                  NaN                     NaN
4  AAAAA             38-42                   NaN                  NaN                     NaN
5  BBBBB             18-22              0.777778            63.500619              242.510146
6  BBBBB             23-27              2.000000           103.796290              313.685358
7  BBBBB             28-32                   NaN                  NaN                     NaN
8  BBBBB             33-37                   NaN                  NaN                     NaN
9  BBBBB             38-42                   NaN                  NaN                     NaN
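The all-NaN rows are ranges with no data for that group; grouping by a categorical column with observed=False (the long-standing pandas default) emits every category. As a sketch, the answer's approach can be wrapped in a function that takes the bin width as a parameter and passes observed=True to groupby so only ranges that actually occur are kept. The helper name mean_by_range, its parameters, and the sample data are hypothetical, and integer bin edges are assumed:

```python
import numpy as np
import pandas as pd


def mean_by_range(df, value_col, group_col, start, stop, step, drop_empty=True):
    """Hypothetical helper: bin value_col into integer ranges of width `step`
    (as in the answer) and average the remaining numeric columns per group."""
    edges = np.arange(start, stop + step, step)
    # One label per bin, taken directly from the integer edges: 18-22, 23-27, ...
    labels = [f"{lo}-{lo + step - 1}" for lo in edges[:-1]]
    out = df.copy()
    out["AVG_PERCENT_RANGE"] = pd.cut(out.pop(value_col), bins=edges, labels=labels,
                                      include_lowest=True, right=False)
    # observed=True keeps only (group, range) pairs present in the data,
    # so empty ranges no longer show up as all-NaN rows.
    return out.groupby([group_col, "AVG_PERCENT_RANGE"], observed=drop_empty,
                       as_index=False).mean()


# Made-up sample data for illustration.
df = pd.DataFrame({
    "GROUP": ["AAAAA", "AAAAA", "AAAAA", "BBBBB"],
    "AVG_PERCENT_EVAL_1": [18.0, 20.0, 25.0, 19.0],
    "AVG_PERCENT_NEGATIVE": [6.0, 7.0, 1.0, 0.5],
})
print(mean_by_range(df, "AVG_PERCENT_EVAL_1", "GROUP", 18, 40, step=5))
print(mean_by_range(df, "AVG_PERCENT_EVAL_1", "GROUP", 18, 40, step=10))
```

Passing drop_empty=False keeps the full group-by-range grid, including the all-NaN rows seen in the output above.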