Binning and then combining bins with minimum number of observations?
Question
Let's say I create some data and then create bins of different sizes:
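from __future__ import division  # true division on Python 2; a no-op on Python 3
import numpy as np
import pandas as pd

x = np.random.rand(1, 20)
# Bin edges 0.01, 0.02, ..., 0.20; `new, =` unpacks the single row of the (1, 20) result
new, = np.digitize(x, np.arange(1, x.shape[1] + 1) / 100)
new_series = pd.Series(new)
print(new_series.value_counts())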
This shows:
20 17
16 1
4 1
2 1
dtype: int64
If I set a minimum threshold of at least 2 observations per bin, I basically want to transform the underlying data so that new_series.value_counts() becomes:
20 17
16 3
dtype: int64
Answer
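import numpy as np

x = np.random.rand(1, 100)
bins = np.arange(1, x.shape[1] + 1) / 100
new = np.digitize(x, bins)
n = new.copy()[0]  # this will hold the result
threshold = 2
# Visit every bin index in increasing order (not only the labels present at the
# start), so observations pushed into bin i + 1 are re-checked there and can keep moving up.
for i in range(bins.size):
    if np.sum(n == i) <= threshold:
        n[n == i] += 1
n = n.clip(0, bins.size)  # avoid going beyond the last bin
n = n.reshape(1, -1)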
This can move counts up multiple times, until a bin is filled sufficiently.
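To check the cascading behaviour on the exact counts from the question, here is a small deterministic sketch. `merge_small_bins` is a hypothetical helper name, not part of NumPy. It assumes the labels come from np.digitize and that a bin with `<= threshold` observations is pushed into the next bin up, visiting every bin index in increasing order; with that rule the question's data ends up as 16: 3 and 20: 17.

```python
import numpy as np


def merge_small_bins(labels, n_bins, threshold=2):
    """Hypothetical helper: push every bin holding <= threshold observations
    into the next bin up, letting them cascade until a bin is large enough.

    labels: 1-D array of bin indices, e.g. the output of np.digitize.
    n_bins: the highest bin index; observations already there are never moved.
    """
    labels = np.asarray(labels).copy()
    # Increasing order matters: observations moved into bin i + 1 are counted
    # again when the loop reaches i + 1, which is what lets them keep moving up.
    for i in range(n_bins):
        if np.sum(labels == i) <= threshold:
            labels[labels == i] += 1
    return labels


# Labels matching the counts shown in the question: 17 x bin 20, one each in 16, 4 and 2.
labels = np.array([20] * 17 + [16, 4, 2])
merged = merge_small_bins(labels, n_bins=20, threshold=2)

values, counts = np.unique(merged, return_counts=True)
print(dict(zip(values.tolist(), counts.tolist())))  # {16: 3, 20: 17}
```

Bin 4 briefly holds 2 observations after bin 2 is pushed into it; because the rule is `<=` rather than `<`, those 2 keep moving until they join bin 16. Switching to `<` would stop them at bin 4 instead.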
Instead of using np.digitize, it might be simpler to use np.histogram, because it gives you the counts directly, so we don't need to sum them ourselves.
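A minimal sketch of the np.histogram route, assuming that "combining" bins means summing their counts and widening the interval, rather than relabelling individual observations. `merge_small_counts` is a hypothetical helper name. It uses the same `<= threshold` rule as the loop in the answer, but differs in two ways: empty bins are absorbed into their right-hand neighbour instead of being kept, and any under-filled remainder at the top end is folded into the last kept bin.

```python
import numpy as np


def merge_small_counts(counts, edges, threshold=2):
    """Hypothetical helper: merge each bin with <= threshold observations into
    the bin to its right, working on the (counts, edges) pair from np.histogram.

    Returns new (counts, edges) arrays; merged bins share one wider interval.
    """
    new_counts, new_edges = [], [edges[0]]
    carry = 0  # observations carried over from under-filled bins on the left
    for count, right_edge in zip(counts, edges[1:]):
        carry += count
        if carry > threshold:
            new_counts.append(carry)
            new_edges.append(right_edge)
            carry = 0
    if new_counts:
        # Fold whatever is left (possibly zero) into the last kept bin and
        # stretch that bin to the original upper edge.
        new_counts[-1] += carry
        new_edges[-1] = edges[-1]
    else:
        # No bin ever exceeded the threshold: everything becomes one bin.
        new_counts.append(carry)
        new_edges.append(edges[-1])
    return np.array(new_counts), np.array(new_edges)


# Small deterministic example: ten bins of width 0.1 on [0, 1].
x = np.array([0.05, 0.15, 0.15, 0.25, 0.35, 0.35, 0.35, 0.95])
counts, edges = np.histogram(x, bins=np.linspace(0, 1, 11))
merged_counts, merged_edges = merge_small_counts(counts, edges, threshold=2)
print(merged_counts.tolist())  # [3, 5]
print(merged_edges)
```

Here the raw counts are [1, 2, 1, 3, 0, 0, 0, 0, 0, 1]. The first two bins merge into one bin of 3, the next two into a bin of 4, and the lone observation near 0.95 has no bin to its right, so it is folded back into that last bin, giving 5.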