Qcut Pandas: ValueError: Bin edges must be unique
Question
I'm using qcut from Pandas to discretize my data into equal-sized buckets. I want price buckets. This is my DataFrame:
        productId  sell_prix  categ  popularity
11997  16758760.0      28.75     50    524137.0
11998  16758760.0      28.75     50    166795.0
13154  16782105.0      24.60     50    126890.5
13761  16790082.0      65.00     50    245437.0
13762  16790082.0      65.00     50    245242.0
15355  16792720.0      29.00     50    360219.0
15356  16792720.0      29.00     50    360100.0
15357  16792720.0      29.00     50    360027.0
15358  16792720.0      29.00     50    462850.0
15367  16792728.0      29.00     50    193030.5
Here is my code:
df['PriceBucket'] = pd.qcut(df['sell_prix'], 3)
I get this error message:
**ValueError: Bin edges must be unique: array([ 24.6, 29. , 29. , 65. ])**
In reality, I have a DataFrame with 7413 rows, so this is just a sample of the real DataFrame. The strange thing is that when I use the same code on a DataFrame with 359824 rows, with practically the same data, it works! Does this depend on the length of the DataFrame?
Please help! Thank you very much.
Recommended answer
Various solutions are discussed here, but briefly:
> pd.qcut(df['a'].rank(method='first'), 3)
0 [1, 2.333]
1 [1, 2.333]
2 (2.333, 3.667]
3 (3.667, 5]
4 (3.667, 5]
or
> pd.qcut(df['a'].rank(method='first'), 3, labels=False)
0 0
1 0
2 1
3 2
4 2
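To tie this back to the question, here is a minimal sketch using the ten sell_prix values from the sample. It assumes the cause is what the error message suggests: 29.0 occurs so often that the 1/3 and 2/3 quantiles both land on it, so two bin edges coincide. That depends on how the values are distributed rather than on the DataFrame's length as such, which would explain why the larger DataFrame happens to work. The variable names (prices, by_rank, dropped) are illustrative, and the duplicates="drop" option is not part of the answer above; it is a qcut parameter available in pandas 0.20 and later.

```python
import pandas as pd

# The ten sell_prix values from the sample in the question.
prices = pd.Series(
    [28.75, 28.75, 24.60, 65.00, 65.00, 29.00, 29.00, 29.00, 29.00, 29.00],
    name="sell_prix",
)

# Reproduce the failure: 29.0 is so common that two tercile edges coincide.
try:
    pd.qcut(prices, 3)
    raised = False
except ValueError as exc:
    raised = True
    print("qcut failed:", exc)

# Option 1 (from the answer): bin the ranks instead of the raw values.
# method="first" breaks ties by order of appearance, so every rank is unique.
by_rank = pd.qcut(prices.rank(method="first"), 3, labels=False)
print(by_rank.value_counts().sort_index())

# Option 2 (assumes pandas >= 0.20): let qcut drop the repeated edge.
# Equal prices stay together, but there are fewer, unequal buckets.
dropped = pd.qcut(prices, 3, labels=False, duplicates="drop")
print(dropped.value_counts().sort_index())
```

The two options trade off differently. Ranking gives roughly equal-sized buckets but can place identical prices in different buckets, while dropping duplicate edges keeps identical prices together at the cost of getting fewer buckets than requested.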