Qcut Pandas: ValueError: Bin edges must be unique

Question

I'm using qcut from pandas to discretize my data into equal-sized buckets. I want to have price buckets. This is my DataFrame:

        productId   sell_prix   categ   popularity
11997   16758760.0  28.75        50      524137.0
11998   16758760.0  28.75        50      166795.0
13154   16782105.0  24.60        50      126890.5
13761   16790082.0  65.00        50      245437.0
13762   16790082.0  65.00        50      245242.0
15355   16792720.0  29.00        50      360219.0
15356   16792720.0  29.00        50      360100.0
15357   16792720.0  29.00        50      360027.0
15358   16792720.0  29.00        50      462850.0
15367   16792728.0  29.00        50      193030.5

This is my code:

df['PriceBucket'] = pd.qcut(df['sell_prix'], 3)

I get this error message:

**ValueError: Bin edges must be unique: array([ 24.6,  29. ,  29. ,  65. ])**

In reality, my DataFrame has 7413 rows, so this is just a sample of it. The strange thing is that the same code works on a DataFrame with 359824 rows that contains practically the same data. Does the result depend on the length of the DataFrame?

Please help! Thanks a lot.

Recommended answer

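Various solutions have been discussed, but briefly: rank the values with method='first' so that tied values receive distinct ranks, then pass the ranks to qcut. Because no two ranks are equal, the bin edges are unique: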

> pd.qcut(df['a'].rank(method='first'), 3)
0        [1, 2.333]
1        [1, 2.333]
2    (2.333, 3.667]
3        (3.667, 5]
4        (3.667, 5]

> pd.qcut(df['a'].rank(method='first'), 3, labels=False)
0    0
1    0
2    1
3    2
4    2
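
To connect this back to the question, here is a minimal sketch that applies the same idea to the ten sample prices shown above. The Series name `prices` is only a stand-in for `df['sell_prix']`, and the second fix assumes pandas 0.20 or later, where `qcut` accepts a `duplicates` argument.

```python
import pandas as pd

# The ten sample prices from the question (stand-in for df['sell_prix']).
prices = pd.Series(
    [28.75, 28.75, 24.60, 65.00, 65.00, 29.00, 29.00, 29.00, 29.00, 29.00],
    name="sell_prix",
)

# Reproduce the problem: the 1/3 and 2/3 quantiles both land on 29.0,
# so two of the four bin edges are identical.
try:
    pd.qcut(prices, 3)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Fix 1: rank with method="first" so tied prices get distinct ranks,
# then cut the ranks into three roughly equal-sized buckets.
rank_buckets = pd.qcut(prices.rank(method="first"), 3, labels=False)

# Fix 2 (assumes pandas >= 0.20): drop the duplicated edge instead.
# The intervals stay in price units, but there are fewer buckets
# and they are no longer equal-sized.
drop_buckets = pd.qcut(prices, 3, duplicates="drop")

print(error_message)
print(rank_buckets.value_counts().sort_index())
print(drop_buckets.value_counts().sort_index())
```

The error is raised whenever two quantile edges coincide, which depends on how the values are distributed rather than on the row count as such. A larger DataFrame with a wider spread of prices can therefore work while a smaller one with many repeated prices fails. Note that with the rank approach, rows with the same price can end up in different buckets.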
