Python Pandas Create New Bin/Bucket Variable with pd.qcut


Question

How do you create a new bin/bucket variable using pd.qcut in Python?

This might seem elementary to experienced users, but I was not clear on it, and it was surprisingly hard to search for on Stack Overflow and Google. Some thorough searching yielded this (Assignment of qcut as new column), but it didn't quite answer my question because it didn't take the last step and assign each observation a bin number (i.e. 1, 2, ...).

Recommended answer

The answer below is only valid for versions of Pandas earlier than 0.15.0. If you are running Pandas 0.15.0 or higher, see:

data3['bins_spd'] = pd.qcut(data3['spd_pct'], 5, labels=False)

Thanks to @unutbu for pointing it out. :)
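As a self-contained illustration of the labels=False form on a recent Pandas, here is a minimal sketch. The data3 frame below is a synthetic stand-in built from random numbers (an assumption, since the original options data is not available); only the column names spd_pct and bins_spd come from the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the author's data3 frame:
# 100 synthetic spread percentages drawn uniformly at random.
rng = np.random.default_rng(0)
data3 = pd.DataFrame({"spd_pct": rng.uniform(0.01, 2.0, size=100)})

# labels=False makes qcut return integer bin numbers (0 to 4 for 5 quantiles)
# instead of the interval each observation falls into.
data3["bins_spd"] = pd.qcut(data3["spd_pct"], 5, labels=False)

print(data3.head())
# Quantile bins hold roughly equal numbers of observations.
print(data3["bins_spd"].value_counts().sort_index())
```

Note that the bin numbers start at 0. If you want buckets numbered 1, 2, ... as in the question, adding 1 to the result should be enough.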

Say you have some data that you want to bin, in my case options spreads, and you want to make a new variable holding the bucket that corresponds to each observation. The link mentioned above shows that you can do this with:

print(pd.qcut(data3['spd_pct'], 40))

(0.087, 0.146]
(0.0548, 0.087]
(0.146, 0.5]
(0.146, 0.5]
(0.087, 0.146]
(0.0548, 0.087]
(0.5, 2]

which gives you the bin endpoints that correspond to each observation. However, if you would like the corresponding bin number for each observation, you can do this:

print(pd.qcut(data3['spd_pct'], 5).labels)

[2 1 3 ..., 0 1 4] 

Putting it all together, if you would like to create a new variable with just the bin numbers, this should suffice:

data3['bins_spd'] = pd.qcut(data3['spd_pct'], 5).labels

print(data3.head())

   secid      date    symbol  symbol_flag     exdate   last_date cp_flag  
0   5005  1/2/1997  099F2.37            0  1/18/1997         NaN       P   
1   5005  1/2/1997  09B0B.1B            0  2/22/1997   12/3/1996       P   
2   5005  1/2/1997  09B7C.2F            0  2/22/1997  12/11/1996       P   
3   5005  1/2/1997  09EE6.6E            0  1/18/1997  12/27/1996       C   
4   5005  1/2/1997  09F2F.CE            0  8/16/1997         NaN       P   

   strike_price  best_bid  best_offer     ...      close  volume_y    return  
0          7500     2.875      3.2500     ...        4.5     99200  0.074627   
1         10000     5.375      5.7500     ...        4.5     99200  0.074627   
2          5000     0.625      0.8750     ...        4.5     99200  0.074627   
3          5000     0.125      0.1875     ...        4.5     99200  0.074627   
4          7500     3.000      3.3750     ...        4.5     99200  0.074627   

   cfadj_y  open  cfret  shrout      mid   spd_pct  bins_spd  
0        1   4.5      1   57735  3.06250  0.122449         2  
1        1   4.5      1   57735  5.56250  0.067416         1  
2        1   4.5      1   57735  0.75000  0.333333         3  
3        1   4.5      1   57735  0.15625  0.400000         3  
4        1   4.5      1   57735  3.18750  0.117647         2  

[5 rows x 35 columns]
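On Pandas 0.15.0 or later, where the .labels attribute used above no longer applies, one way to recover the same bin numbers from an interval-valued result is the .cat.codes accessor. The sketch below uses a synthetic Series as a stand-in for data3['spd_pct'] (an assumption) and checks that the codes match what labels=False returns.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data3['spd_pct']: 100 synthetic values.
rng = np.random.default_rng(0)
spd_pct = pd.Series(rng.uniform(0.01, 2.0, size=100), name="spd_pct")

# Without labels=False, qcut on a Series returns a categorical Series
# whose values are the intervals (the bin endpoints).
binned = pd.qcut(spd_pct, 5)

# .cat.codes gives the integer position of each interval, i.e. the bin number.
codes = binned.cat.codes

# The same numbers come back directly when labels=False is passed.
same = pd.qcut(spd_pct, 5, labels=False)
print((codes == same).all())
```

This can be handy when you want to keep both the interval column and the numeric bin column from a single qcut call.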

Hope this helps somebody else. At the very least it should be easier to search for now. :)
