Pandas: How to assign values within quantiles to a new DF, using greater than and smaller than?

Problem description

I am new to coding, and my English isn't that good so please be patient with me =D

This is the main DF (df_mcred_pf). I posted all data and code in full below.

From the main DF, I created a DF with all values from the 1st quantile and it worked:

df_mcred_pf_Q1 = df_mcred_pf[df_mcred_pf['vr_tx_jrs']<=np.quantile(df_mcred_pf['vr_tx_jrs'], vQ1_mcred_pf/100)]
df_mcred_pf_Q1.head(30)

Now I need to create a new DF with the values of the 2nd quantile: all values greater than the 1st quantile (vQ1_mcred_pf) and smaller than or equal to the 2nd quantile (vQ2_mcred_pf). I tried this but it didn't work:

df_mcred_pf_Q2 = df_mcred_pf[df_mcred_pf['vr_tx_jrs']>np.quantile(df_mcred_pf['vr_tx_jrs'], vQ1_mcred_pf/100) & df_mcred_pf['vr_tx_jrs']<=np.quantile(df_mcred_pf['vr_tx_jrs'], vQ2_mcred_pf/100)]

I got this error: TypeError: Cannot perform 'rand_' with a dtyped [float64] array and scalar of type [bool]

And I'm stuck here. Could you help me, please?

The full code is here:

import pandas as pd
import numpy as np
    
df_mcred_pf = pd.DataFrame([[2, 12, "F", 1, 1, 12.55, 437],
[2, 12, "F", 1, 1, 17.81, 437],
[2, 12, "F", 1, 1, 18.14, 437],
[2, 12, "F", 1, 1, 20.43, 437],
[2, 12, "F", 1, 1, 21.19, 437],
[2, 12, "F", 1, 1, 22.73, 437],
[2, 12, "F", 1, 1, 23.73, 437],
[2, 12, "F", 1, 1, 25.26, 437],
[2, 12, "F", 1, 1, 25.34, 437],
[2, 12, "F", 1, 1, 26.02, 437],
[2, 12, "F", 1, 1, 26.78, 437],
[2, 12, "F", 1, 1, 26.79, 437],
[2, 12, "F", 1, 1, 26.83, 437],
[2, 12, "F", 1, 1, 27.59, 437],
[2, 12, "F", 1, 1, 27.83, 437],
[2, 12, "F", 1, 1, 28.32, 437],
[2, 12, "F", 1, 1, 28.32, 437],
[2, 12, "F", 1, 1, 28.83, 437],
[2, 12, "F", 1, 1, 29.08, 437],
[2, 12, "F", 1, 1, 29.13, 437],
[2, 12, "F", 1, 1, 29.33, 437],
[2, 12, "F", 1, 1, 29.84, 437],
[2, 12, "F", 1, 1, 29.85, 437],
[2, 12, "F", 1, 1, 30.36, 437],
[2, 12, "F", 1, 1, 30.62, 437],
[2, 12, "F", 1, 1, 30.87, 437],
[2, 12, "F", 1, 1, 31.38, 437],
[2, 12, "F", 1, 1, 31.39, 437],
[2, 12, "F", 1, 1, 31.89, 437],
[2, 12, "F", 1, 1, 32.92, 437]], columns=['cd_mod_pri', 'cd_mod_sec', 'id_tp_pes', 'cd_idx_pri', 'cd_idx_sec', 'vr_tx_jrs', 'quantidade'])
    


MAX_mcred = df_mcred_pf['vr_tx_jrs'].max()    

MIN_mcred = df_mcred_pf['vr_tx_jrs'].min()
    
vQ1_mcred_pf = df_mcred_pf['vr_tx_jrs'].quantile(0.25)
vQ2_mcred_pf = df_mcred_pf['vr_tx_jrs'].quantile(0.50)
vQ3_mcred_pf = df_mcred_pf['vr_tx_jrs'].quantile(0.75)
vQ4_mcred_pf = df_mcred_pf['vr_tx_jrs'].quantile(1.00)

df_mcred_pf_Q1 = df_mcred_pf[df_mcred_pf['vr_tx_jrs']<=np.quantile(df_mcred_pf['vr_tx_jrs'], vQ1_mcred_pf/100)]
df_mcred_pf_Q1.head(30)

MEDIAN_mcred = df_mcred_pf_Q1["vr_tx_jrs"].median()

df_mcred_pf_Q2 = df_mcred_pf[df_mcred_pf['vr_tx_jrs']>np.quantile(df_mcred_pf['vr_tx_jrs'], vQ1_mcred_pf/100) & df_mcred_pf['vr_tx_jrs']<=np.quantile(df_mcred_pf['vr_tx_jrs'], vQ2_mcred_pf/100)]

Recommended answer

I would address this problem differently and create a column with a quantile descriptor:

import pandas as pd
import numpy as np
    
#your dataframe here
    
quant = [0, .25, .5, .75, 1]
s = df_mcred_pf["vr_tx_jrs"].quantile(quant)

df_mcred_pf["Quartil"] = pd.cut(df_mcred_pf["vr_tx_jrs"], s, include_lowest=True, labels=["Q1", "Q2", "Q3", "Q4"])

This returns the following output:

    cd_mod_pri  cd_mod_sec id_tp_pes  ...  vr_tx_jrs  quantidade  Quartil
0            2          12         F  ...      12.55         437     Q1
1            2          12         F  ...      17.81         437     Q1
2            2          12         F  ...      18.14         437     Q1
3            2          12         F  ...      20.43         437     Q1
4            2          12         F  ...      21.19         437     Q1
5            2          12         F  ...      22.73         437     Q1
6            2          12         F  ...      23.73         437     Q1
7            2          12         F  ...      25.26         437     Q1
8            2          12         F  ...      25.34         437     Q2
9            2          12         F  ...      26.02         437     Q2
10           2          12         F  ...      26.78         437     Q2
...
28           2          12         F  ...      31.89         437     Q4
29           2          12         F  ...      32.92         437     Q4

[30 rows x 8 columns]
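
As a possible shortcut for the quantile-plus-pd.cut steps, pandas also offers pd.qcut, which derives the quantile edges itself. The sketch below is not part of the original answer; it uses a small stand-in frame (eight values borrowed from the question's vr_tx_jrs column) and the illustrative names by_cut and by_qcut to show that both routes should give the same labels:

```python
import pandas as pd

# Small stand-in for the question's data (assumption: 8 of the original values)
df = pd.DataFrame({"vr_tx_jrs": [12.55, 17.81, 20.43, 25.26,
                                 26.02, 28.32, 30.36, 32.92]})
labels = ["Q1", "Q2", "Q3", "Q4"]

# Two-step version, as in the answer: explicit quantile edges, then pd.cut
edges = df["vr_tx_jrs"].quantile([0, .25, .5, .75, 1]).to_numpy()
by_cut = pd.cut(df["vr_tx_jrs"], edges, include_lowest=True, labels=labels)

# One-step version: pd.qcut computes the quartile edges on its own
by_qcut = pd.qcut(df["vr_tx_jrs"], q=4, labels=labels)

df["Quartil"] = by_qcut
print(df)
```

Note that pd.qcut raises an error when duplicate values make two bin edges coincide, unless duplicates="drop" is passed.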

Now, you can filter the dataframe by quartile:

print(df_mcred_pf[df_mcred_pf["Quartil"]=="Q2"])

You can also choose to code the quartile as a number, e.g.,

labels=range(len(quant)-1)

Then, you could get quartiles up to 0.75 with

print(df_mcred_pf[df_mcred_pf["Quartil"]<3])
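
For completeness, here is a minimal self-contained sketch of the numeric-label variant. It assumes a small stand-in frame instead of the full data, and it passes the labels as a list and the edges as a NumPy array purely as a precaution; the name lower_three is illustrative only:

```python
import pandas as pd

# Small stand-in for the question's data (assumption: 8 of the original values)
df = pd.DataFrame({"vr_tx_jrs": [12.55, 17.81, 20.43, 25.26,
                                 26.02, 28.32, 30.36, 32.92]})

quant = [0, .25, .5, .75, 1]
edges = df["vr_tx_jrs"].quantile(quant).to_numpy()

# Integer labels 0..3 instead of "Q1".."Q4"; pd.cut yields an ordered categorical
df["Quartil"] = pd.cut(df["vr_tx_jrs"], edges, include_lowest=True,
                       labels=list(range(len(quant) - 1)))

# An ordered categorical can be compared against one of its own labels,
# so "< 3" keeps the three lower quartiles (labels 0, 1 and 2)
lower_three = df[df["Quartil"] < 3]
print(lower_three)
```

The comparison works because the column is an ordered categorical and 3 is one of its categories; comparing against a value that is not a category would raise an error.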

Maybe there are easier ways to achieve this, let's see what other people will come up with.
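
One such way is to keep the original boolean-mask approach and only fix the expression. In Python, & binds more tightly than > and <=, so without parentheses the attempt in the question evaluates `np.quantile(...) & df_mcred_pf['vr_tx_jrs']` first, which is the likely source of the 'rand_' TypeError. The sketch below is not from the original answer: it uses a small stand-in frame, illustrative names (q1, q2, df_q2), and compares against the quantile values directly rather than passing them through np.quantile a second time:

```python
import pandas as pd

# Small stand-in for the question's data (assumption: 8 of the original values)
df = pd.DataFrame({"vr_tx_jrs": [12.55, 17.81, 20.43, 25.26,
                                 26.02, 28.32, 30.36, 32.92]})
col = df["vr_tx_jrs"]

q1 = col.quantile(0.25)  # value at the 25th percentile
q2 = col.quantile(0.50)  # value at the 50th percentile (median)

# Each comparison is wrapped in parentheses so that & combines two boolean masks
df_q2 = df[(col > q1) & (col <= q2)]

# Same filter using the method forms, which need no extra parentheses
df_q2_alt = df[col.gt(q1) & col.le(q2)]

print(df_q2)
```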
