How to create column C in DF1 using values from DF2 based on columns A and B

Problem description

I have a dataframe that contains employee information such as Name, performance_factor_1 and performance_factor_2 .

I have another dataframe that gives the pay based on performance_factor_1 and performance_factor_2.

df1 (sorry for the formatting; I'm not sure how to fix it)

Name                  pf1       pf2     pf3
Adam                  14.6      8.9     59 
Bob                   13.2      9       75
Charlie               11.1      9.1     89
Dylan                 14.6      9       97
Eric                  11.1      8.8     105
Fedderick             12.5      9.2     69

df2: the rows of dataframe 2 are performance_factor_1 values and the columns are performance_factor_2 values.

pf1     8.8 8.9 9   9.1 9.2
14.6    100 200 300 400 500
13.2    200 300 400 500 600
12.5    300 400 500 600 700
11.1    400 500 600 700 800

For df2['pf1'], the values run from 1 to 14 with one decimal place. The columns run from 8.8 to 10, also with one decimal place. It would be better if I could get the values using some sort of range, such as 8.8-9.2. For now, however, I am only looking for the pay based on exact values.

What I want to do is add a pay column to df1, as shown below, when pf3 is above 70:

df1

Name                  pf1       pf2      pay
Adam                  14.6      8.9      200
Bob                   13.2      9        400
Charlie               11.1      9.1      700
Dylan                 14.6      9        300
Eric                  11.1      8.8      400
Fedderick             12.5      9.2      700

What I have tried so far: 1) Writing a function and calling it inside the loc assignment below, but it kept throwing a "'Series' objects are mutable, thus unhashable" error

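def indivpay(ttr, csat):
    # .at expects one scalar row label and one scalar column label;
    # passing whole Series here is what raises the "unhashable" TypeError
    dude = (indiv.at[ttr, csat])
    return dude
df1.loc[df1['pf3']>=70, 'pay'] =  indivpay(df_outer['pf1'], df_outer['pf2'])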

2) Getting the pay value directly inside the loc assignment, but it kept throwing the same "'Series' objects are mutable, thus unhashable" error

df_outer.loc[df_outer['# of Closed SRs']>=70, 'Individual Bonus'] =  indiv.at[df_outer['Time to Resolve'], df_outer['CSAT (NSE)'].astype(str)]
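Both attempts fail for the same reason. `DataFrame.at` accepts a single row label and a single column label, so when it receives whole Series, pandas tries to hash them. The sketch below shows a row-by-row version of the first attempt. It assumes that `df1`/`df2` stand in for `df_outer`/`indiv`, that `df2` is indexed by the pf1 values and has float pf2 column labels, and it uses the `>= 70` threshold from the attempts above. It rebuilds the sample frames so that it runs on its own.

```python
import pandas as pd

# sample data from the question (df1 ~ df_outer, df2 ~ indiv: an assumption)
df1 = pd.DataFrame({
    "Name": ["Adam", "Bob", "Charlie", "Dylan", "Eric", "Fedderick"],
    "pf1": [14.6, 13.2, 11.1, 14.6, 11.1, 12.5],
    "pf2": [8.9, 9.0, 9.1, 9.0, 8.8, 9.2],
    "pf3": [59, 75, 89, 97, 105, 69],
})
df2 = pd.DataFrame(
    [[100, 200, 300, 400, 500],
     [200, 300, 400, 500, 600],
     [300, 400, 500, 600, 700],
     [400, 500, 600, 700, 800]],
    index=pd.Index([14.6, 13.2, 12.5, 11.1], name="pf1"),
    columns=[8.8, 8.9, 9.0, 9.1, 9.2],
)

mask = df1["pf3"] >= 70
# .at needs scalar labels, so look up each selected row's (pf1, pf2) pair separately
df1.loc[mask, "pay"] = df1.loc[mask].apply(
    lambda row: df2.at[row["pf1"], row["pf2"]], axis=1
)
print(df1)
```

Rows that fail the condition are left as NaN in the new column.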

I've solved a similar problem before with the loc function. However, in that case I created column C from columns A and B within the same dataframe, using the code below:

df.loc[df['Last Resolved Date'].notnull(), 'Duration'] =  (df['Closed Date'] - df['Date Opened'])

It was able to fill out the Duration column with the number of days. However, this method does not seem to work for the above mentioned problem.

In the end, what I want is for pay to be added to df1, based on pf1 and pf2, only if pf3 is above 70.

Also, is it possible to get the pay using a range of pf1 and pf2 values?

I created a separate question for this: "Using values from df1 to retrieve values from df2 where df2 columns and index contain a range of values".

Recommended answer

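First, you can create the new column with DataFrame.lookup (note that DataFrame.lookup was deprecated in pandas 1.2 and removed in pandas 2.0, so this requires an older pandas version):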

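# if pf1 is a regular column rather than the index, make it the index first
#df2 = df2.set_index('pf1')
# column labels may be strings such as '8.8' (e.g. when read from a file);
# convert them to floats so they match the values in df1['pf2']
df2 = df2.rename(columns=float)

# DataFrame.lookup is only available in pandas < 2.0
df1['Pay'] = df2.lookup(df1['pf1'], df1['pf2'])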
print (df1)
        Name   pf1  pf2  pf3  Pay
0       Adam  14.6  8.9   59  200
1        Bob  13.2  9.0   75  400
2    Charlie  11.1  9.1   89  700
3      Dylan  14.6  9.0   97  300
4       Eric  11.1  8.8  105  400
5  Fedderick  12.5  9.2   69  700
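On pandas 2.0 and later, where `DataFrame.lookup` no longer exists, you can do the same lookup with `Index.get_indexer`. It converts labels into integer positions, and those positions then index the underlying NumPy array. This is a minimal sketch that assumes `df2` is already indexed by pf1 and has float column labels, as after the `rename` above.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Adam", "Bob", "Charlie", "Dylan", "Eric", "Fedderick"],
    "pf1": [14.6, 13.2, 11.1, 14.6, 11.1, 12.5],
    "pf2": [8.9, 9.0, 9.1, 9.0, 8.8, 9.2],
    "pf3": [59, 75, 89, 97, 105, 69],
})
df2 = pd.DataFrame(
    [[100, 200, 300, 400, 500],
     [200, 300, 400, 500, 600],
     [300, 400, 500, 600, 700],
     [400, 500, 600, 700, 800]],
    index=pd.Index([14.6, 13.2, 12.5, 11.1], name="pf1"),
    columns=[8.8, 8.9, 9.0, 9.1, 9.2],
)

# positions of each label in df2's index/columns; -1 marks a label that was not found
rows = df2.index.get_indexer(df1["pf1"])
cols = df2.columns.get_indexer(df1["pf2"])
if (rows == -1).any() or (cols == -1).any():
    # mirror lookup's behaviour of failing on a missing label
    raise KeyError("some (pf1, pf2) pairs are not in df2")

# pick element (rows[i], cols[i]) for every row i of df1
df1["Pay"] = df2.to_numpy()[rows, cols]
print(df1)
```

The explicit -1 check matters. NumPy treats -1 as "last element", so without the check a missing label would silently return a wrong value instead of raising an error.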

Because these are floats, some values may fail to match due to floating-point precision. One possible solution is to multiply the values by 10 and cast them to integers:

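# scale to integer keys; round before casting, because int() truncates and a
# product landing just below a whole number (e.g. 145.99999999999997) would become 145
df3 = df2.rename(index=lambda x: int(round(x * 10)),
                 columns=lambda x: int(round(float(x) * 10)))

df1['Pay'] = df3.lookup(df1['pf1'].mul(10).round().astype(int),
                        df1['pf2'].mul(10).round().astype(int))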
print (df1)
        Name   pf1  pf2  pf3  Pay
0       Adam  14.6  8.9   59  200
1        Bob  13.2  9.0   75  400
2    Charlie  11.1  9.1   89  700
3      Dylan  14.6  9.0   97  300
4       Eric  11.1  8.8  105  400
5  Fedderick  12.5  9.2   69  700
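Because `DataFrame.lookup` was removed in pandas 2.0, here is a sketch of the same integer-key idea that uses a left merge instead. It reshapes `df2` into a long table with one row per (pf1, pf2) cell by way of NumPy's `meshgrid`. The helper `to_key` and the names `long` and `keys` are illustrative only and are not part of the original answer.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Adam", "Bob", "Charlie", "Dylan", "Eric", "Fedderick"],
    "pf1": [14.6, 13.2, 11.1, 14.6, 11.1, 12.5],
    "pf2": [8.9, 9.0, 9.1, 9.0, 8.8, 9.2],
    "pf3": [59, 75, 89, 97, 105, 69],
})
df2 = pd.DataFrame(
    [[100, 200, 300, 400, 500],
     [200, 300, 400, 500, 600],
     [300, 400, 500, 600, 700],
     [400, 500, 600, 700, 800]],
    index=pd.Index([14.6, 13.2, 12.5, 11.1], name="pf1"),
    columns=[8.8, 8.9, 9.0, 9.1, 9.2],
)

def to_key(values):
    # hypothetical helper: scale one-decimal values to integers,
    # rounding first so float noise is not truncated away
    return np.rint(np.asarray(values, dtype=float) * 10).astype(int)

# long format: one row per (pf1, pf2) cell of df2
pf1_grid, pf2_grid = np.meshgrid(df2.index.to_numpy(), df2.columns.to_numpy(), indexing="ij")
long = pd.DataFrame({
    "k1": to_key(pf1_grid.ravel()),
    "k2": to_key(pf2_grid.ravel()),
    "Pay": df2.to_numpy().ravel(),
})

keys = pd.DataFrame({"k1": to_key(df1["pf1"]), "k2": to_key(df1["pf2"])})
# a left merge keeps df1's row order; unmatched pairs would get NaN
df1["Pay"] = keys.merge(long, on=["k1", "k2"], how="left")["Pay"].to_numpy()
print(df1)
```

Since each (k1, k2) pair occurs only once in `long`, the merge returns exactly one row per employee.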

If some values might not have a match in df2, look up each pair separately and fall back to NaN:

import numpy as np

df3 = df2.rename(index=lambda x: int(round(x * 10)),
                 columns=lambda x: int(round(float(x) * 10)))

out = []
for row, col in zip(df1['pf1'].mul(10).round().astype(int), df1['pf2'].mul(10).round().astype(int)):
    try:
        out.append(df3.at[row, col])
    except KeyError:
        # no matching (pf1, pf2) pair in df2
        out.append(np.nan)

df1['Pay'] = out
print (df1)
        Name   pf1  pf2  pf3  Pay
0       Adam  14.6  8.9   59  200
1        Bob  13.2  9.0   75  400
2    Charlie  11.1  9.1   89  700
3      Dylan  14.6  9.0   97  300
4       Eric  11.1  8.8  105  400
5  Fedderick  12.5  9.2   69  700
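Here is a vectorized alternative to the `try`/`except` loop, sketched with `Index.get_indexer`, which returns -1 for keys it cannot find. The extra employee `Gina` (pf1 = 10.0) is hypothetical and is included only so that one lookup misses. `to_key` is an illustrative helper.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Adam", "Bob", "Charlie", "Dylan", "Eric", "Fedderick", "Gina"],
    # Gina is hypothetical: pf1 = 10.0 does not appear in df2
    "pf1": [14.6, 13.2, 11.1, 14.6, 11.1, 12.5, 10.0],
    "pf2": [8.9, 9.0, 9.1, 9.0, 8.8, 9.2, 9.0],
    "pf3": [59, 75, 89, 97, 105, 69, 80],
})
df2 = pd.DataFrame(
    [[100, 200, 300, 400, 500],
     [200, 300, 400, 500, 600],
     [300, 400, 500, 600, 700],
     [400, 500, 600, 700, 800]],
    index=pd.Index([14.6, 13.2, 12.5, 11.1], name="pf1"),
    columns=[8.8, 8.9, 9.0, 9.1, 9.2],
)

def to_key(values):
    # hypothetical helper: one-decimal floats -> integer keys, rounded before casting
    return np.rint(np.asarray(values, dtype=float) * 10).astype(int)

rows = pd.Index(to_key(df2.index)).get_indexer(to_key(df1["pf1"]))
cols = pd.Index(to_key(df2.columns)).get_indexer(to_key(df1["pf2"]))

pay = df2.to_numpy(dtype=float)[rows, cols]
# -1 means "not found"; NumPy would read the last row/column for -1, so overwrite with NaN
pay[(rows == -1) | (cols == -1)] = np.nan
df1["Pay"] = pay
print(df1)
```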

Finally, you can assign or create the new column conditionally:

df1.loc[df1['pf3']>=70, 'Pay_new'] = df1['Pay']
print (df1)
        Name   pf1  pf2  pf3  Pay  Pay_new
0       Adam  14.6  8.9   59  200      NaN
1        Bob  13.2  9.0   75  400    400.0
2    Charlie  11.1  9.1   89  700    700.0
3      Dylan  14.6  9.0   97  300    300.0
4       Eric  11.1  8.8  105  400    400.0
5  Fedderick  12.5  9.2   69  700      NaN
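You can also write the same conditional assignment with `Series.where`, which keeps values where the condition is true and fills the rest with NaN. The small self-contained sketch below copies the `Pay` values from the output above.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Adam", "Bob", "Charlie", "Dylan", "Eric", "Fedderick"],
    "pf1": [14.6, 13.2, 11.1, 14.6, 11.1, 12.5],
    "pf2": [8.9, 9.0, 9.1, 9.0, 8.8, 9.2],
    "pf3": [59, 75, 89, 97, 105, 69],
    "Pay": [200, 400, 700, 300, 400, 700],
})

# keep Pay where pf3 >= 70, NaN elsewhere (same result as the .loc assignment)
df1["Pay_new"] = df1["Pay"].where(df1["pf3"] >= 70)
print(df1)
```

As with `.loc`, the new column becomes float64 because it now holds NaN.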
