How to create column C on DF1 using values from DF2, based on columns A and B
Problem description
I have a dataframe that contains employee information such as Name, performance_factor_1 and performance_factor_2.
I have another dataframe that gives the pay based on performance_factor_1 and performance_factor_2.
df1 (sorry for the formatting, I'm not sure how to fix it):
Name pf1 pf2 pf3
Adam 14.6 8.9 59
Bob 13.2 9 75
Charlie 11.1 9.1 89
Dylan 14.6 9 97
Eric 11.1 8.8 105
Fedderick 12.5 9.2 69
df2: the rows of dataframe 2 are performance_factor_1 values and the columns are performance_factor_2 values.
pf1 8.8 8.9 9 9.1 9.2
14.6 100 200 300 400 500
13.2 200 300 400 500 600
12.5 300 400 500 600 700
11.1 400 500 600 700 800
In df2, the pf1 index runs from 1 to 14 with one decimal place, and the columns run from 8.8 to 10, also with one decimal place. Ideally I would look the values up by range, such as 8.8-9.2, but for now I only need the pay based on exact values.
What I want to do is add a pay column to df1, as shown below, if pf3 is above 70:
Name pf1 pf2 pay
Adam 14.6 8.9 200
Bob 13.2 9 400
Charlie 11.1 9.1 700
Dylan 14.6 9 300
Eric 11.1 8.8 400
Fedderick 12.5 9.2 700
What I have tried so far: 1) Using a function and calling it inside the loc assignment below, but it kept throwing a "'Series' objects are mutable, thus unhashable" error:
def indivpay(ttr, csat):
    dude = (indiv.at[ttr, csat])
    return dude
df1.loc[df1['pf3']>=70, 'pay'] = indivpay(df_outer['pf1'], df_outer['pf2'])
2) Getting the pay value inside the loc assignment itself, but it also kept throwing a "'Series' objects are mutable, thus unhashable" error:
df_outer.loc[df_outer['# of Closed SRs']>=70, 'Individual Bonus'] = indiv.at[df_outer['Time to Resolve'], df_outer['CSAT (NSE)'].astype(str)]
I have solved a similar problem before using loc. In that case, however, I created column C from columns A and B within the same dataframe, using the code below:
df.loc[df['Last Resolved Date'].notnull(), 'Duration'] = (df['Closed Date'] - df['Date Opened'])
That filled in the Duration column with the number of days. However, this approach does not seem to work for the problem above.
In the end, what I want is for pay to be added to df1 based on pf1 and pf2, only if pf3 is above 70.
Now, is it possible to get the pay using a range of pf1 and pf2?
I created a separate question for this second part: "Using values from df1 to retrieve values from df2 where df2 columns and index contain a range of values".
Recommended answer
First, you can create the new column with DataFrame.lookup (note that this method was deprecated in pandas 1.2 and removed in pandas 2.0):
#if pf1 is first column, not index
#df2 = df2.set_index('pf1')
df2 = df2.rename(columns=float)
df1['Pay'] = df2.lookup(df1['pf1'], df1['pf2'])
print (df1)
Name pf1 pf2 pf3 Pay
0 Adam 14.6 8.9 59 200
1 Bob 13.2 9.0 75 400
2 Charlie 11.1 9.1 89 700
3 Dylan 14.6 9.0 97 300
4 Eric 11.1 8.8 105 400
5 Fedderick 12.5 9.2 69 700
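Where `DataFrame.lookup` is unavailable, the same label-based lookup can be sketched with `Index.get_indexer` and NumPy fancy indexing. This is only an illustration under the assumption that every pf1/pf2 pair exists in df2 (`get_indexer` returns -1 for a missing label, which NumPy would silently read as "last position"); the frames below are rebuilt from the tables in the question.

```python
import pandas as pd

# Sample data rebuilt from the question's tables.
df1 = pd.DataFrame({
    "Name": ["Adam", "Bob", "Charlie", "Dylan", "Eric", "Fedderick"],
    "pf1": [14.6, 13.2, 11.1, 14.6, 11.1, 12.5],
    "pf2": [8.9, 9.0, 9.1, 9.0, 8.8, 9.2],
    "pf3": [59, 75, 89, 97, 105, 69],
})
df2 = pd.DataFrame(
    [[100, 200, 300, 400, 500],
     [200, 300, 400, 500, 600],
     [300, 400, 500, 600, 700],
     [400, 500, 600, 700, 800]],
    index=[14.6, 13.2, 12.5, 11.1],       # pf1 values
    columns=[8.8, 8.9, 9.0, 9.1, 9.2],    # pf2 values
)

# Translate labels to integer positions, then pick one cell per row of df1.
rows = df2.index.get_indexer(df1["pf1"])
cols = df2.columns.get_indexer(df1["pf2"])
df1["Pay"] = df2.to_numpy()[rows, cols]
print(df1)
```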
Because you are working with floats, some values may fail to match due to floating-point precision, so a possible solution is to multiply the values by 10 and cast them to integers:
df3 = df2.rename(index=lambda x: int(x * 10),
                 columns=lambda x: int(float(x) * 10))
df1['Pay'] = df3.lookup(df1['pf1'].mul(10).astype(int), df1['pf2'].mul(10).astype(int))
print (df1)
Name pf1 pf2 pf3 Pay
0 Adam 14.6 8.9 59 200
1 Bob 13.2 9.0 75 400
2 Charlie 11.1 9.1 89 700
3 Dylan 14.6 9.0 97 300
4 Eric 11.1 8.8 105 400
5 Fedderick 12.5 9.2 69 700
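One caveat with the integer conversion: `int()` and `astype(int)` truncate toward zero, so if a scaled float were ever to land just below the intended integer, the key would be off by one. A hedged variant rounds before casting. The helper name `to_key` is hypothetical, and the frames are again rebuilt from the question's tables.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Adam", "Bob", "Charlie", "Dylan", "Eric", "Fedderick"],
    "pf1": [14.6, 13.2, 11.1, 14.6, 11.1, 12.5],
    "pf2": [8.9, 9.0, 9.1, 9.0, 8.8, 9.2],
    "pf3": [59, 75, 89, 97, 105, 69],
})
df2 = pd.DataFrame(
    [[100, 200, 300, 400, 500],
     [200, 300, 400, 500, 600],
     [300, 400, 500, 600, 700],
     [400, 500, 600, 700, 800]],
    index=[14.6, 13.2, 12.5, 11.1],
    columns=[8.8, 8.9, 9.0, 9.1, 9.2],
)

def to_key(x):
    """Hypothetical helper: scale a one-decimal value by 10, round, then cast."""
    return int(round(float(x) * 10))

# Integer-keyed copy of df2 and matching integer keys for df1.
df3 = df2.rename(index=to_key, columns=to_key)
row_keys = df1["pf1"].mul(10).round().astype(int)
col_keys = df1["pf2"].mul(10).round().astype(int)

# Scalar label lookups with .at, one per row of df1.
df1["Pay"] = [df3.at[r, c] for r, c in zip(row_keys, col_keys)]
print(df1)
```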
If some values might not match:
import numpy as np

df3 = df2.rename(index=lambda x: int(x * 10),
                 columns=lambda x: int(float(x) * 10))
out = []
for row, col in zip(df1['pf1'].mul(10).astype(int), df1['pf2'].mul(10).astype(int)):
    try:
        out.append(df3.at[row, col])
    except KeyError:
        # no matching row/column label in df3
        out.append(np.nan)
df1['Pay'] = out
print (df1)
Name pf1 pf2 pf3 Pay
0 Adam 14.6 8.9 59 200
1 Bob 13.2 9.0 75 400
2 Charlie 11.1 9.1 89 700
3 Dylan 14.6 9.0 97 300
4 Eric 11.1 8.8 105 400
5 Fedderick 12.5 9.2 69 700
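The loop above can also be sketched without a Python-level loop by masking the positions that `Index.get_indexer` reports as missing (-1). This is an illustrative alternative, not the answer's method. The extra row "Greg" is hypothetical and was added only to show an unmatched pf1 value.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    # "Greg" is a hypothetical row whose pf1 (10.0) does not exist in df2.
    "Name": ["Adam", "Bob", "Charlie", "Dylan", "Eric", "Fedderick", "Greg"],
    "pf1": [14.6, 13.2, 11.1, 14.6, 11.1, 12.5, 10.0],
    "pf2": [8.9, 9.0, 9.1, 9.0, 8.8, 9.2, 9.0],
    "pf3": [59, 75, 89, 97, 105, 69, 80],
})
df2 = pd.DataFrame(
    [[100, 200, 300, 400, 500],
     [200, 300, 400, 500, 600],
     [300, 400, 500, 600, 700],
     [400, 500, 600, 700, 800]],
    index=[14.6, 13.2, 12.5, 11.1],
    columns=[8.8, 8.9, 9.0, 9.1, 9.2],
)

# Integer keys (scaled by 10 and rounded) on both sides.
df3 = df2.rename(index=lambda x: int(round(float(x) * 10)),
                 columns=lambda x: int(round(float(x) * 10)))
rows = df3.index.get_indexer(df1["pf1"].mul(10).round().astype(int))
cols = df3.columns.get_indexer(df1["pf2"].mul(10).round().astype(int))

# get_indexer marks missing labels with -1; fill only the matched rows.
found = (rows >= 0) & (cols >= 0)
pay = np.full(len(df1), np.nan)
pay[found] = df3.to_numpy()[rows[found], cols[found]]
df1["Pay"] = pay
print(df1)
```

Because NaN is a float, the resulting Pay column has a float dtype even where the matched values are whole numbers.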
Finally, you can assign/create a new column conditionally:
df1.loc[df1['pf3']>=70, 'Pay_new'] = df1['Pay']
print (df1)
Name pf1 pf2 pf3 Pay Pay_new
0 Adam 14.6 8.9 59 200 NaN
1 Bob 13.2 9.0 75 400 400.0
2 Charlie 11.1 9.1 89 700 700.0
3 Dylan 14.6 9.0 97 300 300.0
4 Eric 11.1 8.8 105 400 400.0
5 Fedderick 12.5 9.2 69 700 NaN
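An equivalent way to express the conditional assignment is `Series.where`, which keeps values where the condition holds and puts NaN elsewhere. A minimal sketch, assuming the Pay column has already been computed as in the output above:

```python
import pandas as pd

# df1 with the Pay column as printed in the answer's output.
df1 = pd.DataFrame({
    "Name": ["Adam", "Bob", "Charlie", "Dylan", "Eric", "Fedderick"],
    "pf1": [14.6, 13.2, 11.1, 14.6, 11.1, 12.5],
    "pf2": [8.9, 9.0, 9.1, 9.0, 8.8, 9.2],
    "pf3": [59, 75, 89, 97, 105, 69],
    "Pay": [200, 400, 700, 300, 400, 700],
})

# Keep Pay only where pf3 >= 70; other rows become NaN.
df1["Pay_new"] = df1["Pay"].where(df1["pf3"] >= 70)
print(df1)
```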