Pandas - Calculate New Value Based on Cross Reference with Another Column
Question
I'm trying to calculate new values in a column, where the rows to change are found by cross-referencing another column.
>>> import pandas as pd
>>> df = pd.DataFrame({"A": [0., 100., 80., 40., 0., 60.],
...                    "B": [12, 12, 3, 19, 3, 19]})
>>> df
       A   B
0    0.0  12
1  100.0  12
2   80.0   3
3   40.0  19
4    0.0   3
5   60.0  19
I want to find all values in column A that are 0, look up the corresponding values in column B, and then change every column A value that shares one of those column B values, according to some function. For instance, in the example above I would like to change the first two values of column A, df.A[0] and df.A[1], from 0. and 100. to 0.5 and 99.5 respectively, because df.A[0] is 0. and its column B value, df.B[0] = 12, is the same as df.B[1] = 12. The desired result is:
df
      A   B
0   0.5  12
1  99.5  12
2  79.5   3
3  40.0  19
4   0.5   3
5  60.0  19
I tried chaining loc, aggregate, groupby and mask, but without success. Is a for loop the only way?
Broadened the example to better illustrate the intent.
Recommended answer
I found a working solution, although it is probably sub-optimal. I chain groupby, filter and transform to obtain the desired series, and then write the result back into the original dataframe.
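import pandas as pd

df = pd.DataFrame({"A": [0., 100., 80., 40., 0., 60.],
                   "B": [12, 12, 3, 19, 3, 19]})
initial = df.copy()  # kept only so the original values can be printed below

# filter(..., dropna=False) keeps every row, but fills the rows of groups
# of B that contain no zero in A with NaN.
u = (df.groupby(by="B", sort=False)
       .filter(lambda x: x.A.min() == 0, dropna=False)
       # Zeros become 0.5; the other values in a kept group drop by 0.5.
       .A.transform(lambda x: (x + 0.5).where(x == 0, x - 0.5)))
# Write back only the rows that belong to a kept group.
df.loc[pd.notnull(u), "A"] = u
This gives the following result:
print("\ninitial df\n", initial, "\n\nintermediate series\n", u, "\n\nfinal result", df)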
initial df
        A   B
0    0.0  12
1  100.0  12
2   80.0   3
3   40.0  19
4    0.0   3
5   60.0  19

intermediate series
 0     0.5
1    99.5
2    79.5
3     NaN
4     0.5
5     NaN
Name: A, dtype: float64

final result       A   B
0   0.5  12
1  99.5  12
2  79.5   3
3  40.0  19
4   0.5   3
5  60.0  19
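As a possible alternative (not part of the original answer), the same cross-reference can be written with plain boolean masks instead of groupby. This is only a minimal sketch: it assumes the same rule as above (zeros become 0.5, the other values sharing a column B value with a zero drop by 0.5), and the names keys, in_group, is_zero and adjusted are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [0., 100., 80., 40., 0., 60.],
                   "B": [12, 12, 3, 19, 3, 19]})

# B values that occur on at least one row where A is 0.
keys = df.loc[df["A"] == 0, "B"].unique()

# Rows sharing one of those B values (the cross-reference).
in_group = df["B"].isin(keys)
is_zero = df["A"] == 0

# Assumed rule, as in the answer: zeros become 0.5, other values drop by 0.5.
adjusted = (df["A"] + 0.5).where(is_zero, df["A"] - 0.5)

# Keep A unchanged outside the selected groups, use the adjusted value inside.
df["A"] = df["A"].where(~in_group, adjusted)

print(df)
```

Unlike the x.A.min() == 0 test, this selects groups by checking for a zero directly, so it should also behave as intended if column A ever contains negative values.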