New column using apply function on other columns in dataframe
Question
I have a dataframe in which three of the columns are coordinates of the data ('H_x', 'H_y' and 'H_z'). I want to calculate the radius vector of the data and add it as a new column in my dataframe, but I have a problem with the pandas apply function. My code is:
import numpy as np

def radvec(x, y, z):
    # Euclidean length of the vector (x, y, z)
    rv = np.sqrt(x**2 + y**2 + z**2)
    return rv

halo_field['rh_field'] = halo_field.apply(lambda row: radvec(row['H_x'], row['H_y'], row['H_z']), axis=1)
The error I get is:
group_sh.py:78: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
halo_field['rh_field']=halo_field.apply(lambda row: radvec(row['H_x'], row['H_y'], row['H_z']), axis=1)
I get the column that I want, but I'm still confused by this message. I'm aware there are similar questions here, but I couldn't work out how to solve my problem. I'm fairly new to Python. Can you help?
halo_field is a slice of another dataframe:
halo_field = halo_res[halo_res.N_subs==1]
Recommended answer
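The problem is that you're working with a slice of halo_res, which is ambiguous: pandas cannot tell whether an assignment to the slice is meant to affect the original dataframe: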
halo_field = halo_res[halo_res.N_subs==1]
You have two options:
You can explicitly copy your dataframe to avoid the warning and ensure your original dataframe is unaffected:
halo_field = halo_res[halo_res.N_subs==1].copy()
halo_field['rh_field'] = halo_field.apply(...)
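To make this option concrete, here is a minimal sketch on a made-up dataframe. The values in halo_res are invented for illustration; only the column names and the radvec function come from the question. It should show the new column landing in halo_field while halo_res is left untouched.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real halo_res; the numbers are invented.
halo_res = pd.DataFrame({
    'N_subs': [1, 2, 1],
    'H_x': [3.0, 1.0, 0.0],
    'H_y': [4.0, 2.0, 0.0],
    'H_z': [0.0, 2.0, 5.0],
})

def radvec(x, y, z):
    # Euclidean length of the vector (x, y, z)
    return np.sqrt(x**2 + y**2 + z**2)

# Explicit copy: halo_field is an independent dataframe from here on.
halo_field = halo_res[halo_res.N_subs == 1].copy()
halo_field['rh_field'] = halo_field.apply(
    lambda row: radvec(row['H_x'], row['H_y'], row['H_z']), axis=1
)

print(halo_field)
print('rh_field' in halo_res.columns)  # the original has no new column
```

The trade-off is memory: the copy duplicates the selected rows, which is usually acceptable unless the slice is very large.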
Work on the original dataframe conditionally
Use pd.DataFrame.loc with a Boolean mask to update your original dataframe:
mask = halo_res['N_subs'] == 1
halo_res.loc[mask, 'rh_field'] = halo_res.loc[mask].apply(...)
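A sketch of this option on the same kind of toy data (values invented; radvec is the function from the question). Note that in this sketch the right-hand side applies the function to the masked rows of halo_res rather than to an 'rh_field' column, since that column does not exist before the assignment. Rows outside the mask end up as NaN in the new column.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real halo_res; the numbers are invented.
halo_res = pd.DataFrame({
    'N_subs': [1, 2, 1],
    'H_x': [3.0, 1.0, 0.0],
    'H_y': [4.0, 2.0, 0.0],
    'H_z': [0.0, 2.0, 5.0],
})

def radvec(x, y, z):
    # Euclidean length of the vector (x, y, z)
    return np.sqrt(x**2 + y**2 + z**2)

mask = halo_res['N_subs'] == 1

# Compute the value only for the masked rows and write it back by label.
halo_res.loc[mask, 'rh_field'] = halo_res.loc[mask].apply(
    lambda row: radvec(row['H_x'], row['H_y'], row['H_z']), axis=1
)

print(halo_res)
```

This avoids the intermediate dataframe entirely, at the cost of carrying a partly empty column in halo_res.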
Don't use apply
As a side note, in either scenario you can avoid apply for your function, because the calculation can be done on whole columns at once. For example:
halo_field['rh_field'] = (halo_field[['H_x', 'H_y', 'H_z']]**2).sum(axis=1)**0.5
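A quick check, on invented data, that the vectorised expression gives the same numbers as the row-wise apply. The np.linalg.norm variant is an alternative that the answer does not mention; it is included here only for comparison.

```python
import numpy as np
import pandas as pd

# Toy data; the numbers are invented.
halo_field = pd.DataFrame({
    'H_x': [3.0, 1.0, 0.0],
    'H_y': [4.0, 2.0, 0.0],
    'H_z': [0.0, 2.0, 5.0],
})
coords = halo_field[['H_x', 'H_y', 'H_z']]

# Row-wise version, as in the question.
via_apply = halo_field.apply(
    lambda row: np.sqrt(row['H_x']**2 + row['H_y']**2 + row['H_z']**2), axis=1
)

# Column-wise version: square, sum across each row, take the square root.
vectorised = (coords**2).sum(axis=1)**0.5

# Alternative (not from the answer): Euclidean norm of each row via NumPy.
via_norm = np.linalg.norm(coords.to_numpy(), axis=1)

print(np.allclose(via_apply, vectorised))
print(np.allclose(vectorised, via_norm))
```

On large dataframes the column-wise forms are typically much faster, since apply with axis=1 calls the Python function once per row.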