New column using apply function on other columns in dataframe


Problem description

I have a dataframe where three of the columns are the coordinates of the data ('H_x', 'H_y' and 'H_z'). I want to calculate the radius vector of the data and add it as a new column in my dataframe, but I have run into a problem with the pandas apply function. My code is:

import numpy as np

def radvec(x, y, z):
    # Euclidean length of the vector (x, y, z)
    return np.sqrt(x**2 + y**2 + z**2)

halo_field['rh_field'] = halo_field.apply(lambda row: radvec(row['H_x'], row['H_y'], row['H_z']), axis=1)

The error I get is:

group_sh.py:78: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
halo_field['rh_field']=halo_field.apply(lambda row: radvec(row['H_x'], row['H_y'], row['H_z']), axis=1)

I get the column that I want, but I'm still confused by this error message. I'm aware there are similar questions here, but I couldn't find how to solve my problem. I'm fairly new to Python. Can you help?

halo_field is a slice of another dataframe:

halo_field = halo_res[halo_res.N_subs==1] 

Recommended answer

The problem is that you're working with a slice, which can be ambiguous, because pandas cannot tell whether your assignment is meant to reach the original dataframe or only a copy:

halo_field = halo_res[halo_res.N_subs==1]

You have two options:

Work on a copy

You can explicitly copy your dataframe to avoid the warning and ensure your original dataframe is unaffected:

halo_field = halo_res[halo_res.N_subs==1].copy()
halo_field['rh_field'] = halo_field.apply(...)
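
To make the copy approach concrete, here is a minimal sketch on made-up data. The values in halo_res are invented for illustration; only the column names and the radvec function come from the question.

```python
import numpy as np
import pandas as pd

# Toy stand-in for halo_res; the numbers are invented for illustration.
halo_res = pd.DataFrame({
    'N_subs': [1, 2, 1],
    'H_x': [3.0, 1.0, 0.0],
    'H_y': [4.0, 2.0, 0.0],
    'H_z': [0.0, 2.0, 5.0],
})

def radvec(x, y, z):
    # Euclidean length of the vector (x, y, z)
    return np.sqrt(x**2 + y**2 + z**2)

# .copy() makes halo_field an independent dataframe,
# so adding a column to it is unambiguous and raises no warning.
halo_field = halo_res[halo_res.N_subs == 1].copy()
halo_field['rh_field'] = halo_field.apply(
    lambda row: radvec(row['H_x'], row['H_y'], row['H_z']), axis=1
)
```

After this runs, halo_field has the new rh_field column, while halo_res is left without it.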

Work on the original dataframe conditionally

Use pd.DataFrame.loc with a Boolean mask to update your original dataframe:

mask = halo_res['N_subs'] == 1
# Apply over the masked rows; 'rh_field' does not exist yet, so it cannot be selected on the right-hand side
halo_res.loc[mask, 'rh_field'] = halo_res.loc[mask].apply(...)
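
As a rough illustration of the masked update, the sketch below fills in the elided apply call with the radius calculation from the question. The data values are invented, and the lambda is only one possible way to write the row function.

```python
import numpy as np
import pandas as pd

# Toy stand-in for halo_res; the numbers are invented for illustration.
halo_res = pd.DataFrame({
    'N_subs': [1, 2, 1],
    'H_x': [3.0, 1.0, 0.0],
    'H_y': [4.0, 2.0, 0.0],
    'H_z': [0.0, 2.0, 5.0],
})

mask = halo_res['N_subs'] == 1

# Compute the radius for the masked rows only and write it into halo_res.
# The right-hand side selects rows, not the not-yet-existing 'rh_field' column.
halo_res.loc[mask, 'rh_field'] = halo_res.loc[mask].apply(
    lambda row: np.sqrt(row['H_x']**2 + row['H_y']**2 + row['H_z']**2),
    axis=1,
)
```

Rows outside the mask receive no value, so their rh_field entry is NaN.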

Don't use apply

As a side note, in either scenario you can avoid apply, which calls your function once per row, and use vectorized column arithmetic instead. For example:

halo_field['rh_field'] = (halo_field[['H_x', 'H_y', 'H_z']]**2).sum(axis=1)**0.5
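
A small check, on invented data, that the vectorized expression gives the same result as a row-wise apply. The np.linalg.norm variant is not part of the answer above; it is shown only as another vectorized possibility.

```python
import numpy as np
import pandas as pd

# Toy coordinates; the numbers are invented for illustration.
halo_field = pd.DataFrame({
    'H_x': [3.0, 0.0, 1.0],
    'H_y': [4.0, 0.0, 2.0],
    'H_z': [0.0, 5.0, 2.0],
})
coords = halo_field[['H_x', 'H_y', 'H_z']]

# Row-wise version: the lambda is called once per row.
row_wise = coords.apply(lambda row: np.sqrt((row**2).sum()), axis=1)

# Vectorized version: square every cell, sum across columns, take the root.
vectorized = (coords**2).sum(axis=1)**0.5

# Alternative (an addition, not from the answer): NumPy's norm along each row.
norm = np.linalg.norm(coords.to_numpy(), axis=1)

halo_field['rh_field'] = vectorized
```

All three produce the same radii; the vectorized forms avoid a Python-level function call per row, which usually matters on large dataframes.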
