Conditional column arithmetic in pandas dataframe


Question

I have a pandas dataframe with the following structure:

import numpy as np
import pandas as pd
myData = pd.DataFrame({'x': [1.2,2.4,5.3,2.3,4.1], 'y': [6.7,7.5,8.1,5.3,8.3], 'condition':[1,1,np.nan,np.nan,1],'calculation': [np.nan]*5})

print(myData)

   calculation  condition    x    y
0          NaN          1  1.2  6.7
1          NaN          1  2.4  7.5
2          NaN        NaN  5.3  8.1
3          NaN        NaN  2.3  5.3
4          NaN          1  4.1  8.3

I want to enter a value in the 'calculation' column based on the values in 'x' and 'y' (e.g. x/y), but only in those cells where the 'condition' column contains NaN (i.e. where np.isnan(myData['condition']) is True). The final dataframe should look like this:

   calculation  condition    x    y
0          NaN          1  1.2  6.7
1          NaN          1  2.4  7.5
2        0.654        NaN  5.3  8.1
3        0.434        NaN  2.3  5.3
4          NaN          1  4.1  8.3
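The condition described above is just a boolean mask over the rows. A minimal sketch, rebuilding the sample frame, showing that np.isnan and the pandas method isna() produce the same mask here; isna() is generally the safer choice because np.isnan raises on object-dtype columns.

```python
import numpy as np
import pandas as pd

myData = pd.DataFrame({'x': [1.2, 2.4, 5.3, 2.3, 4.1],
                       'y': [6.7, 7.5, 8.1, 5.3, 8.3],
                       'condition': [1, 1, np.nan, np.nan, 1],
                       'calculation': [np.nan] * 5})

# Both return a boolean Series that is True where 'condition' is NaN.
mask_np = np.isnan(myData['condition'])  # NumPy ufunc; raises TypeError on object columns
mask_pd = myData['condition'].isna()     # pandas method; also treats None as missing

print(mask_pd.tolist())  # [False, False, True, True, False]
```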

I'm happy with the idea of stepping through each row in turn with a 'for' loop and using 'if' statements to make the calculations, but the actual dataframe I have is very large and I want to do the calculations in an array-based way. Is this possible? I guess I could calculate the value for all rows and then delete the ones I don't want, but this seems like a lot of wasted effort (the NaNs are quite rare in the dataframe), and in some cases where 'condition' equals 1 the calculation cannot be made because of division by zero.
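For comparison, the row-by-row approach described above might look like the following minimal sketch (rebuilding the sample frame and using .at for scalar access). It fills in the expected values, but it runs Python-level code for every row, which is what makes it slow on a large frame.

```python
import numpy as np
import pandas as pd

myData = pd.DataFrame({'x': [1.2, 2.4, 5.3, 2.3, 4.1],
                       'y': [6.7, 7.5, 8.1, 5.3, 8.3],
                       'condition': [1, 1, np.nan, np.nan, 1],
                       'calculation': [np.nan] * 5})

# Loop over every row label; only rows whose 'condition' is NaN get a value.
for idx in myData.index:
    if np.isnan(myData.at[idx, 'condition']):
        myData.at[idx, 'calculation'] = myData.at[idx, 'x'] / myData.at[idx, 'y']

print(myData)
```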

Thanks.

Answer

Use Series.where and pass your condition to it: the calculated value is kept in rows that meet the condition and replaced with NaN in all other rows:

In [117]:

myData['calculation'] = (myData['x']/myData['y']).where(myData['condition'].isnull())
myData
Out[117]:
   calculation  condition    x    y
0          NaN          1  1.2  6.7
1          NaN          1  2.4  7.5
2     0.654321        NaN  5.3  8.1
3     0.433962        NaN  2.3  5.3
4          NaN          1  4.1  8.3
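Note that where still evaluates x/y for every row and only then discards the unwanted results; for float columns a zero divisor yields inf rather than an exception, so nothing breaks, but the division is still performed. If you want to skip the division entirely outside the masked rows, as the question asks, a boolean .loc assignment is a common alternative. A minimal sketch on the same sample data:

```python
import numpy as np
import pandas as pd

myData = pd.DataFrame({'x': [1.2, 2.4, 5.3, 2.3, 4.1],
                       'y': [6.7, 7.5, 8.1, 5.3, 8.3],
                       'condition': [1, 1, np.nan, np.nan, 1],
                       'calculation': [np.nan] * 5})

mask = myData['condition'].isna()

# Divide only the selected rows; rows outside the mask are never touched,
# so their 'calculation' stays NaN and no division is attempted there.
myData.loc[mask, 'calculation'] = myData.loc[mask, 'x'] / myData.loc[mask, 'y']

print(myData)
```

Because only the rare NaN rows are selected, the arithmetic runs on a small subset, which matters most when the mask is sparse.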
