COUNTIF in pandas python over multiple columns with multiple conditions


Question

I have a dataset wherein I am trying to determine the number of risk factors per person. So I have the following data:

Person_ID  Age  Smoker  Diabetes
      001   30       Y         N
      002   45       N         N
      003   27       N         Y
      004   18       Y         Y
      005   55       Y         Y

Each attribute (Age, Smoker, Diabetes) has its own condition to determine whether it is a risk factor. So if Age > 45, it's a risk factor. Smoker and Diabetes are risk factors if they are "Y". What I would like is to add a column that adds up the number of risk factors for each person based on those conditions. So the data would look like this:

Person_ID  Age  Smoker  Diabetes  Risk_Factors
      001   30       Y         N             1
      002   45       N         N             0
      003   27       N         Y             1
      004   18       Y         Y             2
      005   55       Y         Y             3

I have a sample dataset that I was fooling around with in Excel, and the way I did it there was to use the COUNTIF formula like so:

=COUNTIF(B2,">45") + COUNTIF(C2,"=Y") + COUNTIF(D2,"=Y")

However, the actual dataset that I will be using is way too large for Excel, so I'm learning pandas for python. I wish I could provide examples of what I've already tried, but frankly I don't even know where to start. I looked at this question, but it doesn't really address what to do about applying it to an entire new column using different conditions from multiple columns. Any suggestions?

Recommended answer

If you want to stick with pandas, you can use the following:

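# 1 if the cell value is 'Y', otherwise 0
isY = lambda x: int(x == 'Y')
# Sum the three per-row conditions (Age uses a strict > 45, as in the Excel formula)
countRiskFactors = lambda row: isY(row['Smoker']) + isY(row['Diabetes']) + int(row['Age'] > 45)

# Apply the function to each row (axis=1) and store the result as a new column
df['Risk_Factors'] = df.apply(countRiskFactors, axis=1)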

How it works

isY is a stored lambda function that checks whether the value of a cell is "Y"; it returns 1 if it is and 0 otherwise. countRiskFactors adds up the risk factors for a single row.

The final line uses the apply method with the axis parameter set to 1, which applies the function (the first argument) row-wise along the DataFrame and returns a Series. That Series is then assigned to the DataFrame as the new Risk_Factors column.
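
To try the approach end to end, here is a minimal self-contained sketch. It assumes the sample data from the question is built by hand (the real dataset would presumably be loaded from a file), keeps Person_ID as strings, and uses a named function, count_risk_factors, in place of the lambdas; that name is chosen here for illustration only. The strict Age > 45 threshold is taken from the answer's code.

```python
import pandas as pd

# Sample data from the question (assumed to be typed in by hand here).
df = pd.DataFrame({
    "Person_ID": ["001", "002", "003", "004", "005"],
    "Age": [30, 45, 27, 18, 55],
    "Smoker": ["Y", "N", "N", "Y", "Y"],
    "Diabetes": ["N", "N", "Y", "Y", "Y"],
})

def count_risk_factors(row):
    """Count how many of the three risk conditions hold for one row."""
    return (
        int(row["Smoker"] == "Y")
        + int(row["Diabetes"] == "Y")
        + int(row["Age"] > 45)  # strict threshold, as in the answer's code
    )

# axis=1 passes each row (as a Series) to the function.
df["Risk_Factors"] = df.apply(count_risk_factors, axis=1)
print(df)
```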

Output of printing df

   Person_ID  Age Smoker Diabetes  Risk_Factors
0          1   30      Y        N             1
1          2   45      N        N             0
2          3   27      N        Y             1
3          4   18      Y        Y             2
4          5   55      Y        Y             3
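
Since the question mentions a dataset too large for Excel, a vectorized variant may be worth considering. This is not the method from the answer above but an alternative sketch: each condition is evaluated on a whole column at once, and the resulting boolean columns are converted to integers and summed. The sample DataFrame is rebuilt here by hand, and the Age > 45 threshold is again assumed from the answer's code.

```python
import pandas as pd

# Sample data from the question (assumed to be typed in by hand here).
df = pd.DataFrame({
    "Person_ID": ["001", "002", "003", "004", "005"],
    "Age": [30, 45, 27, 18, 55],
    "Smoker": ["Y", "N", "N", "Y", "Y"],
    "Diabetes": ["N", "N", "Y", "Y", "Y"],
})

# Each comparison yields a boolean Series; casting to int gives 1/0 per row.
df["Risk_Factors"] = (
    (df["Age"] > 45).astype(int)
    + (df["Smoker"] == "Y").astype(int)
    + (df["Diabetes"] == "Y").astype(int)
)
print(df)
```

Column-wise comparisons like these avoid calling a Python function once per row, which usually matters more as the number of rows grows.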
