Pandas variable creation using multiple if-else


Question

I need help with multiple if-else conditions in pandas. I have a test dataset (Titanic) as follows:

ID  Survived  Pclass  Name                    Sex     Age
1   0         3       Braund                  male    22
2   1         1       Cumings, Mrs.           female  38
3   1         3       Heikkinen, Miss. Laina  female  26
4   1         1       Futrelle, Mrs.          female  35
5   0         3       Allen, Mr.              male    35
6   0         3       Moran, Mr.              male
7   0         1       McCarthy, Mr.           male    54
8   0         3       Palsson, Master         male    2

where ID is the passenger ID. I want to create a new flag variable in this data frame using the following rule:

if Sex=="female" or (Pclass==1 and Age <18) then 1 else 0. 

I tried a few approaches. This was my first attempt:

import pandas as pd

df = pd.read_csv('data.csv')
for passenger_index, passenger in df.iterrows():
    # Rule: female, or first class and under 18
    if passenger['Sex'] == "female" or (passenger['Pclass'] == 1 and passenger['Age'] < 18):
        df['Prediction'] = 1
    else:
        df['Prediction'] = 0

The problem with the code above is that it creates a Prediction column in df, but every value is 0.

However, if I use the same logic but write the result to a dictionary instead, I get the right answer:

import pandas as pd

prediction = {}
df = pd.read_csv('data.csv')
for passenger_index, passenger in df.iterrows():
    # Same rule, but each result is stored under that passenger's own ID
    if passenger['Sex'] == "female" or (passenger['Pclass'] == 1 and passenger['Age'] < 18):
        prediction[passenger['ID']] = 1
    else:
        prediction[passenger['ID']] = 0

This gives a dict named prediction with the IDs as keys and 1 or 0 as values, based on the logic above.

So why does the df version give the wrong result? I also tried defining a function first and then calling it, which gave the same result as the first attempt.

So, how can we do this in pandas?

Secondly, I guess the same can be done with nested if-else style expressions. I know about np.where, but it does not let me add an 'and' condition. Here is what I tried:

df['Prediction']=np.where(df['Sex']=="female",1,np.where((df['Pclass']==1 and df['Age']<18),1,0))

The above raised an error at the 'and' keyword inside np.where.

Can someone help? Solutions using several approaches, such as np.where (like a simple if-else), a function (applymap etc.), or modifications to what I wrote above, would be really appreciated.

Also, how do we do the same thing using applymap or the apply/map methods of df?

Recommended answer

Instead of looping through the rows with df.iterrows (which is relatively slow), you can assign the desired values to the Prediction column in a single assignment:

In [27]: df['Prediction'] = ((df['Sex']=='female') | ((df['Pclass']==1) & (df['Age']<18))).astype('int')

In [29]: df['Prediction']
Out[29]: 
0    0
1    1
2    1
3    1
4    0
5    0
6    0
7    0
Name: Prediction, dtype: int32
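
To try the one-line assignment without the CSV file, here is a minimal sketch that rebuilds the eight sample rows by hand. The DataFrame literal and the name mask are assumptions made for illustration; only the boolean expression comes from the answer. The missing Age in row 6 is entered as NaN, and any comparison with NaN evaluates to False, so that passenger gets 0.

```python
import numpy as np
import pandas as pd

# Hand-built copy of the sample rows from the question (assumed, not read from data.csv)
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7, 8],
    "Pclass": [3, 1, 3, 1, 3, 3, 1, 3],
    "Sex": ["male", "female", "female", "female", "male", "male", "male", "male"],
    "Age": [22, 38, 26, 35, 35, np.nan, 54, 2],
})

# Elementwise rule: female, or (first class and under 18)
mask = (df["Sex"] == "female") | ((df["Pclass"] == 1) & (df["Age"] < 18))

# Convert the boolean Series to 0/1 flags
df["Prediction"] = mask.astype(int)
print(df["Prediction"].tolist())
```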



For your first approach, remember that df['Prediction'] represents an entire column of df, so df['Prediction']=1 assigns the value 1 to every row in that column. Since df['Prediction']=0 was the last assignment executed (the last row processed took the else branch), the entire column ended up filled with zeros.
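
If you do want to keep the explicit loop, one possible repair is to write to a single cell per iteration instead of the whole column. The sketch below uses df.at for that; this is an illustrative choice rather than something taken from the answer, and the sample DataFrame is rebuilt by hand as an assumption. The vectorized assignment remains the faster option.

```python
import numpy as np
import pandas as pd

# Hand-built copy of the sample rows from the question (assumed for illustration)
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7, 8],
    "Pclass": [3, 1, 3, 1, 3, 3, 1, 3],
    "Sex": ["male", "female", "female", "female", "male", "male", "male", "male"],
    "Age": [22, 38, 26, 35, 35, np.nan, 54, 2],
})

# Create the column once with a default value
df["Prediction"] = 0

for idx, passenger in df.iterrows():
    if passenger["Sex"] == "female" or (passenger["Pclass"] == 1 and passenger["Age"] < 18):
        # df.at addresses one cell (row label, column label), not the whole column
        df.at[idx, "Prediction"] = 1

print(df["Prediction"].tolist())
```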

For your second approach, note that you need to use &, not and, to perform an elementwise logical-and on two NumPy arrays or pandas NDFrames (Series or DataFrames). The and keyword asks Python for a single truth value for each operand, which is ambiguous for an array-like object and raises an error. Thus, you could use

In [32]: np.where(df['Sex']=='female', 1, np.where((df['Pclass']==1)&(df['Age']<18), 1, 0))
Out[32]: array([0, 1, 1, 1, 0, 0, 0, 0])
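
The difference between and and & can be seen on two small boolean Series. This is a minimal sketch with made-up values; the names s1, s2, error_message and elementwise are hypothetical. Note also that & binds more tightly than == and <, which is why each comparison in the np.where call is wrapped in its own parentheses.

```python
import pandas as pd

s1 = pd.Series([True, False, True])
s2 = pd.Series([True, True, False])

error_message = None
try:
    # `and` needs one truth value per operand, so pandas raises ValueError here
    s1 and s2
except ValueError as exc:
    error_message = str(exc)

# `&` combines the two Series element by element
elementwise = s1 & s2

print(error_message)
print(elementwise.tolist())
```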

though I think it is much simpler to just use | for logical-or and & for logical-and:

In [34]: ((df['Sex']=='female') | ((df['Pclass']==1) & (df['Age']<18)))
Out[34]: 
0    False
1     True
2     True
3     True
4    False
5    False
6    False
7    False
dtype: bool
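
The question also asks about apply. A row-wise version might look like the sketch below, assuming a hand-built copy of the sample rows and a hypothetical helper named flag. With axis=1 each row arrives as a Series holding scalars, so the plain Python keywords or and and work inside the function. applymap, by contrast, works on one element at a time and so cannot see several columns at once. Expect apply with axis=1 to be slower than the vectorized expression on large frames.

```python
import numpy as np
import pandas as pd

# Hand-built copy of the sample rows from the question (assumed for illustration)
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7, 8],
    "Pclass": [3, 1, 3, 1, 3, 3, 1, 3],
    "Sex": ["male", "female", "female", "female", "male", "male", "male", "male"],
    "Age": [22, 38, 26, 35, 35, np.nan, 54, 2],
})


def flag(passenger):
    """Return 1 for a female passenger or a first-class passenger under 18, else 0."""
    is_flagged = passenger["Sex"] == "female" or (
        passenger["Pclass"] == 1 and passenger["Age"] < 18
    )
    return int(is_flagged)


# axis=1 passes each row to flag() as a Series of scalar values
df["Prediction"] = df.apply(flag, axis=1)
print(df["Prediction"].tolist())
```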
