Pandas variable creation using multiple If-else
Question
I need help with multiple if-else conditions in pandas. I have a test dataset (Titanic) as follows:
ID  Survived  Pclass  Name                    Sex     Age
1   0         3       Braund                  male    22
2   1         1       Cumings, Mrs.           female  38
3   1         3       Heikkinen, Miss. Laina  female  26
4   1         1       Futrelle, Mrs.          female  35
5   0         3       Allen, Mr.              male    35
6   0         3       Moran, Mr.              male
7   0         1       McCarthy, Mr.           male    54
8   0         3       Palsson, Master         male    2
where ID is the passenger ID. I want to create a new flag variable in this data frame using the following rule:
if Sex=="female" or (Pclass==1 and Age <18) then 1 else 0.
I tried a few approaches. This was my first attempt:
import pandas as pd
df = pd.read_csv('data.csv')
for passenger_index, passenger in df.iterrows():
    if passenger['Sex'] == "female" or (passenger['Pclass'] == 1 and passenger['Age'] < 18):
        df['Prediction'] = 1
    else:
        df['Prediction'] = 0
The problem with the above code is that it creates a Prediction column in df, but every value is 0.
However, if I use the same logic but write the results to a dictionary instead, I get the right answer:
prediction = {}
df = pd.read_csv('data.csv')
for passenger_index, passenger in df.iterrows():
    if passenger['Sex'] == "female" or (passenger['Pclass'] == 1 and passenger['Age'] < 18):
        prediction[passenger['ID']] = 1
    else:
        prediction[passenger['ID']] = 0
This gives a dict prediction with the IDs as keys and 1 or 0 as values, following the logic above.
So why does the df version give the wrong result? I also tried defining a function first and then calling it, which gave the same result as the first attempt.
So, how can we do this in pandas?
Secondly, I guess the same can be done with some form of multiple if-else statements. I know about np.where, but it does not let me add an 'and' condition. Here is what I tried:
df['Prediction'] = np.where(df['Sex'] == "female", 1, np.where((df['Pclass'] == 1 and df['Age'] < 18), 1, 0))
The above raised an error for the 'and' keyword inside np.where.
Can someone help? Solutions using several approaches would be really appreciated: np.where (as a simple if-else), some function (applymap etc.), or modifications to what I wrote earlier.
Also, how do we do the same using applymap or the apply/map methods of df?
Recommended answer
Instead of looping through the rows using df.iterrows (which is relatively slow), you can assign the desired values to the Prediction column in one assignment:
In [27]: df['Prediction'] = ((df['Sex']=='female') | ((df['Pclass']==1) & (df['Age']<18))).astype('int')
In [29]: df['Prediction']
Out[29]:
0 0
1 1
2 1
3 1
4 0
5 0
6 0
7 0
Name: Prediction, dtype: int32
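To reproduce the result above without the CSV file, the sample rows from the question can be built inline. This is only a minimal sketch: the helper names is_female and first_class_child are illustrative, the Name column is omitted, and passenger 6's missing age is assumed to load as NaN.

```python
import numpy as np
import pandas as pd

# Sample rows from the question; passenger 6 has no recorded age (assumed NaN).
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7, 8],
    "Survived": [0, 1, 1, 1, 0, 0, 0, 0],
    "Pclass": [3, 1, 3, 1, 3, 3, 1, 3],
    "Sex": ["male", "female", "female", "female",
            "male", "male", "male", "male"],
    "Age": [22, 38, 26, 35, 35, np.nan, 54, 2],
})

# Each comparison yields a boolean Series; | and & combine them elementwise.
is_female = df["Sex"] == "female"
first_class_child = (df["Pclass"] == 1) & (df["Age"] < 18)

# Cast the combined boolean mask to 0/1 and store it as a new column.
df["Prediction"] = (is_female | first_class_child).astype(int)
print(df["Prediction"].tolist())
```

Note that NaN < 18 evaluates to False, so a passenger with a missing age can only be flagged through the Sex condition.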
For your first approach, remember that df['Prediction'] represents an entire column of df, so df['Prediction']=1 assigns the value 1 to every row in that column. Each pass through the loop therefore overwrote the whole column, and since df['Prediction']=0 was the last assignment to run, the entire column ended up filled with zeros.
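If you do want to keep the explicit loop, one way to repair it is to collect one value per row and assign the column once at the end. The sketch below assumes the same sample rows as the question (only the columns the rule needs); the list name flags is illustrative.

```python
import numpy as np
import pandas as pd

# Only the columns the rule needs, taken from the question's sample rows.
df = pd.DataFrame({
    "Pclass": [3, 1, 3, 1, 3, 3, 1, 3],
    "Sex": ["male", "female", "female", "female",
            "male", "male", "male", "male"],
    "Age": [22, 38, 26, 35, 35, np.nan, 54, 2],
})

flags = []
for _, passenger in df.iterrows():
    # Here the operands are scalars, so plain Python or/and is fine.
    if passenger["Sex"] == "female" or (
        passenger["Pclass"] == 1 and passenger["Age"] < 18
    ):
        flags.append(1)
    else:
        flags.append(0)

# One assignment for the whole column, with one value per row.
df["Prediction"] = flags
print(df["Prediction"].tolist())
```

This gives the same values as the vectorized version, but it still pays the per-row cost of iterrows, so it is mainly useful for understanding what went wrong.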
For your second approach, note that you need to use & rather than and to perform an elementwise logical-and operation on two NumPy arrays or pandas NDFrames. Python's and asks for the truth value of each whole operand, which is ambiguous for an array or Series, so it raises a ValueError. Thus, you could use
In [32]: np.where(df['Sex']=='female', 1, np.where((df['Pclass']==1)&(df['Age']<18), 1, 0))
Out[32]: array([0, 1, 1, 1, 0, 0, 0, 0])
though I think it is much simpler to just use | for logical-or and & for logical-and:
In [34]: ((df['Sex']=='female') | ((df['Pclass']==1) & (df['Age']<18)))
Out[34]:
0 False
1 True
2 True
3 True
4 False
5 False
6 False
7 False
dtype: bool
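The question also asked about apply. A possible row-wise version is sketched below next to a single np.where call on the combined mask. The data here is hypothetical (it is not the question's table) and includes a first-class child so that the second clause of the rule is exercised; the function name rule is likewise illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical rows, chosen so that both clauses of the rule are exercised.
df = pd.DataFrame({
    "Sex": ["male", "female", "male", "male", "male"],
    "Pclass": [3, 1, 1, 1, 3],
    "Age": [22, 38, 54, 10, np.nan],
})

# Vectorized: build the boolean mask once, then map True/False to 1/0.
mask = (df["Sex"] == "female") | ((df["Pclass"] == 1) & (df["Age"] < 18))
by_where = np.where(mask, 1, 0)

def rule(row):
    """Apply the flag rule to a single row (scalars, so or/and are fine)."""
    return int(row["Sex"] == "female"
               or (row["Pclass"] == 1 and row["Age"] < 18))

# Row-wise: axis=1 passes each row to rule as a Series.
by_apply = df.apply(rule, axis=1)

print(by_where.tolist())
print(by_apply.tolist())
```

Both give the same flags. The apply version calls a Python function once per row, so the vectorized mask is usually the better choice for larger frames.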