How do I create a new column based on conditions of other columns in pandas?


Question

I have a pandas DataFrame in which I want to create a new column whose values are based on the values of three other columns.

First I created the column and gave it an arbitrary placeholder value of 300:

data['stability'] = 300

Then I set the conditions:

data['stability'][(data['wind_speed'] <= 3) & (data['clouds'] < 4 ) & (data['dagnacht'] == 'nacht')] = 6
data['stability'][(data['wind_speed'] > 3) & (data['clouds'] < 4 ) & (data['dagnacht'] == 'nacht')] = 5
data['stability'][(data['wind_speed'] >= 5) & (data['clouds'] < 4 ) & (data['dagnacht'] == 'nacht')] = 4
data['stability'][(data['wind_speed'] <= 3) & (data['clouds'] >= 4 ) & (data['dagnacht'] == 'nacht')] = 5
data['stability'][(data['wind_speed'] > 3) & (data['clouds'] >= 4 ) & (data['dagnacht'] == 'nacht')] = 4

If you check whether a condition matches any rows, it does. Input:

data['stability'][(data['wind_speed'] > 3) & (data['clouds'] >= 4 ) & (data['dagnacht'] == 'nacht')]

Output:

2011-08-04 21:00:00    300.0
2011-08-04 22:00:00    300.0
2011-08-04 23:00:00    300.0
2011-08-05 00:00:00    300.0
2011-08-05 01:00:00    300.0
2011-08-05 02:00:00    300.0
2011-08-05 03:00:00    300.0
2011-08-05 04:00:00    300.0
2011-08-05 05:00:00    300.0
2011-08-06 23:00:00    300.0
2011-08-07 00:00:00    300.0
2011-08-07 01:00:00    300.0

But as you can see, the rows still have the value 300 that I assigned at the beginning, not the value 4 that I want them to have now. value_counts reports 300 as the only value. For some reason the condition is evaluated correctly, but the new value is not assigned to stability.

I am working with Python 2.7 and pandas 0.18.0.

My dataset looks like this:

                     wind_speed clouds  stability dagnacht
date
2016-03-21 19:00:00        4.73      7      300.0    nacht
2016-03-21 19:10:00        4.58    NaN      300.0    nacht
2016-03-21 19:20:00        4.75    NaN      300.0    nacht
2016-03-21 19:30:00        3.67    NaN      300.0    nacht
2016-03-21 19:40:00        3.41    NaN      300.0    nacht
2016-03-21 19:50:00        3.61    NaN      300.0    nacht
2016-03-21 20:00:00        3.31      8      300.0    nacht
2016-03-21 20:10:00        3.30    NaN      300.0    nacht
2016-03-21 20:20:00        3.39    NaN      300.0    nacht
2016-03-21 20:30:00        3.59    NaN      300.0    nacht
2016-03-21 20:40:00        3.24    NaN      300.0    nacht
2016-03-21 20:50:00        2.99    NaN      300.0    nacht
2016-03-21 21:00:00        3.04      7      300.0    nacht
2016-03-21 21:10:00        3.01    NaN      300.0    nacht
2016-03-21 21:20:00        2.63    NaN      300.0    nacht
2016-03-21 21:30:00        2.41    NaN      300.0    nacht
2016-03-21 21:40:00        2.42    NaN      300.0    nacht
2016-03-21 21:50:00        2.49    NaN      300.0    nacht
2016-03-21 22:00:00        2.31      8      300.0    nacht
2016-03-21 22:10:00        2.24    NaN      300.0    nacht
2016-03-21 22:20:00        1.89    NaN      300.0    nacht
2016-03-21 22:30:00        1.88    NaN      300.0    nacht
2016-03-21 22:40:00        1.83    NaN      300.0    nacht
2016-03-21 22:50:00        1.83    NaN      300.0    nacht
2016-03-21 23:00:00        1.86      8      300.0    nacht
2016-03-21 23:10:00        2.29    NaN      300.0    nacht
2016-03-21 23:20:00        2.53    NaN      300.0    nacht
2016-03-21 23:30:00        2.36    NaN      300.0    nacht
2016-03-21 23:40:00        2.04    NaN      300.0    nacht
2016-03-21 23:50:00        1.83    NaN      300.0    nacht

Thanks in advance for your help.

Recommended answer

You're performing chained indexing. Change your lines to this form:

data.loc[(data['wind_speed'] > 3) & (data['clouds'] >= 4 ) & (data['dagnacht'] == 'nacht'), 'stability'] = 4

That way the assignment is a single indexing operation on the original DataFrame, rather than on a temporary object that may be a copy.
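To illustrate the difference, here is a minimal sketch on a made-up two-row frame (not the asker's data). The explicit `.copy()` stands in for the temporary object that chained indexing can produce. Whether a real chained assignment hits a copy depends on the pandas version and the data layout, so this illustrates the failure mode rather than reproducing the asker's exact case.

```python
import pandas as pd

# Made-up two-row frame, for illustration only.
data = pd.DataFrame({"wind_speed": [2.0, 4.0], "stability": [300, 300]})
mask = data["wind_speed"] > 3

# Boolean indexing returns a new object; .copy() makes that explicit here.
subset = data[mask].copy()
subset["stability"] = 4
before = data["stability"].tolist()  # the original frame is untouched
print(before)  # [300, 300]

# A single .loc call selects rows and column together and writes to data itself.
data.loc[mask, "stability"] = 4
after = data["stability"].tolist()
print(after)  # [300, 4]
```

The write to `subset` is lost as far as `data` is concerned, while the `.loc` assignment changes `data` directly.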

In [19]:
data.loc[(data['wind_speed'] > 3) & (data['clouds'] >= 4 ) & (data['dagnacht'] == 'nacht'), 'stability'] = 4
data

Out[19]:
                     wind_speed  clouds  stability dagnacht
date                                                       
2016-03-21 19:00:00        4.73     7.0        4.0    nacht
2016-03-21 19:10:00        4.58     NaN      300.0    nacht
2016-03-21 19:20:00        4.75     NaN      300.0    nacht
2016-03-21 19:30:00        3.67     NaN      300.0    nacht
2016-03-21 19:40:00        3.41     NaN      300.0    nacht
2016-03-21 19:50:00        3.61     NaN      300.0    nacht
2016-03-21 20:00:00        3.31     8.0        4.0    nacht
2016-03-21 20:10:00        3.30     NaN      300.0    nacht
2016-03-21 20:20:00        3.39     NaN      300.0    nacht
2016-03-21 20:30:00        3.59     NaN      300.0    nacht
2016-03-21 20:40:00        3.24     NaN      300.0    nacht
2016-03-21 20:50:00        2.99     NaN      300.0    nacht
2016-03-21 21:00:00        3.04     7.0        4.0    nacht
2016-03-21 21:10:00        3.01     NaN      300.0    nacht
2016-03-21 21:20:00        2.63     NaN      300.0    nacht
2016-03-21 21:30:00        2.41     NaN      300.0    nacht
2016-03-21 21:40:00        2.42     NaN      300.0    nacht
2016-03-21 21:50:00        2.49     NaN      300.0    nacht
2016-03-21 22:00:00        2.31     8.0      300.0    nacht
2016-03-21 22:10:00        2.24     NaN      300.0    nacht
2016-03-21 22:20:00        1.89     NaN      300.0    nacht
2016-03-21 22:30:00        1.88     NaN      300.0    nacht
2016-03-21 22:40:00        1.83     NaN      300.0    nacht
2016-03-21 22:50:00        1.83     NaN      300.0    nacht
2016-03-21 23:00:00        1.86     8.0      300.0    nacht
2016-03-21 23:10:00        2.29     NaN      300.0    nacht
2016-03-21 23:20:00        2.53     NaN      300.0    nacht
2016-03-21 23:30:00        2.36     NaN      300.0    nacht
2016-03-21 23:40:00        2.04     NaN      300.0    nacht
2016-03-21 23:50:00        1.83     NaN      300.0    nacht
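The output above reflects only the one rule that was run. Below is a minimal sketch of all five rules from the question rewritten with `.loc`, kept in their original order so that later assignments still override earlier ones. The six-row frame and the helper names `night`, `clear`, and `cloudy` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up sample rows, for illustration only (not the asker's data).
data = pd.DataFrame(
    {
        "wind_speed": [2.0, 4.0, 6.0, 2.5, 4.5, 4.6],
        "clouds": [2, 2, 2, 7, 8, np.nan],
        "dagnacht": ["nacht"] * 6,
    }
)
data["stability"] = 300  # placeholder value, as in the question

# Reusable boolean masks (hypothetical helper names).
night = data["dagnacht"] == "nacht"
clear = data["clouds"] < 4
cloudy = data["clouds"] >= 4

# Same five rules, same order as in the question, each as one .loc assignment.
data.loc[(data["wind_speed"] <= 3) & clear & night, "stability"] = 6
data.loc[(data["wind_speed"] > 3) & clear & night, "stability"] = 5
data.loc[(data["wind_speed"] >= 5) & clear & night, "stability"] = 4
data.loc[(data["wind_speed"] <= 3) & cloudy & night, "stability"] = 5
data.loc[(data["wind_speed"] > 3) & cloudy & night, "stability"] = 4

print(data["stability"].tolist())  # [6, 5, 4, 5, 4, 300]
```

Rows where `clouds` is NaN match neither `< 4` nor `>= 4`, because comparisons with NaN evaluate to False, so they keep the placeholder 300, as in the output above.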

Your original code should have raised a warning:

In [21]:
data['stability'][(data['wind_speed'] > 3) & (data['clouds'] >= 4 ) & (data['dagnacht'] == 'nacht')] = 4
data

C:\WinPython-64bit-3.4.3.5\python-3.4.3.amd64\lib\site-packages\IPython\kernel\__main__.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  if __name__ == '__main__':
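If the goal is simply to derive the column from the conditions, one way to avoid assigning into a selection at all is `numpy.select`. This is a different technique from the `.loc` fix in the answer, shown here as a hedged sketch on made-up data. `numpy.select` uses the first matching condition, so the `wind_speed >= 5` rule has to be listed before the `wind_speed > 3` rule to reproduce the override order of the question's assignments.

```python
import numpy as np
import pandas as pd

# Made-up sample rows, for illustration only (not the asker's data).
data = pd.DataFrame(
    {
        "wind_speed": [2.0, 4.0, 6.0, 2.5, 4.5, 4.6],
        "clouds": [2, 2, 2, 7, 8, np.nan],
        "dagnacht": ["nacht"] * 6,
    }
)

night = data["dagnacht"] == "nacht"
clear = data["clouds"] < 4
cloudy = data["clouds"] >= 4
wind = data["wind_speed"]

# Conditions are checked top to bottom; the first match wins.
conditions = [
    (wind <= 3) & clear & night,
    (wind >= 5) & clear & night,  # must precede the broader "> 3" rule
    (wind > 3) & clear & night,
    (wind <= 3) & cloudy & night,
    (wind > 3) & cloudy & night,
]
choices = [6, 4, 5, 5, 4]

# Rows matching no condition get the default (300, the question's placeholder).
data["stability"] = np.select(conditions, choices, default=300)

print(data["stability"].tolist())  # [6, 5, 4, 5, 4, 300]
```

Because the column is built in a single assignment, there is no intermediate selection to write into.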
