Python Pandas - difference between 'loc' and 'where'?


Question

Just curious about the behavior of 'where' and why you would use it over 'loc'.

If I create a dataframe:

import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'Run Distance': [234, 35, 77, 787, 243, 5435, 775, 123, 355, 123],
                   'Goals': [12, 23, 56, 7, 8, 0, 4, 2, 1, 34],
                   'Gender': ['m', 'm', 'm', 'f', 'f', 'm', 'f', 'm', 'f', 'm']})

And then apply the 'where' method:

df2 = df.where(df['Goals']>10)

I get the following, which keeps the rows where Goals > 10 but replaces everything else with NaN:

  Gender  Goals    ID  Run Distance
0      m   12.0   1.0         234.0
1      m   23.0   2.0          35.0
2      m   56.0   3.0          77.0
3    NaN    NaN   NaN           NaN
4    NaN    NaN   NaN           NaN
5    NaN    NaN   NaN           NaN
6    NaN    NaN   NaN           NaN
7    NaN    NaN   NaN           NaN
8    NaN    NaN   NaN           NaN
9      m   34.0  10.0         123.0

If, however, I use 'loc':

df2 = df.loc[df['Goals']>10]

It returns only the matching rows of the dataframe, without the NaN values:

  Gender  Goals  ID  Run Distance
0      m     12   1           234
1      m     23   2            35
2      m     56   3            77
9      m     34  10           123

So essentially I am curious why you would use 'where' over 'loc/iloc', and why it returns NaN values?

Answer

Think of loc as a filter - give me only the parts of the df that conform to a condition.
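As a rough illustration of the "filter" idea, the sketch below rebuilds the dataframe from the question and shows that the condition is just a boolean Series, which loc uses to pick rows (and, optionally, columns). The variable names mask, rows and cells are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'Run Distance': [234, 35, 77, 787, 243, 5435, 775, 123, 355, 123],
                   'Goals': [12, 23, 56, 7, 8, 0, 4, 2, 1, 34],
                   'Gender': ['m', 'm', 'm', 'f', 'f', 'm', 'f', 'm', 'f', 'm']})

# The condition evaluates to one True/False value per row.
mask = df['Goals'] > 10
print(mask.tolist())

# loc keeps only the rows where the mask is True; the original index labels survive.
rows = df.loc[mask]
print(rows.index.tolist())

# A second argument restricts the result to the listed columns.
cells = df.loc[mask, ['ID', 'Goals']]
print(cells)
```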

where originally comes from numpy. It runs over an array and checks if each element fits a condition. So it gives you back the entire array, same shape as before, with the original value where the condition holds and NaN where it does not (NaN is a float, which is why the integer columns come back as floats). A nice feature of where is that you can also get back something different, e.g. df2 = df.where(df['Goals']>10, other=0), to replace values that don't meet the condition with 0. Note that other='0' with quotes would insert the string '0' rather than the number.

   ID  Run Distance  Goals Gender
0   1           234     12      m
1   2            35     23      m
2   3            77     56      m
3   0             0      0      0
4   0             0      0      0
5   0             0      0      0
6   0             0      0      0
7   0             0      0      0
8   0             0      0      0
9  10           123     34      m
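To make the shape difference concrete, here is a small sketch comparing where (with and without a replacement value) against loc. It uses only the numeric columns of the question's dataframe, an assumption made here so that the replacement value 0 has the same type as the data; the names num, masked, filled and filtered are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'Run Distance': [234, 35, 77, 787, 243, 5435, 775, 123, 355, 123],
                   'Goals': [12, 23, 56, 7, 8, 0, 4, 2, 1, 34],
                   'Gender': ['m', 'm', 'm', 'f', 'f', 'm', 'f', 'm', 'f', 'm']})

mask = df['Goals'] > 10
num = df[['ID', 'Run Distance', 'Goals']]   # numeric columns only (assumption for this sketch)

masked = num.where(mask)             # same shape as num, NaN where mask is False
filled = num.where(mask, other=0)    # same shape as num, 0 where mask is False
filtered = num.loc[mask]             # only the rows where mask is True

print(masked.shape, filled.shape, filtered.shape)
print(masked['Goals'].dtype)         # float, because NaN cannot live in an integer column
print(filled['Goals'].tolist())
```

where is handy when the result has to stay aligned with the original rows, for example when it is assigned back as a column or compared element by element; loc is the better fit when the non-matching rows are simply not wanted.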

Also, while where is only for conditional replacement, loc is the standard way of selecting in Pandas, along with iloc. loc selects by row and column labels, while iloc selects by integer position. So with loc you could choose to return, say, df.loc[0:1, ['Gender', 'Goals']] (a loc slice includes both endpoints, so 0:1 gives two rows):

  Gender  Goals
0      m     12
1      m     23
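The label-versus-position distinction is easiest to see once the index no longer matches the row positions. The sketch below filters the question's dataframe first, so the remaining labels are 0, 1, 2 and 9, and then selects with both loc and iloc. The names sub, by_label and by_position are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'Run Distance': [234, 35, 77, 787, 243, 5435, 775, 123, 355, 123],
                   'Goals': [12, 23, 56, 7, 8, 0, 4, 2, 1, 34],
                   'Gender': ['m', 'm', 'm', 'f', 'f', 'm', 'f', 'm', 'f', 'm']})

sub = df.loc[df['Goals'] > 10]       # index labels are now 0, 1, 2, 9

# loc slices by label and includes both endpoints: labels 0 and 1.
by_label = sub.loc[0:1, ['Gender', 'Goals']]

# iloc slices by position and excludes the stop: position 0 only.
# Columns 3 and 2 are 'Gender' and 'Goals' in this dataframe.
by_position = sub.iloc[0:1, [3, 2]]

print(len(by_label), len(by_position))

# The last row has label 9 but position 3.
print(sub.loc[9, 'Goals'], sub.iloc[3, 2])
```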
