Python Pandas - difference between 'loc' and 'where'?
Question
Just curious about the behavior of 'where' and why you would use it over 'loc'.
If I create a dataframe:
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'Run Distance': [234, 35, 77, 787, 243, 5435, 775, 123, 355, 123],
                   'Goals': [12, 23, 56, 7, 8, 0, 4, 2, 1, 34],
                   'Gender': ['m', 'm', 'm', 'f', 'f', 'm', 'f', 'm', 'f', 'm']})
And then apply the 'where' method:
df2 = df.where(df['Goals']>10)
I get the following, which keeps the rows where Goals > 10 but leaves everything else as NaN:
  Gender  Goals    ID  Run Distance
0      m   12.0   1.0         234.0
1      m   23.0   2.0          35.0
2      m   56.0   3.0          77.0
3    NaN    NaN   NaN           NaN
4    NaN    NaN   NaN           NaN
5    NaN    NaN   NaN           NaN
6    NaN    NaN   NaN           NaN
7    NaN    NaN   NaN           NaN
8    NaN    NaN   NaN           NaN
9      m   34.0  10.0         123.0
If, however, I use 'loc':
df2 = df.loc[df['Goals']>10]
It returns only the matching rows, with no NaN rows:
  Gender  Goals  ID  Run Distance
0      m     12   1           234
1      m     23   2            35
2      m     56   3            77
9      m     34  10           123
So essentially I am curious why you would use 'where' over 'loc/iloc', and why it returns NaN values.
Recommended answer
Think of loc as a filter: give me only the parts of the df that conform to a condition.
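To make the contrast concrete, here is a minimal sketch using the dataframe from the question. The variable names mask, filtered and masked are illustrative choices, not part of the original post. It shows that loc drops the non-matching rows, while where keeps every row and only blanks out the values.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Run Distance': [234, 35, 77, 787, 243, 5435, 775, 123, 355, 123],
    'Goals': [12, 23, 56, 7, 8, 0, 4, 2, 1, 34],
    'Gender': ['m', 'm', 'm', 'f', 'f', 'm', 'f', 'm', 'f', 'm'],
})

# Boolean Series: True for the rows we want to keep
mask = df['Goals'] > 10

filtered = df.loc[mask]   # only the rows where mask is True
masked = df.where(mask)   # same shape as df, NaN where mask is False

print(filtered.shape)  # (4, 4)
print(masked.shape)    # (10, 4)

# Dropping the all-NaN rows from the where() result leaves the same row labels
same_rows = masked.dropna(how='all').index.equals(filtered.index)
print(same_rows)  # True
```

So if the goal is simply a smaller dataframe, loc is the more direct tool; where is useful when the result has to stay aligned with the original rows.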
where works differently. It is analogous to numpy's where: it checks the condition for each row (or element) and gives you back an object of the same shape as the original, keeping the values where the condition is True and putting NaN everywhere else. That is also why the integer columns come back as floats in your output: NaN is a float, so a column that contains NaN can no longer be stored as integers. A nice feature of where is that you can also choose the replacement value, e.g. df2 = df.where(df['Goals']>10, other=0), to replace values that don't meet the condition with 0:
   ID  Run Distance  Goals Gender
0   1           234     12      m
1   2            35     23      m
2   3            77     56      m
3   0             0      0      0
4   0             0      0      0
5   0             0      0      0
6   0             0      0      0
7   0             0      0      0
8   0             0      0      0
9  10           123     34      m
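The effect of the other argument is easiest to see on a single numeric column. The sketch below is an illustration rather than the original answer's code: it uses a standalone Series named goals (an assumed name) holding the Goals values from the question, and compares the default NaN fill with an explicit replacement of 0.

```python
import pandas as pd

# The Goals column from the question, as a standalone Series
goals = pd.Series([12, 23, 56, 7, 8, 0, 4, 2, 1, 34], name='Goals')

# Default: values that fail the condition become NaN, which forces a float dtype
with_nan = goals.where(goals > 10)

# With other=0: failing values are replaced by 0 and the column stays integer
with_zero = goals.where(goals > 10, other=0)

print(with_nan.dtype)       # float64
print(with_zero.tolist())   # [12, 23, 56, 0, 0, 0, 0, 0, 0, 34]
```

Passing a numeric 0 rather than the string '0' keeps the replaced values the same type as the rest of the column.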
Also, while where is only for conditional masking and replacement, loc is the standard way of selecting by label in Pandas, with iloc as its position-based counterpart. loc uses row and column labels, while iloc uses integer positions. So with loc you could choose to return only some rows and columns, say df.loc[0:1, ['Gender', 'Goals']] (label slices in loc include both endpoints):
  Gender  Goals
0      m     12
1      m     23
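The label-versus-position distinction is hidden when the index is the default 0, 1, 2, ..., because labels and positions then coincide. The following sketch assumes a small dataframe with made-up index labels (10, 20, 30, 40), chosen only so that the two differ; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {'Gender': ['m', 'm', 'm', 'f'], 'Goals': [12, 23, 56, 7]},
    index=[10, 20, 30, 40],  # hypothetical labels, deliberately not 0..3
)

# loc: select by label; the slice 10:20 includes both endpoints
by_label = df.loc[10:20, ['Gender', 'Goals']]

# iloc: select by integer position; the slice 0:2 excludes the end
by_position = df.iloc[0:2, [0, 1]]

print(by_label.equals(by_position))  # True

# loc also accepts a boolean mask for rows together with a column label
gender_of_scorers = df.loc[df['Goals'] > 10, 'Gender']
print(gender_of_scorers.tolist())  # ['m', 'm', 'm']
```

Combining a boolean mask with a column selection in one loc call is something where cannot do, since where always returns the full shape.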