Python: Pandas Series - Why use loc?

Views: 498 · Published: 2020/5/23 21:17:08 · Tags: python pandas series loc


Question

Why do we use 'loc' for pandas dataframes? It seems the following code, with or without loc, runs at a similar speed:

%timeit df_user1 = df.loc[df.user_id=='5561']

100 loops, best of 3: 11.9 ms per loop

or

%timeit df_user1_noloc = df[df.user_id=='5561']

100 loops, best of 3: 12 ms per loop

So why use loc?

Edit: This question has been flagged as a duplicate. Although the linked question, "pandas iloc vs ix vs loc explanation?", does mention that

you can do column retrieval just by using the data frame's `__getitem__`:

df['time']    # equivalent to df.loc[:, 'time']

it does not say why we use loc. It explains many of loc's features, but my specific question is "why not just omit loc altogether?", and for that I have accepted the very detailed answer below.

Also, in that other post the answer (which I do not think is an answer) is buried in the discussion. Anyone searching for what I was looking for would find it hard to locate there, and would be much better served by the answer provided to my question.
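
For an ordinary boolean Series mask, the two expressions timed in the question do return the same rows, which is why the choice can look arbitrary. A minimal sketch of that, assuming a small made-up frame in place of the asker's `df` (only the `user_id` and `time` column names come from the question; the values are hypothetical):

```python
import pandas as pd

# Hypothetical stand-in for the asker's data.
df = pd.DataFrame({
    "user_id": ["5561", "1234", "5561", "9999"],
    "time": [10, 20, 30, 40],
})

mask = df.user_id == "5561"
df_user1 = df.loc[mask]        # explicit row selection
df_user1_noloc = df[mask]      # same selection through plain []

# Both forms return the same rows for a boolean Series mask.
same_rows = df_user1.equals(df_user1_noloc)

# Column retrieval: df['time'] matches df.loc[:, 'time'].
same_column = df["time"].equals(df.loc[:, "time"])

print(same_rows, same_column)
```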

Recommended answer

- Explicit is better than implicit.
  
  df[boolean_mask] selects rows where boolean_mask is True, but there is a corner case where you might not want it to: when df has boolean-valued column labels:
```
In [229]: df = pd.DataFrame({True:[1,2,3],False:[3,4,5]}); df
Out[229]: 
   False  True 
0      3      1
1      4      2
2      5      3
```
  You might want to use df[[True]] to select the True column. Instead, it raises a ValueError:
```
In [230]: df[[True]]
ValueError: Item wrong length 1 instead of 3.
```
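  versus using loc (output from the pandas version in use when this answer was written; recent releases may instead raise an IndexError for a boolean list whose length does not match the index):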
```
In [231]: df.loc[[True]]
Out[231]: 
   False  True 
0      3      1
```
  In contrast, the following does not raise a ValueError, even though the structure of df2 is almost the same as that of df above:
```
In [258]: df2 = pd.DataFrame({'A':[1,2,3],'B':[3,4,5]}); df2
Out[258]: 
   A  B
0  1  3
1  2  4
2  3  5

In [259]: df2[['B']]
Out[259]: 
   B
0  3
1  4
2  5
```
  Thus, df[boolean_mask] does not always behave the same as df.loc[boolean_mask]. Even though this is arguably an unlikely use case, I would recommend always using df.loc[boolean_mask] instead of df[boolean_mask], because the meaning of df.loc's syntax is explicit. With df.loc[indexer] you know automatically that df.loc is selecting rows. In contrast, it is not clear whether df[indexer] will select rows or columns (or raise a ValueError) without knowing details about indexer and df.
  
- df.loc[row_indexer, column_index] can select rows and columns together. df[indexer] can select only rows or only columns, depending on the type of values in indexer and the type of column labels df has (again, are they boolean?).
```
In [237]: df2.loc[[True,False,True], 'B']
Out[237]: 
0    3
2    5
Name: B, dtype: int64
```
- When a slice is passed to df.loc, the end points are included in the range. When a slice is passed to df[...], the slice is interpreted as a half-open interval:
```
In [239]: df2.loc[1:2]
Out[239]: 
   A  B
1  2  4
2  3  5

In [271]: df2[1:2]
Out[271]: 
   A  B
1  2  4
```
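
The three differences above can be checked side by side. This is a minimal sketch that reuses the answer's df2; the variable names (`cols`, `rows`, `picked`, `loc_slice`, `plain_slice`) are introduced here for illustration only:

```python
import pandas as pd

df2 = pd.DataFrame({"A": [1, 2, 3], "B": [3, 4, 5]})

# 1. Plain [] dispatches on the indexer: a list of labels selects columns,
#    while a boolean list of matching length selects rows.
cols = df2[["B"]]
rows = df2[[True, False, True]]

# 2. .loc takes a row indexer and a column indexer in one call.
picked = df2.loc[[True, False, True], "B"]

# 3. A label slice with .loc includes both end points;
#    an integer slice with [] is positional and half-open.
loc_slice = df2.loc[1:2]
plain_slice = df2[1:2]

print(cols.shape, rows.shape)
print(picked.tolist())
print(len(loc_slice), len(plain_slice))
```

Reading the code, only the `.loc` lines state on their face that rows are being selected; the plain `[]` lines need knowledge of the indexer's type to interpret.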
