What is the difference between using loc and using just square brackets to filter for columns in Pandas/Python?


Question

I've noticed three methods of selecting a column in a Pandas DataFrame:

The first method selects the column using loc:

df_new = df.loc[:, 'col1']

The second method, which seems simpler and faster:

df_new = df['col1']

The third method, which is the most convenient:

df_new = df.col1

Is there a difference between these three methods? I don't think so, in which case I'd rather use the third method.

I'm mostly curious as to why there appear to be three methods for doing the same thing.

Recommended answer

In the following situations, they behave the same:

  1. Selecting a single column (df['A'] is the same as df.loc[:, 'A'] -> selects column A)
  2. Selecting a list of columns (df[['A', 'B', 'C']] is the same as df.loc[:, ['A', 'B', 'C']] -> selects columns A, B and C)
  3. Slicing by rows (df[1:3] is the same as df.iloc[1:3] -> selects rows 1 and 2). Note, however, that slicing rows with loc instead of iloc is label-based and includes the end label, so df.loc[1:3] returns rows 1, 2 and 3, assuming you have a RangeIndex.
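
The three equivalences above, and the loc/iloc difference when slicing rows, can be checked with a small sketch. The DataFrame and its values are made up for illustration and are not part of the original answer; a default RangeIndex is assumed.

```python
import pandas as pd

# Hypothetical sample data with a default RangeIndex (0, 1, 2, 3).
df = pd.DataFrame(
    {"A": [10, 11, 12, 13], "B": [20, 21, 22, 23], "C": [30, 31, 32, 33]}
)

# 1. Single column: [] and loc return equal Series.
single_same = df["A"].equals(df.loc[:, "A"])

# 2. List of columns: [] and loc return equal DataFrames.
list_same = df[["A", "B"]].equals(df.loc[:, ["A", "B"]])

# 3. Row slice: [] with an integer slice behaves like iloc (positional).
rows_same = df[1:3].equals(df.iloc[1:3])

# iloc excludes the end position, loc includes the end label.
positional_rows = list(df.iloc[1:3].index)
label_rows = list(df.loc[1:3].index)

print(single_same, list_same, rows_same)
print(positional_rows, label_rows)
```

With a RangeIndex, labels and positions coincide, which is why the only visible difference between the two slices is whether the end point is included.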

However, [] does not work in the following situations:

  1. You can select a single row with df.loc[row_label]
  2. You can select a list of rows with df.loc[[row_label1, row_label2]]
  3. You can slice columns with df.loc[:, 'A':'C']
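
As a rough illustration of these three loc selections, here is a sketch on a made-up DataFrame whose row labels ("x", "y", "z") are assumptions chosen for the example. It also shows that passing a row label to plain [] is treated as a column lookup and fails.

```python
import pandas as pd

# Hypothetical sample data with string row labels.
df = pd.DataFrame(
    {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9], "D": [10, 11, 12]},
    index=["x", "y", "z"],
)

one_row = df.loc["x"]            # Series holding the values of row "x"
some_rows = df.loc[["x", "z"]]   # DataFrame with rows "x" and "z"
col_slice = df.loc[:, "A":"C"]   # columns A through C, end label included

# Plain [] looks for a column named "x", so the row label is not found.
try:
    df["x"]
    bracket_found_row = True
except KeyError:
    bracket_found_row = False

print(one_row.tolist())
print(list(some_rows.index), list(col_slice.columns))
print(bracket_found_row)
```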

These three cannot be done with []. More importantly, if your selection involves both rows and columns, then assignment becomes problematic.

df[1:3]['A'] = 5

This selects rows 1 and 2, then selects column 'A' of the returned object and assigns the value 5 to it. The problem is that the returned object might be a copy, so this may not change the actual DataFrame. In that case pandas emits a SettingWithCopyWarning. The correct way of making this assignment is:

df.loc[1:2, 'A'] = 5  # loc includes the end label, so 1:2 covers rows 1 and 2 (assuming a RangeIndex)

With .loc, you are guaranteed to modify the original DataFrame. It also allows you to slice columns (df.loc[:, 'C':'F']), select a single row (df.loc[5]), and select a list of rows (df.loc[[1, 2, 5]]).
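
A minimal sketch of assignment through a single indexer call, using made-up data with a default RangeIndex (an assumption, not part of the original answer). Because loc is label-based and includes the end label, the slice 1:2 is what targets rows 1 and 2. The iloc line shows the positional equivalent, where the end position is excluded.

```python
import pandas as pd

# Hypothetical sample data with a default RangeIndex (0..4).
df = pd.DataFrame({"A": [0, 0, 0, 0, 0], "B": [1, 2, 3, 4, 5]})

# One loc call selects rows and column together, so the original is modified.
df.loc[1:2, "A"] = 5
a_values = df["A"].tolist()

# Positional equivalent: iloc needs the column position, and 1:3 excludes 3.
df.iloc[1:3, df.columns.get_loc("B")] = -1
b_values = df["B"].tolist()

print(a_values)
print(b_values)
```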

Also note that these two were not added to the API at the same time. The .loc indexer was added much later as a more powerful and explicit indexer. See unutbu's answer for more detail.

Note: Getting columns with [] vs . is a completely different topic. Attribute access with . is only there for convenience. It only allows accessing columns whose names are valid Python identifiers (i.e. they cannot contain spaces, they cannot start with a digit...). It cannot be used when the name conflicts with a Series/DataFrame method or attribute. It also cannot be used to create a new column (i.e. if there is no column a, the assignment df.a = 1 sets a plain attribute on the object instead of adding a column). Other than that, . and [] are the same.
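
The limits of attribute access described in this note can be seen in a short sketch. The column names ("col1", "my col", "count", "new_col") are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: one ordinary name, one with a space,
# and one that collides with the DataFrame.count method.
df = pd.DataFrame({"col1": [1, 2], "my col": [3, 4], "count": [5, 6]})

# For an ordinary identifier, . and [] return the same column.
same = df.col1.equals(df["col1"])

# A name with a space is not a valid identifier, so only [] can reach it.
spaced = df["my col"].tolist()

# df.count resolves to the method, not the column; [] still gets the column.
count_is_method = callable(df.count)
count_col = df["count"].tolist()

# Assigning to a non-existing name sets an attribute and adds no column.
df.new_col = 1
created = "new_col" in df.columns

print(same, spaced, count_is_method, count_col, created)
```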
