What is the difference between using loc and using just square brackets to filter for columns in Pandas/Python?

Question

I've noticed three methods of selecting a column in a Pandas DataFrame:

First method, selecting a column using loc:

df_new = df.loc[:, 'col1']

Second method, which seems simpler and faster:

df_new = df['col1']

Third method, the most convenient:

df_new = df.col1

Is there a difference between these three methods? I don't think so, in which case I'd rather use the third method.

I'm mostly curious as to why there appear to be three methods for doing the same thing.

Recommended answer

In the following situations, they behave the same:

  1. Selecting a single column (df['A'] is the same as df.loc[:, 'A'] -> selects column A)
  2. Selecting a list of columns (df[['A', 'B', 'C']] is the same as df.loc[:, ['A', 'B', 'C']] -> selects columns A, B and C)
  3. Slicing by rows (df[1:3] is the same as df.iloc[1:3] -> selects rows 1 and 2. Note, however, that if you slice rows with loc instead of iloc, you'll get rows 1, 2 and 3, assuming you have a RangeIndex, because loc slices by label and includes both endpoints.)
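
As a rough illustration of the three cases above, here is a minimal sketch. The frame and its values are made up for the example, and it assumes a default RangeIndex:

```python
import pandas as pd

# Hypothetical example frame with a default RangeIndex (0..4)
df = pd.DataFrame({
    "A": [10, 11, 12, 13, 14],
    "B": [20, 21, 22, 23, 24],
    "C": [30, 31, 32, 33, 34],
})

# 1. Single column: both forms return the same Series
single_same = df["A"].equals(df.loc[:, "A"])

# 2. List of columns: both forms return the same DataFrame
list_same = df[["A", "B", "C"]].equals(df.loc[:, ["A", "B", "C"]])

# 3. Row slice: [] slices by position, like iloc (end excluded)
slice_same = df[1:3].equals(df.iloc[1:3])

rows_brackets = list(df[1:3].index)   # rows 1 and 2
rows_loc = list(df.loc[1:3].index)    # rows 1, 2 and 3 (end included)
```

The last two lines show the one place where the forms diverge: the same 1:3 slice picks a different number of rows depending on whether it is read as positions or as labels.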

However, [] does not work in the following situations:

  1. You can select a single row with df.loc[row_label]
  2. You can select a list of rows with df.loc[[row_label1, row_label2]]
  3. You can slice columns with df.loc[:, 'A':'C']
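
A small sketch of these three loc-only selections, using a made-up frame with string row labels (the labels x, y, z and the column values are assumptions for the example):

```python
import pandas as pd

# Hypothetical frame with string row labels
df = pd.DataFrame(
    {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9], "D": [10, 11, 12]},
    index=["x", "y", "z"],
)

row = df.loc["x"]            # single row, returned as a Series
rows = df.loc[["x", "z"]]    # list of rows, returned as a DataFrame
cols = df.loc[:, "A":"C"]    # column slice, both endpoints included

# With plain brackets a single label is looked up among the columns,
# so passing a row label raises KeyError.
try:
    df["x"]
    bracket_failed = False
except KeyError:
    bracket_failed = True
```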

These three cannot be done with []. More importantly, if your selection involves both rows and columns, then assignment becomes problematic.

df[1:3]['A'] = 5

This selects rows 1 and 2, then selects column 'A' of the returned object and assigns the value 5 to it. The problem is that the returned object might be a copy, so this may not change the actual DataFrame. pandas flags this with a SettingWithCopyWarning. The correct way to do this assignment is:

df.loc[1:3, 'A'] = 5

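With .loc, you are guaranteed to modify the original DataFrame. Keep in mind that loc slices include both endpoints, so with a default RangeIndex df.loc[1:3, 'A'] also covers row 3; use df.loc[1:2, 'A'] to touch exactly the rows that df[1:3] selects. .loc also allows you to slice columns (df.loc[:, 'C':'F']), select a single row (df.loc[5]), and select a list of rows (df.loc[[1, 2, 5]]).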
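
To make the single-step assignment concrete, here is a minimal sketch on made-up data, assuming a default RangeIndex. The iloc variant is not part of the answer above; it is included only to show a position-based way of writing to the same rows that df[1:3] picks:

```python
import pandas as pd

df = pd.DataFrame({"A": [0, 0, 0, 0, 0], "B": [1, 1, 1, 1, 1]})

# Label-based: with a RangeIndex, 1:3 covers rows 1, 2 and 3
df.loc[1:3, "A"] = 5
after_loc = df["A"].tolist()

# Position-based: rows 1 and 2 only, the same rows as df[1:3]
df2 = pd.DataFrame({"A": [0, 0, 0, 0, 0], "B": [1, 1, 1, 1, 1]})
df2.iloc[1:3, df2.columns.get_loc("A")] = 5
after_iloc = df2["A"].tolist()
```

Both forms index rows and columns in one call, so the write goes to the original frame rather than to an intermediate object.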

Also note that these two were not included in the API at the same time. .loc was added much later as a more powerful and explicit indexer. See unutbu's answer for more detail.

Note: Getting columns with [] vs . is a completely different topic. . is only there for convenience. It only allows accessing columns whose names are valid Python identifiers (i.e. they cannot contain spaces or start with a digit). It cannot be used when a name conflicts with a Series/DataFrame method or attribute. It also cannot be used to create a column (i.e. if there is no column a, the assignment df.a = 1 sets a plain attribute on the object instead of adding a column). Other than that, . and [] are the same.
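
The limits of attribute access can be seen in a short sketch. The column names here ("my col", "count", new_col) are invented for the example:

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2], "my col": [3, 4], "count": [5, 6]})

# For an ordinary column name, both forms return the same Series
same = df.col1.equals(df["col1"])

# A name with a space is only reachable with brackets
spaced = df["my col"].tolist()

# "count" clashes with the DataFrame.count method, so the attribute
# resolves to the method and the column needs brackets
is_method = callable(df.count)
count_values = df["count"].tolist()

# Assigning to a new attribute does not create a column
df.new_col = 1
created = "new_col" in df.columns
```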
