Selecting multiple columns in a Pandas dataframe

Question

I have data in different columns, but I don't know how to extract some of them and save them in another variable.

index  a   b   c
1      2   3   4
2      3   4   5

How do I select columns 'a' and 'b' and save them to df1?

I tried:

df1 = df['a':'b']
df1 = df.ix[:, 'a':'b']

Neither seems to work.

Recommended answer

The column names (which are strings) cannot be sliced in the manner you tried: df['a':'b'] is interpreted as a row slice, not a column slice, and .ix was deprecated and later removed from pandas.

Here you have a couple of options. If you know from context which columns you want, you can select just those columns by passing a list of names into the __getitem__ syntax (the []'s).

df1 = df[['a', 'b']]
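
As a minimal sketch of the list-based selection, assuming the two-row frame from the question is rebuilt by hand (the constructor call is an assumption about how df was created), the result keeps only the named columns. The last line also shows label-based slicing through .loc, which is not part of the original answer but is the working form of what the question attempted; note that label slices include the end label.

```python
import pandas as pd

# Rebuild the example frame from the question (assumed construction).
df = pd.DataFrame({"a": [2, 3], "b": [3, 4], "c": [4, 5]}, index=[1, 2])

# Pass a list of column names inside the [] to select those columns.
df1 = df[["a", "b"]]

# Label-based slice of columns via .loc; unlike positional slices,
# the end label "b" is included.
df1_loc = df.loc[:, "a":"b"]

print(df1)
```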

Alternatively, if it matters to index the columns by position rather than by name (say your code should do this automatically without knowing the names of the first two columns), then you can do this instead:

df1 = df.iloc[:, 0:2] # Remember that Python does not slice inclusive of the ending index.
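
To illustrate why positional selection helps when the names are unknown, here is a small sketch. The helper first_n_columns and the column names x, y, z are hypothetical and chosen only for this example.

```python
import pandas as pd


def first_n_columns(frame, n):
    """Return the first n columns of frame, whatever they are called."""
    # The end of a positional slice is exclusive, so 0:n yields n columns.
    return frame.iloc[:, 0:n]


# A frame whose column names the helper never needs to know.
df = pd.DataFrame({"x": [1, 2], "y": [3, 4], "z": [5, 6]})
df1 = first_n_columns(df, 2)

print(df1.columns.tolist())
```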

Additionally, you should familiarize yourself with the difference between a view into a Pandas object and a copy of that object. The first of the above methods returns a new copy in memory of the desired sub-object (the selected columns).

Sometimes, however, indexing conventions in Pandas do not do this and instead give you a new variable that refers to the same chunk of memory as the sub-object or slice in the original object. This can happen with the second way of indexing, so you can append the .copy() method to get a regular copy. When it happens, changing what you think is the sliced object can alter the original object. It is always good to be on the lookout for this.

df1 = df.iloc[:, 0:2].copy() # To avoid the case where changing df1 also changes df
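
The effect of the explicit copy can be checked with a short sketch, again assuming the example frame from the question is rebuilt by hand. After .copy(), writing to df1 leaves df untouched. Whether the original would change without the copy depends on the pandas version and its copy-on-write settings, so this sketch only demonstrates the safe case.

```python
import pandas as pd

# Rebuild the example frame from the question (assumed construction).
df = pd.DataFrame({"a": [2, 3], "b": [3, 4], "c": [4, 5]}, index=[1, 2])

# Take the first two columns of every row and detach them from df.
df1 = df.iloc[:, 0:2].copy()

# Modify the copy only.
df1.loc[1, "a"] = 100

print(df.loc[1, "a"])   # the original still holds 2
print(df1.loc[1, "a"])  # the copy holds 100
```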

To use iloc, you need to know the column positions (or indices). Because column positions may change, instead of hard-coding indices you can use iloc together with the get_loc method of the dataframe's columns attribute to obtain the column positions.

{c: df.columns.get_loc(c) for c in df.columns}  # maps each column name to its position

Now you can use this dictionary to look up column positions by name and pass them to iloc.
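
Putting the lookup and iloc together might look like the following sketch. It assumes column names are unique (so get_loc returns a single integer) and uses a dictionary keyed by column name; the variable name col_pos is hypothetical.

```python
import pandas as pd

# Rebuild the example frame from the question (assumed construction).
df = pd.DataFrame({"a": [2, 3], "b": [3, 4], "c": [4, 5]}, index=[1, 2])

# Map each column name to its integer position.
# With unique column names, Index.get_loc returns an int.
col_pos = {name: df.columns.get_loc(name) for name in df.columns}

# Select columns by name, but through positional indexing.
df1 = df.iloc[:, [col_pos["a"], col_pos["b"]]]

print(col_pos)
print(df1)
```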
