How to take column-slices of dataframe in pandas

Question

I load some machine learning data from a CSV file. The first 2 columns are observations and the remaining columns are features.

Currently, I do the following:

import pandas
data = pandas.read_csv('mydata.csv')

which gives something like:

import numpy as np
data = pandas.DataFrame(np.random.rand(10, 5), columns=list('abcde'))

I'd like to slice this dataframe into two dataframes: one containing the columns a and b, and one containing the columns c, d and e.

It is not possible to write:

observations = data[:'c']
features = data['c':]

I'm not sure what the best method is. Do I need a pd.Panel?

By the way, I find dataframe indexing pretty inconsistent: data['a'] is permitted, but data[0] is not. On the other hand, data['a':] is not permitted but data[0:] is. Is there a practical reason for this? This is really confusing if columns are indexed by int, given that data[0] != data[0:1].

Recommended answer

2017 answer, pandas 0.20: .ix is deprecated. Use .loc.

See the deprecation notice in the docs.

.loc uses label-based indexing to select both rows and columns. The labels are the values of the index or the columns. Slicing with .loc includes the last element.

Let's assume we have a DataFrame with the following columns:
foo, bar, quz, ant, cat, sat, dat.
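
To try the examples that follow, one way to build such a DataFrame is sketched here. The cell values (a 5x7 grid of integers) are arbitrary placeholders, not data from the original answer, and the row labels v through z are chosen to match the row example further down.

```python
import numpy as np
import pandas as pd

# Column labels from the answer; the cell values are arbitrary placeholders.
columns = ['foo', 'bar', 'quz', 'ant', 'cat', 'sat', 'dat']
# Row labels v..z match the row-slicing example later in the answer.
index = list('vwxyz')

df = pd.DataFrame(np.arange(35).reshape(5, 7), index=index, columns=columns)

# Label slice: the end label 'sat' is included in the result.
first_slice = df.loc[:, 'foo':'sat']
print(list(first_slice.columns))
```

Integer values are used only so that each cell is easy to identify; any data would do, since the slices depend on the labels alone.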

# selects all rows and all columns beginning at 'foo' up to and including 'sat'
df.loc[:, 'foo':'sat']
# foo bar quz ant cat sat

.loc accepts the same slice notation that Python lists do, for both rows and columns. The slice notation is start:stop:step.

# slice from 'foo' to 'cat' by every 2nd column
df.loc[:, 'foo':'cat':2]
# foo quz cat

# slice from the beginning to 'bar'
df.loc[:, :'bar']
# foo bar

# slice from 'quz' to the end by 3
df.loc[:, 'quz'::3]
# quz sat

# attempt from 'sat' to 'bar'
df.loc[:, 'sat':'bar']
# no columns returned

# slice from 'sat' to 'bar'
df.loc[:, 'sat':'bar':-1]
# sat cat ant quz bar

# slice notation is syntactic sugar for the slice function
# slice from 'quz' to the end by 2 with slice function
df.loc[:, slice('quz', None, 2)]
# quz cat dat

# select specific columns with a list
# select columns foo, bar and dat
df.loc[:, ['foo','bar','dat']]
# foo bar dat
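
As a rough check of the claim that .loc follows list-style start:stop:step notation, the sketch below compares a few of the label slices above with positional slices of the plain column list. The helper name loc_cols and the one-row placeholder frame are assumptions made for illustration. Note that each list slice has its stop moved one position further, because .loc includes the end label while list slicing does not.

```python
import pandas as pd

cols = ['foo', 'bar', 'quz', 'ant', 'cat', 'sat', 'dat']
# Placeholder frame: one row of zeros, only the column labels matter here.
df = pd.DataFrame([[0] * len(cols)], columns=cols)


def loc_cols(key):
    """Hypothetical helper: the column labels selected by df.loc[:, key]."""
    return list(df.loc[:, key].columns)


# 'foo':'cat':2 covers positions 0..4 inclusive, so the list slice stops at 5.
every_second = loc_cols(slice('foo', 'cat', 2))
# Without a negative step, slicing from 'sat' back to 'bar' selects nothing.
forward_attempt = loc_cols(slice('sat', 'bar'))
# With step -1 the columns come back in reverse order, 'bar' included.
reversed_cols = loc_cols(slice('sat', 'bar', -1))
# An explicit list of labels selects exactly those columns, in that order.
picked = loc_cols(['foo', 'bar', 'dat'])

print(every_second, cols[0:5:2])
print(forward_attempt)
print(reversed_cols, cols[5:0:-1])
print(picked)
```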

You can slice by rows and columns. For instance, if you have 5 rows with labels v, w, x, y, z:

# slice from 'w' to 'y' and 'foo' to 'ant' by 3
df.loc['w':'y', 'foo':'ant':3]
#    foo ant
# w
# x
# y
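
Applied to the frame from the question, the same approach might look like the sketch below. The variable names observations and features come from the question; the seeded random generator is an assumption added only to make the data repeatable.

```python
import numpy as np
import pandas as pd

# Same shape as the question's example data; the seed is only for repeatability.
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.random((10, 5)), columns=list('abcde'))

# .loc label slices include the end label, so stop at 'b' rather than 'c'.
observations = data.loc[:, :'b']
features = data.loc[:, 'c':]

print(list(observations.columns))
print(list(features.columns))
```

Because the end label is inclusive, the two slices partition the columns without overlap: a and b go to observations, and c, d and e go to features.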
