Understanding pandas dataframe indexing


Question

Summary: this doesn't work:

df[df.key==1]['D'] = 1

but this does:

df.D[df.key==1] = 1

Why?

To reproduce:

In [1]: import pandas as pd

In [2]: from numpy.random import randn

In [4]: df = pd.DataFrame(randn(6,3),columns=list('ABC'))

In [5]: df
Out[5]: 
          A         B         C
0  1.438161 -0.210454 -1.983704
1 -0.283780 -0.371773  0.017580
2  0.552564 -0.610548  0.257276
3  1.931332  0.649179 -1.349062
4  1.656010 -1.373263  1.333079
5  0.944862 -0.657849  1.526811

In [6]: df['D']=0.0

In [7]: df['key']=3*[1]+3*[2]

In [8]: df
Out[8]: 
          A         B         C  D  key
0  1.438161 -0.210454 -1.983704  0    1
1 -0.283780 -0.371773  0.017580  0    1
2  0.552564 -0.610548  0.257276  0    1
3  1.931332  0.649179 -1.349062  0    2
4  1.656010 -1.373263  1.333079  0    2
5  0.944862 -0.657849  1.526811  0    2

This doesn't work:

In [9]: df[df.key==1]['D'] = 1

In [10]: df
Out[10]: 
          A         B         C  D  key
0  1.438161 -0.210454 -1.983704  0    1
1 -0.283780 -0.371773  0.017580  0    1
2  0.552564 -0.610548  0.257276  0    1
3  1.931332  0.649179 -1.349062  0    2
4  1.656010 -1.373263  1.333079  0    2
5  0.944862 -0.657849  1.526811  0    2

but this does:

In [11]: df.D[df.key==1] = 3.4

In [12]: df
Out[12]: 
          A         B         C    D  key
0  1.438161 -0.210454 -1.983704  3.4    1
1 -0.283780 -0.371773  0.017580  3.4    1
2  0.552564 -0.610548  0.257276  3.4    1
3  1.931332  0.649179 -1.349062  0.0    2
4  1.656010 -1.373263  1.333079  0.0    2
5  0.944862 -0.657849  1.526811  0.0    2

Link to notebook

My question is:

Why does only the second way work? I can't seem to see a difference in the selection/indexing logic.

I'm using pandas 0.10.0.

Note: this should no longer be done this way. Since version 0.11 there is .loc; see http://pandas.pydata.org/pandas-docs/stable/indexing.html
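A minimal sketch of the .loc form, assuming a recent pandas version (the question used 0.10.0, which predates .loc). Fixed values replace the randn data so the result is reproducible. The row mask and the column label go into a single indexing call, so pandas assigns directly into df instead of into an intermediate copy.

```python
import pandas as pd

# Same layout as the question's frame, with fixed values instead of randn.
df = pd.DataFrame({
    "A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "D": 0.0,
    "key": [1, 1, 1, 2, 2, 2],
})

# Row selector and column label are passed together to one __setitem__
# on df.loc, so df itself is modified (no intermediate copy is involved).
df.loc[df.key == 1, "D"] = 1.0

print(df["D"].tolist())  # [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
```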

Accepted answer

The pandas docs say:

Returning a view versus a copy

The rules about when a view on the data is returned are entirely dependent on NumPy. Whenever an array of labels or a boolean vector are involved in the indexing operation, the result will be a copy. With single label / scalar indexing and slicing, e.g. df.ix[3:6] or df.ix[:, 'A'], a view will be returned.
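The NumPy rule the docs refer to can be checked directly. A minimal sketch with arbitrary array contents: a basic slice shares memory with the original array, while boolean-mask indexing produces a new array. Assigning through the mask in a single statement, by contrast, is a __setitem__ call on the original and does write into it.

```python
import numpy as np

a = np.zeros(6)

# Basic slicing returns a view: writing through it changes `a`.
view = a[0:3]
view[:] = 1.0

# Boolean-mask indexing returns a copy: writing to it leaves `a` alone.
mask = np.array([False, False, False, True, True, True])
selected = a[mask]
selected[:] = 2.0

print(a)                              # [1. 1. 1. 0. 0. 0.]
print(np.shares_memory(a, view))      # True
print(np.shares_memory(a, selected))  # False

# A single masked assignment is __setitem__ on `b` itself, so `b` changes.
b = np.zeros(6)
b[mask] = 2.0
print(b)                              # [0. 0. 0. 2. 2. 2.]
```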

In df[df.key==1]['D'] you first do boolean slicing (leading to a copy of the DataFrame), then you choose the column ['D'] of that copy, so the assignment changes the copy rather than df.
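The two steps of the first version can be written out explicitly. A minimal sketch on a small made-up frame, assuming a recent pandas version: the boolean selection returns a new DataFrame, and the assignment lands in that new object, so df is left unchanged. Depending on the pandas version, step 2 may emit a SettingWithCopyWarning, which the sketch suppresses.

```python
import warnings

import pandas as pd

df = pd.DataFrame({"D": [0.0] * 6, "key": [1, 1, 1, 2, 2, 2]})

# df[df.key == 1]['D'] = 1.0 is evaluated by Python in two steps:
tmp = df[df.key == 1]  # 1) __getitem__ with a boolean mask returns a new DataFrame
with warnings.catch_warnings():
    # Some pandas versions warn about setting a value on a copy here.
    warnings.simplefilter("ignore")
    tmp["D"] = 1.0     # 2) __setitem__ on that new DataFrame

print(tmp["D"].tolist())  # [1.0, 1.0, 1.0]
print(df["D"].tolist())   # [0.0, 0.0, 0.0, 0.0, 0.0, 0.0] (original untouched)
```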

In df.D[df.key==1] = 3.4, you first choose a column, then do boolean slicing on the resulting Series.

This seems to make the difference, although I must admit that it is a little counterintuitive.

Edit: The difference was identified by Dougal; see his comment. With version 1, a copy is made because the __getitem__ method is called for the boolean slicing. With version 2, the boolean mask only reaches the __setitem__ method, so no copy is returned; the values are simply assigned.

