Edit pandas DataFrame using indexes

Question

Is there a general, efficient way to assign values to a subset of a DataFrame in pandas? I've got hundreds of rows and columns that I can access directly but I haven't managed to figure out how to edit their values without iterating through each row,col pair. For example:

In [1]: import pandas, numpy

In [2]: array = numpy.arange(30).reshape(3,10)

In [3]: df = pandas.DataFrame(array, index=list("ABC"))

In [4]: df
Out[4]: 
    0   1   2   3   4   5   6   7   8   9
A   0   1   2   3   4   5   6   7   8   9
B  10  11  12  13  14  15  16  17  18  19
C  20  21  22  23  24  25  26  27  28  29

In [5]: rows = ['A','C']

In [6]: columns = [1,4,7]

In [7]: df[columns].ix[rows]
Out[7]: 
    1   4   7
A   1   4   7
C  21  24  27

In [8]: df[columns].ix[rows] = 900

In [9]: df
Out[9]: 
    0   1   2   3   4   5   6   7   8   9
A   0   1   2   3   4   5   6   7   8   9
B  10  11  12  13  14  15  16  17  18  19
C  20  21  22  23  24  25  26  27  28  29

I believe what is happening here is that I'm getting a copy rather than a view, meaning I can't assign to the original DataFrame. Is that my problem? What's the most efficient way to edit those rows x columns (preferably in-place, as the DataFrame may take up a lot of memory)?

Also, what if I want to replace those values with a correctly shaped DataFrame?

Recommended answer

Use loc in an assignment expression (the = means it's not relevant whether it is a view or a copy!):

In [11]: df.loc[rows, columns] = 99

In [12]: df
Out[12]:
    0   1   2   3   4   5   6   7   8   9
A   0  99   2   3  99   5   6  99   8   9
B  10  11  12  13  14  15  16  17  18  19
C  20  99  22  23  99  25  26  99  28  29
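
For the second part of the question (replacing the block with correctly shaped data), the same loc assignment should work. The following is a minimal sketch, not part of the original answer; the replacement values and the names `after_array` and `replacement` are made up for illustration. It assumes a NumPy array is written positionally, while a DataFrame on the right-hand side is aligned on its labels:

```python
import numpy as np
import pandas as pd

# Same frame as in the question.
df = pd.DataFrame(np.arange(30).reshape(3, 10), index=list("ABC"))
rows = ["A", "C"]
columns = [1, 4, 7]

# A 2x3 NumPy array is written positionally into the selected block.
df.loc[rows, columns] = np.array([[-1, -2, -3], [-4, -5, -6]])
after_array = df.loc[rows, columns].copy()

# A DataFrame on the right-hand side is aligned on its row and column labels,
# so it should carry the same labels as the selection.
replacement = pd.DataFrame(
    [[100, 200, 300], [400, 500, 600]], index=rows, columns=columns
)
df.loc[rows, columns] = replacement

print(df)
```

Row B is never selected, so it keeps its original values in both cases.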

If you are using a version prior to 0.11, you can use .ix instead.

As @Jeff comments:

This is an assignment expression (see 'advanced indexing with ix' section of the docs) and doesn't return anything (although there are assignment expressions which do return things, e.g. .at and .iat).

df.loc[rows,columns] can return a view, but usually it's a copy. Confusing, but done for efficiency.

Bottom line: use ix, loc, iloc to set (as above), and don't modify copies.
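
To make the bottom line concrete, here is a rough sketch contrasting the two-step selection from the question with a single indexer call. It is an illustration rather than the answerer's code: the names `subset`, `original`, `unchanged`, `row_pos` and `col_pos` are hypothetical, and because .ix has been removed from pandas since version 1.0, only loc and iloc are used:

```python
import warnings

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(30).reshape(3, 10), index=list("ABC"))
rows = ["A", "C"]
columns = [1, 4, 7]
original = df.copy()

# Two-step selection: df[columns] builds a new object, so the assignment
# lands on that temporary and the original frame is left untouched.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # pandas may warn about exactly this pattern
    subset = df[columns]
    subset.loc[rows] = 900
unchanged = df.equals(original)

# Single-step positional assignment: translate labels to integer positions,
# then set the whole block with one iloc call.
row_pos = df.index.get_indexer(rows)
col_pos = df.columns.get_indexer(columns)
df.iloc[row_pos, col_pos] = 900

print(unchanged)
print(df)
```

The iloc form is only needed when positions are more convenient than labels; with labels at hand, df.loc[rows, columns] = 900 does the same job in one step.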

See the relevant section of the docs.
