Pandas column as index for a numpy array

Question

How can I use a pandas column as an index for a numpy array? Say I have

>>> import numpy as np
>>> import pandas as pd
>>> grid = np.arange(10, 20)
>>> df = pd.DataFrame([0, 1, 1, 5], columns=['i'])

I want to do

>>> df['j'] = grid[df['i']]
IndexError: unsupported iterator index

What is a short and clean way to actually perform this operation?

Update

To be precise, I want an additional column whose values correspond to the indices contained in the first column: df['j'][0] = grid[df['i'][0]] for row 0, and so on.

Expected output:

index i j 
    0 0 10
    1 1 11
    2 1 11
    3 5 15 

Parallel case: numpy

Just to show where the idea comes from: in standard python / numpy, if you have

>>> keys = [0, 1, 1, 5]
>>> grid = np.arange(10, 20)
>>> grid[keys]
array([10, 11, 11, 15])

That is exactly what I want to do, except that my keys are not stored in a vector; they are stored in a column.

Recommended answer

This is a numpy bug that surfaced with pandas 0.13.0 / numpy 1.8.0.

You can do either of these:

In [5]: grid[df['i'].values]
Out[5]: array([10, 11, 11, 15])

In [6]: pd.Series(grid)[df['i']]
Out[6]: 
i
0    10
1    11
1    11
5    15
dtype: int64

This matches your output. You can assign an array to a column as long as the array/list has the same length as the frame (otherwise, how would you align it?).

In [14]: grid[keys]
Out[14]: array([10, 11, 11, 15])

In [15]: df['j'] = grid[df['i'].values]


In [17]: df
Out[17]: 
   i   j
0  0  10
1  1  11
2  1  11
3  5  15
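
For reference, here is a minimal self-contained sketch of the same approach, assuming a reasonably recent pandas (0.24 or later), where `Series.to_numpy()` is available as an alternative to `.values`. The names `idx`, `taken`, and `length_mismatch_raised` are illustrative only and are not part of the original answer.

```python
import numpy as np
import pandas as pd

grid = np.arange(10, 20)
df = pd.DataFrame({"i": [0, 1, 1, 5]})

# Pull the column out as a plain integer ndarray, then use it for
# numpy integer-array indexing (same idea as df['i'].values above).
idx = df["i"].to_numpy()
df["j"] = grid[idx]

# np.take performs the same lookup and is shown only for comparison.
taken = np.take(grid, idx)

# Assigning an array whose length differs from the frame is rejected,
# which is the alignment point made in the answer.
length_mismatch_raised = False
try:
    df["k"] = grid  # 10 values for a 4-row frame
except ValueError:
    length_mismatch_raised = True

print(df)
```

Converting to an ndarray first sidesteps any question of how a given numpy version treats a Series used as an index, at the cost of dropping the pandas index labels; that is harmless here because the result is assigned straight back to the same frame.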
