Pandas column as index for numpy array
Question
How can I use a pandas column as an index for a numpy array? Say I have
>>> import numpy as np
>>> import pandas as pd
>>> grid = np.arange(10, 20)
>>> df = pd.DataFrame([0,1,1,5], columns=['i'])
I would like to do
>>> df['j'] = grid[df['i']]
IndexError: unsupported iterator index
What is a short and clean way to actually perform this operation?
Update
To be precise, I want an additional column whose values correspond to the indices that the first column contains: df['j'][0] = grid[df['i'][0]] for row 0, and so on.
Expected output:
index i j
0 0 10
1 1 11
2 1 11
3 5 15
Parallel case
Just to show where the idea comes from: in standard Python / numpy, if you have
>>> keys = [0, 1, 1, 5]
>>> grid = np.arange(10, 20)
>>> grid[keys]
Out[30]: array([10, 11, 11, 15])
Which is exactly what I want to do, except that my keys are not stored in a vector; they are stored in a column.
Recommended answer
This is a numpy bug that surfaced with pandas 0.13.0 / numpy 1.8.0.
You can do:
In [5]: grid[df['i'].values]
Out[5]: array([10, 11, 11, 15])
In [6]: pd.Series(grid)[df['i']]
Out[6]:
i
0 10
1 11
1 11
5 15
dtype: int64
This matches your output. You can assign an array to a column, as long as the length of the array/list is the same as that of the frame (otherwise, how would it be aligned?).
In [14]: grid[keys]
Out[14]: array([10, 11, 11, 15])
In [15]: df['j'] = grid[df['i'].values]
In [17]: df
Out[17]:
i j
0 0 10
1 1 11
2 1 11
3 5 15
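For reference, the same steps can be put together as a self-contained script. This is only a sketch: it assumes a pandas version that provides Series.to_numpy() (0.24 or later; on older versions the .values attribute used above plays the same role), and the column names 'k' and 'bad' are made up for illustration.

```python
import numpy as np
import pandas as pd

grid = np.arange(10, 20)
df = pd.DataFrame({"i": [0, 1, 1, 5]})

# Convert the column to a plain integer ndarray first, so that numpy
# performs ordinary positional (fancy) indexing on grid.
idx = df["i"].to_numpy()
df["j"] = grid[idx]

# np.take is an equivalent spelling of the same positional lookup.
df["k"] = np.take(grid, idx)

# Assigning an array whose length differs from the frame is rejected,
# because pandas has no way to align it with the rows.
try:
    df["bad"] = grid  # 10 values for a 4-row frame
except ValueError as exc:
    print("length mismatch:", exc)

print(df)
```

Indexing with the underlying array rather than the Series keeps the lookup purely positional, so the result does not depend on the frame's index labels.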