python - create a pivot table
Question
I'm trying to create a pivot table from a NumPy array in Python. I've done a lot of research but I cannot find a straightforward solution. I know you can do it with Pandas, but I'm having trouble installing it, and there must be a way of doing it without Pandas. My NumPy array is
[[ 4057 8 1374]
[ 4057 9 759]
[ 4057 11 96]
...,
[89205 16 146]
[89205 17 154]
[89205 18 244]]
I need a pivot table where the rows are the first column, the columns are the second column and the values are the third column. Help please!
Thanks
Recommended answer
I think this is what you want:
import numpy as np

data = np.array([[ 4057,  8, 1374],
                 [ 4057,  9,  759],
                 [ 4057, 11,   96],
                 [89205, 16,  146],
                 [89205, 17,  154],
                 [89205, 18,  244]])
# Unique row/column labels, plus the position of each entry within them
rows, row_pos = np.unique(data[:, 0], return_inverse=True)
cols, col_pos = np.unique(data[:, 1], return_inverse=True)
# Start from zeros and place each value at its (row, column) position
pivot_table = np.zeros((len(rows), len(cols)), dtype=data.dtype)
pivot_table[row_pos, col_pos] = data[:, 2]
>>> pivot_table
array([[1374, 759, 96, 0, 0, 0],
[ 0, 0, 0, 146, 154, 244]])
>>> rows
array([ 4057, 89205])
>>> cols
array([ 8, 9, 11, 16, 17, 18])
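The key step above is `return_inverse=True`. As a small illustration (the names `keys`, `uniq` and `inverse` are made up for this sketch and are not part of the answer), `np.unique` returns the sorted distinct labels together with, for every input element, the index of its label in that sorted array. Those indices are what get used as row and column positions in the pivot table:

```python
import numpy as np

# Same first column as in the sample data above (hypothetical name).
keys = np.array([4057, 4057, 4057, 89205, 89205, 89205])

# uniq holds the sorted distinct keys; inverse holds, for each input
# element, the index of its key inside uniq.
uniq, inverse = np.unique(keys, return_inverse=True)

print(uniq)     # [ 4057 89205]
print(inverse)  # [0 0 0 1 1 1]

# Indexing uniq with inverse rebuilds the original array.
assert np.array_equal(uniq[inverse], keys)
```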
This approach has some limitations. The main one is that repeated entries for the same row/column combination are not added together; only one of them (possibly the last) is kept. If you want them summed, you could, although it is a little convoluted, abuse scipy's sparse module:
data = np.array([[ 4057,  8, 1374],
                 [ 4057,  9,  759],
                 [ 4057, 11,   96],
                 [89205, 16,  146],
                 [89205, 17,  154],
                 [89205, 18,  244],
                 [ 4057, 11,    4]])
rows, row_pos = np.unique(data[:, 0], return_inverse=True)
cols, col_pos = np.unique(data[:, 1], return_inverse=True)
pivot_table = np.zeros((len(rows), len(cols)), dtype=data.dtype)
pivot_table[row_pos, col_pos] = data[:, 2]
>>> pivot_table # the element at [0, 2] should be 100!!!
array([[1374, 759, 4, 0, 0, 0],
[ 0, 0, 0, 146, 154, 244]])
import scipy.sparse as sps
# Converting a COO matrix to dense sums duplicate (row, column) entries
pivot_table = sps.coo_matrix((data[:, 2], (row_pos, col_pos)),
                             shape=(len(rows), len(cols))).toarray()
>>> pivot_table # now repeated elements are added together
array([[1374, 759, 100, 0, 0, 0],
[ 0, 0, 0, 146, 154, 244]])
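If pulling in scipy only for this feels heavy, one possible NumPy-only alternative (not part of the original answer) is `np.add.at`, which does unbuffered in-place addition so that repeated index pairs accumulate. A minimal sketch, reusing the sample data with the duplicated `(4057, 11)` entry; the name `pivot_sum` is chosen here only for illustration:

```python
import numpy as np

data = np.array([[ 4057,  8, 1374],
                 [ 4057,  9,  759],
                 [ 4057, 11,   96],
                 [89205, 16,  146],
                 [89205, 17,  154],
                 [89205, 18,  244],
                 [ 4057, 11,    4]])

rows, row_pos = np.unique(data[:, 0], return_inverse=True)
cols, col_pos = np.unique(data[:, 1], return_inverse=True)

# pivot_sum is a hypothetical name for the summed pivot table.
pivot_sum = np.zeros((len(rows), len(cols)), dtype=data.dtype)

# np.add.at performs unbuffered in-place addition, so repeated
# (row, column) pairs accumulate instead of overwriting each other.
np.add.at(pivot_sum, (row_pos, col_pos), data[:, 2])

print(pivot_sum)
```

With this data the element at `[0, 2]` comes out as 100 (96 + 4), matching the scipy result above.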