Count unique pairs and store counts in a matrix
Question
My question is similar to stackoverflow.com/q/7549410
I have paired data which look like this:
ID ATTR
3 10
1 20
1 20
4 30
I want to count the unique pairs and store those frequency counts in a matrix like this:
     10  20  30
1 |   0   2   0
3 |   1   0   0
4 |   0   0   1
Alternatively, if it's known that ID takes values in {1, 2, 3, 4} while ATTR in {0, 10, 20, 30} then I want a matrix as such:
      0  10  20  30
1 |   0   0   2   0
2 |   0   0   0   0
3 |   0   1   0   0
4 |   0   0   0   1
Question: What's the fastest way to do both of them in Python or NumPy?
I have tried using Pandas but I get an empty DataFrame:
import numpy as np
import pandas as pd
x = pd.DataFrame([[3, 10], [1, 20], [1, 20], [4, 30]])
x.pivot_table(index = 0, columns = 1, fill_value = 0, aggfunc = 'sum')
Recommended answer
It looks like you want to perform a cross tabulation, followed by a reindexing operation. For the cross tabulation, there are many ways to skin a cat.
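To make the target concrete, here is a minimal pure-Python sketch of what a cross tabulation computes: count each (ID, ATTR) pair, then lay the counts out on a grid of row and column labels. The function name `crosstab_counts` and its `row_labels` / `col_labels` parameters are hypothetical, introduced only for illustration; this is not meant as a fast implementation.

```python
from collections import Counter

# Sample (ID, ATTR) pairs from the question.
pairs = [(3, 10), (1, 20), (1, 20), (4, 30)]


def crosstab_counts(pairs, row_labels=None, col_labels=None):
    """Count each (row, col) pair and return the counts as a nested list.

    row_labels / col_labels are hypothetical parameters: when omitted,
    the sorted labels observed in the data are used instead.
    """
    counts = Counter(pairs)  # missing pairs count as 0
    if row_labels is None:
        row_labels = sorted({r for r, _ in pairs})
    if col_labels is None:
        col_labels = sorted({c for _, c in pairs})
    matrix = [[counts[(r, c)] for c in col_labels] for r in row_labels]
    return list(row_labels), list(col_labels), matrix


# Labels inferred from the data (the first matrix in the question).
rows, cols, matrix = crosstab_counts(pairs)

# Labels fixed in advance (the second matrix in the question).
_, _, full_matrix = crosstab_counts(pairs, [1, 2, 3, 4], [0, 10, 20, 30])
```

Passing the known label sets plays the same role as the reindexing step: labels that never occur simply get rows or columns of zeros.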
First, use pivot_table:
v = x.pivot_table(
    index=0,
    columns=1,
    values=1,
    aggfunc='size',
    fill_value=0
)
Or, pd.crosstab:
v = pd.crosstab(x[0], x[1])
Or, set_index + get_dummies + groupby(level=0).sum() (sum(level=0) no longer exists in pandas 2.0 and later):
v = pd.get_dummies(x.set_index(0)[1], dtype=int).groupby(level=0).sum()
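Or, get_dummies + dot (dtype=int keeps the product numeric, since pandas 2.0 and later return boolean dummies by default):
v = pd.get_dummies(x[0], dtype=int).T.dot(pd.get_dummies(x[1], dtype=int))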
v
   10  20  30
1   0   2   0
3   1   0   0
4   0   0   1
Next, call reindex on v:
v.reindex(index=range(1, 5), columns=range(0, 40, 10), fill_value=0)
1  0   10  20  30
0
1   0   0   2   0
2   0   0   0   0
3   0   1   0   0
4   0   0   0   1
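Since the question also asks about NumPy, the same two matrices can be sketched without pandas. This is one possible approach, not a benchmarked "fastest" one: np.unique with return_inverse maps each value to its position among the sorted unique labels, and np.add.at accumulates the counts. The function name `count_pairs` and its `id_values` / `attr_values` parameters are hypothetical. When the label sets are passed in, the sketch assumes they are sorted and contain every value that occurs in the data.

```python
import numpy as np

# Sample data from the question: column 0 is ID, column 1 is ATTR.
data = np.array([[3, 10], [1, 20], [1, 20], [4, 30]])


def count_pairs(ids, attrs, id_values=None, attr_values=None):
    """Return (row labels, column labels, count matrix) for the pairs.

    id_values / attr_values are hypothetical parameters. If given, they
    are assumed to be sorted and to contain every value in the data.
    """
    ids = np.asarray(ids)
    attrs = np.asarray(attrs)

    if id_values is None:
        # Labels inferred from the data; row_idx maps each ID to its row.
        id_values, row_idx = np.unique(ids, return_inverse=True)
    else:
        id_values = np.asarray(id_values)
        row_idx = np.searchsorted(id_values, ids)

    if attr_values is None:
        attr_values, col_idx = np.unique(attrs, return_inverse=True)
    else:
        attr_values = np.asarray(attr_values)
        col_idx = np.searchsorted(attr_values, attrs)

    counts = np.zeros((len(id_values), len(attr_values)), dtype=np.int64)
    # Unbuffered add, so repeated (row, col) pairs are each counted.
    np.add.at(counts, (row_idx, col_idx), 1)
    return id_values, attr_values, counts


# First matrix: labels inferred from the data.
id_labels, attr_labels, counts = count_pairs(data[:, 0], data[:, 1])

# Second matrix: labels known in advance.
_, _, full_counts = count_pairs(
    data[:, 0], data[:, 1],
    id_values=[1, 2, 3, 4],
    attr_values=[0, 10, 20, 30],
)
```

np.add.at is used instead of `counts[row_idx, col_idx] += 1` because the latter applies only one increment per distinct index, so the repeated pair (1, 20) would be counted once.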