How to prepare a Pandas df for a Venn diagram
Question
I have a Pandas dataframe as follows:
+-----+-----------+
| ID | VALUE |
+-----+-----------+
| A | Today |
+-----+-----------+
| A | Yesterday |
+-----+-----------+
| B | Tomorrow |
+-----+-----------+
| C | Tomorrow |
+-----+-----------+
| D | Today |
+-----+-----------+
| D | Tomorrow |
+-----+-----------+
| E | Today |
+-----+-----------+
| E | Yesterday |
+-----+-----------+
| E | Tomorrow |
+-----+-----------+
I want to get counts of each ID's "overlap", as I want to construct a Venn diagram from this data.
E.g. in this case, 2 IDs are in 'Today' as well as in 'Tomorrow'. 2 IDs are also in both 'Today' and 'Yesterday'.
How do I go about doing this? I've tried various combinations of value_counts and groupby, but I've had no luck coming up with something intelligent.
Recommended answer
You can use crosstab to get the dummies, then a matrix product to see co-occurrences:
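import pandas as pd  # df is the dataframe from the question
s = pd.crosstab(df['ID'], df['VALUE'])  # rows: ID, columns: VALUE, cells: counts
pair_intersection = s.T @ s  # co-occurrence count for every pair of values
all_three = s.ne(0).all(axis=1)  # True for IDs that appear in every group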
Then pair_intersection looks like:
VALUE      Today  Tomorrow  Yesterday
VALUE
Today          3         2          2
Tomorrow       2         4          1
Yesterday      2         1          2
Then counts of two overlapping groups can be extracted using pair_intersection.at['Today', 'Tomorrow'].
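To reproduce the matrix above, here is a minimal sketch that rebuilds the sample data from the question. The clip(upper=1) step is an addition that is not part of the answer: it assumes the frame might contain repeated (ID, VALUE) rows, which would otherwise inflate the products.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "ID": list("AABCDDEEE"),
    "VALUE": ["Today", "Yesterday", "Tomorrow", "Tomorrow", "Today",
              "Tomorrow", "Today", "Yesterday", "Tomorrow"],
})

# 0/1 membership table; clip is an assumption-driven guard against
# duplicate (ID, VALUE) rows and changes nothing for this sample
s = pd.crosstab(df["ID"], df["VALUE"]).clip(upper=1)

# Entry (i, j) counts the IDs that appear in both group i and group j
pair_intersection = s.T @ s

today_tomorrow = int(pair_intersection.at["Today", "Tomorrow"])
today_yesterday = int(pair_intersection.at["Today", "Yesterday"])
print(today_tomorrow, today_yesterday)
```

The diagonal of pair_intersection holds the total number of IDs in each group, so it gives the circle sizes as well.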
The all_three Series is:
ID
A    False
B    False
C    False
D    False
E     True
dtype: bool
And thus the number of instances that fall in all three groups is sum(all_three), which is 1 here.
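Note that the pairwise counts include the IDs that belong to all three groups, while a Venn diagram is often drawn from the size of each exclusive region. One possible way to get those sizes, sketched here as an alternative to the matrix product rather than as part of the answer, is to label every ID with its combination of values and count the labels. The names combos and region_sizes and the " & " label format are illustrative choices.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "ID": list("AABCDDEEE"),
    "VALUE": ["Today", "Yesterday", "Tomorrow", "Tomorrow", "Today",
              "Tomorrow", "Today", "Yesterday", "Tomorrow"],
})

# One label per ID listing the distinct groups it belongs to,
# e.g. A -> "Today & Yesterday"
combos = df.groupby("ID")["VALUE"].agg(lambda v: " & ".join(sorted(set(v))))

# Number of IDs in each exclusive region; regions with no IDs are absent
region_sizes = combos.value_counts()

only_tomorrow = int(region_sizes.get("Tomorrow", 0))
only_today = int(region_sizes.get("Today", 0))
print(only_tomorrow, only_today)
```

With the sample data, D is the only ID in exactly 'Today' and 'Tomorrow', while E sits in the centre region, which together account for the pairwise count of 2.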