How to prepare Pandas df for Venn diagram


Problem description

I have a Pandas dataframe as follows:

+-----+-----------+
| ID  | VALUE     |
+-----+-----------+
| A   | Today     |
+-----+-----------+
| A   | Yesterday |
+-----+-----------+
| B   | Tomorrow  |
+-----+-----------+
| C   | Tomorrow  |
+-----+-----------+
| D   | Today     |
+-----+-----------+
| D   | Tomorrow  |
+-----+-----------+
| E   | Today     |
+-----+-----------+
| E   | Yesterday |
+-----+-----------+
| E   | Tomorrow  |
+-----+-----------+

I want to get counts of each ID's "overlap", as I want to construct a Venn diagram from this data.

E.g. in this case, 2 IDs are in 'Today' as well as in 'Tomorrow'. 2 IDs are also in both 'Today' and 'Yesterday'.

How do I go about doing this? I've tried various combinations of value_counts and groupby, but I've had no luck coming up with something intelligent.

Recommended answer

You can use pd.crosstab to build an ID-by-VALUE table of dummies (1 where an ID has that VALUE, 0 otherwise), then take a matrix product to count co-occurrences:

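import pandas as pd

# Rows are IDs, columns are VALUEs, cells count matching rows of df
s = pd.crosstab(df['ID'], df['VALUE'])

pair_intersection = s.T @ s        # VALUE x VALUE co-occurrence counts
all_three = s.ne(0).all(axis=1)    # True for IDs present in every VALUE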

pair_intersection then looks like:

VALUE      Today  Tomorrow  Yesterday
VALUE                                
Today          3         2          2
Tomorrow       2         4          1
Yesterday      2         1          2
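To reproduce the table above, here is a self-contained sketch. The `df` construction is a hand transcription of the table in the question and is an assumption on my part; the rest follows the answer's two steps.

```python
import pandas as pd

# Hand transcription of the question's table (assumed: one row per ID/VALUE pair)
df = pd.DataFrame({
    "ID":    ["A", "A", "B", "C", "D", "D", "E", "E", "E"],
    "VALUE": ["Today", "Yesterday", "Tomorrow", "Tomorrow", "Today",
              "Tomorrow", "Today", "Yesterday", "Tomorrow"],
})

# ID x VALUE indicator table, then VALUE x VALUE co-occurrence counts
s = pd.crosstab(df["ID"], df["VALUE"])
pair_intersection = s.T @ s

print(pair_intersection)
```

The diagonal holds the size of each group on its own (3 IDs in Today, 4 in Tomorrow, 2 in Yesterday); the off-diagonal cells are the pairwise overlaps.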

Then counts of two overlapping groups can be extracted using pair_intersection.at['Today', 'Tomorrow'].
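One caveat worth checking on real data: crosstab counts rows, so if the same ID/VALUE pair can appear more than once, some cells exceed 1 and the matrix product overcounts. The sketch below uses made-up data (`df_dup` is hypothetical, not from the question) and converts the counts to a 0/1 membership table before multiplying.

```python
import pandas as pd

# Hypothetical data in which the (A, Today) row is repeated
df_dup = pd.DataFrame({
    "ID":    ["A", "A", "A", "B"],
    "VALUE": ["Today", "Today", "Tomorrow", "Today"],
})

counts = pd.crosstab(df_dup["ID"], df_dup["VALUE"])  # the A/Today cell is 2
member = counts.ne(0).astype(int)                    # 0/1 membership table

naive_pairs = counts.T @ counts   # inflated by the duplicate row
pair_dup = member.T @ member      # counts each ID at most once per pair
```

An alternative with the same effect would be calling drop_duplicates on the two columns before crosstab.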

all_three is a boolean Series indexed by ID:

ID
A    False
B    False
C    False
D    False
E     True
dtype: bool

Thus the number of IDs that fall in all three groups is sum(all_three), which is 1 here (only E).
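For drawing, note that the pairwise counts include IDs that are in all three groups (E is counted in every pair). Venn-plotting tools typically want the size of each exclusive region instead, such as "Today and Tomorrow but not Yesterday". A minimal sketch of one way to get those, counting IDs by their exact set of groups; the `region_sizes` name and the Counter-of-frozensets shape are my own choices, and no particular plotting library is assumed. The `df` is again a hand transcription of the question's table.

```python
from collections import Counter

import pandas as pd

# Hand transcription of the question's table (assumed)
df = pd.DataFrame({
    "ID":    ["A", "A", "B", "C", "D", "D", "E", "E", "E"],
    "VALUE": ["Today", "Yesterday", "Tomorrow", "Tomorrow", "Today",
              "Tomorrow", "Today", "Yesterday", "Tomorrow"],
})

# Map each exact combination of groups to the number of IDs that have it
region_sizes = Counter(
    frozenset(values) for _, values in df.groupby("ID")["VALUE"]
)

for groups, n in region_sizes.items():
    print(sorted(groups), n)
```

Here the "Today and Tomorrow only" region has 1 ID (D), which together with the 1 ID in all three groups (E) gives the pairwise total of 2 from pair_intersection.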
