Co-occurrence matrix from pandas dataframe

Views: 100 | Published: 2020/10/17 1:31:05 | Tags: python, python-3.x, pandas, dataframe


Problem description

I have a pandas dataframe, and for every pair of unique entries in it I need to count the number of rows in which both entries occur.

Co-occurrence Matrix from list of words in Python: Similar to my question, but it does not start with a dataframe. Most answers use iteration. I hope a better solution exists in pandas.
Constructing a co-occurrence matrix in python pandas: This already starts with a dataframe whose body contains only 0 and 1 (I guess standing in for the actual values?) rather than the actual values.
Convert Two column data frame to occurrence matrix in pandas: This post assumes there are only two columns, which is rather restrictive for the case discussed here.

import pandas as pd
import numpy as np

The dataframe:

df = pd.DataFrame({'a': ['A', 'A', 'B', 'B'],
                   'b': ['B', 'C', 'B', 'B'],
                   'c': ['C', 'A', 'C', 'A'],
                   'd': ['B', 'D', 'B', 'A']},
                   index=[0, 1, 2, 3])

i.e.:

+----+-----+-----+-----+-----+
|    | a   | b   | c   | d   |
|----+-----+-----+-----+-----|
|  0 | A   | B   | C   | B   |
|  1 | A   | C   | A   | D   |
|  2 | B   | B   | C   | B   |
|  3 | B   | B   | A   | A   |
+----+-----+-----+-----+-----+

(Printed using this.)

I have tried the code from the answer, substituting these variables:

document = [list(each) for each in df.values]
names = list(np.unique(df.values))

It gives the wrong results. It is also based on iteration, so I would hope for a better solution.

Expected output:

+----+-----+-----+-----+-----+
|    |   A |   B |   C |   D |
|----+-----+-----+-----+-----|
| A  | nan |   2 |   2 |   1 |
| B  |   2 | nan |   2 |   0 |
| C  |   2 |   2 | nan |   1 |
| D  |   1 |   0 |   1 | nan |
+----+-----+-----+-----+-----+

There are 2 rows in which A and B both appear, so the value in the cell at row A, column B is 2.
There are 2 rows in which A and C both appear, so the value in the cell at row A, column C is 2.
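
To pin down exactly what each cell means, the counting rule described above can be written as a plain loop. This is only a reference sketch to check results against, not the vectorised solution being asked for; the names `labels` and `expected` are made up for illustration.

```python
from itertools import combinations

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['A', 'A', 'B', 'B'],
                   'b': ['B', 'C', 'B', 'B'],
                   'c': ['C', 'A', 'C', 'A'],
                   'd': ['B', 'D', 'B', 'A']})

# All distinct values, sorted, used as both row and column labels.
labels = sorted(set(df.to_numpy().ravel()))
expected = pd.DataFrame(0.0, index=labels, columns=labels)

for row in df.to_numpy():
    # set() collapses repeats, so a row counts at most once per pair.
    for x, y in combinations(sorted(set(row)), 2):
        expected.loc[x, y] += 1
        expected.loc[y, x] += 1

# A value paired with itself is left undefined on the diagonal.
for label in labels:
    expected.loc[label, label] = np.nan

print(expected)
```

Because each row is reduced to a set first, a row such as `B, B, C, B` contributes one count to the pair (B, C), not three.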

How can I get this row-wise co-occurrence matrix easily in pandas? It would be great if I didn't have to loop through the values.
(pandas.Categorical might be of some use, but I haven't managed to make it work yet.)
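
As a rough illustration of how pandas.Categorical could be put to work here, one possible sketch encodes every cell as an integer code, builds a row-by-category 0/1 indicator with NumPy, and multiplies it by its own transpose. It assumes the frame contains no missing values (a missing value would get code -1 and break the indexing); `indicator`, `counts`, and `cooc` are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['A', 'A', 'B', 'B'],
                   'b': ['B', 'C', 'B', 'B'],
                   'c': ['C', 'A', 'C', 'A'],
                   'd': ['B', 'D', 'B', 'A']})

values = df.to_numpy()
n_rows, n_cols = values.shape

# Encode every cell as an integer code; categories are the sorted unique values.
cat = pd.Categorical(values.ravel())
codes = cat.codes.reshape(n_rows, n_cols)

# indicator[i, k] is 1 when category k appears anywhere in row i.
indicator = np.zeros((n_rows, len(cat.categories)), dtype=int)
indicator[np.arange(n_rows)[:, None], codes] = 1

# (indicator.T @ indicator)[j, k] counts the rows that contain both j and k.
counts = (indicator.T @ indicator).astype(float)
np.fill_diagonal(counts, np.nan)  # counts is a fresh NumPy array, safe to edit

cooc = pd.DataFrame(counts, index=cat.categories, columns=cat.categories)
print(cooc)
```

Assigning 1 instead of accumulating means repeated values within a row are counted once, which matches the row-wise definition in the question.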

Recommended answer

We can stack the frame, apply get_dummies, take the dot product of the result with its own transpose, and then set the diagonal to NaN.

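# One 0/1 column per distinct value, collapsed to one row per original row.
# groupby(level=0).sum() replaces sum(level=0), which was removed in pandas 2.0.
s = df.stack().str.get_dummies().groupby(level=0).sum().ne(0).astype(int)
# s.T.dot(s) counts, for each pair of values, the rows containing both.
s = s.T.dot(s).astype(float)
# Blank the diagonal; mask() avoids writing into s.values, which may be read-only.
s = s.mask(np.eye(len(s), dtype=bool))
s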
Out[33]: 
     A    B    C    D
A  NaN  2.0  2.0  1.0
B  2.0  NaN  2.0  0.0
C  2.0  2.0  NaN  1.0
D  1.0  0.0  1.0  NaN
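
To see why the dot product yields co-occurrence counts, it may help to look at the intermediate objects one at a time. The following is a sketch of the same chain split into named steps; the names `stacked`, `dummies`, `per_row`, and `co` are illustrative, and `groupby(level=0).sum()` is used on the assumption of a recent pandas version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['A', 'A', 'B', 'B'],
                   'b': ['B', 'C', 'B', 'B'],
                   'c': ['C', 'A', 'C', 'A'],
                   'd': ['B', 'D', 'B', 'A']})

# Step 1: one long Series indexed by (row label, column label).
stacked = df.stack()

# Step 2: one 0/1 column per distinct value, one row per original cell.
dummies = stacked.str.get_dummies()

# Step 3: collapse back to one row per original row; ne(0) turns
# "appears n times in this row" into "appears at least once".
per_row = dummies.groupby(level=0).sum().ne(0).astype(int)

# Step 4: entry (j, k) of per_row.T @ per_row sums, over rows, the product
# of the two indicators, i.e. the number of rows containing both j and k.
co = per_row.T.dot(per_row)

# Before masking, the diagonal holds the number of rows containing each value.
row_counts = np.diag(co.to_numpy())

result = co.astype(float).mask(np.eye(len(co), dtype=bool))
print(per_row)
print(result)
```

For the sample data, `per_row` has rows `[1, 1, 1, 0]`, `[1, 0, 1, 1]`, `[0, 1, 1, 0]`, and `[1, 1, 0, 0]` for the columns A to D, and the diagonal of `co` before masking is 3, 3, 3, 1.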

