Pairwise matrix from a pandas dataframe

Views: 63 | Published: 2020/5/23 23:27:53 | Tags: python, pandas

Question

I have a pandas dataframe that looks something like this:

           Al01  BBR60  CA07  NL219
AAEAMEVAT  MP    NaN    MP    MP
AAFEDLRLL  NaN   NaN    NaN   NaN
AAGAAVKGV  NP    NaN    NP    NP
ADRGLLRDI  NaN   NP     NaN   NaN
AEIMKICST  PB1   NaN    NaN   PB1
AFDERRAGK  NaN   NaN    NP    NP
AFDERRAGK  NP    NaN    NaN   NaN

There are a thousand or so rows and half a dozen columns. Most cells are empty (NaN). I would like to know what the probability of text in each column is, given that a different column has text in it. For example, the little snippet here would produce something like this:

       Al01  BBR60  CA07  NL219
Al01      4      0     2      3
BBR60     0      1     0      0
CA07      2      0     3      3
NL219     3      0     3      4

That says that there are 4 hits in the Al01 column; of those 4 hits, none are hits in the BBR60 column, 2 are also hits in the CA07 column, and 3 are hits in the NL219 column. And so on.

I can step through each column and build a dict with the values, but that seems clumsy. Is there a simpler approach?

Recommended answer

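It's just matrix multiplication: turn the frame into a 0/1 indicator matrix (1 where a cell has text, 0 where it is NaN) and multiply its transpose by it. Entry (i, j) of the product counts the rows where columns i and j both have text: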

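import pandas as pd
# Read the whitespace-delimited table; the first column holds the row labels.
df = pd.read_csv('data.csv', index_col=0, sep=r'\s+')
# 1 where a cell has text, 0 where it is NaN.
df2 = df.notnull().astype(int)
# Entry (i, j) counts the rows where columns i and j both have text.
print(df2.T.dot(df2))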

Output:

       Al01  BBR60  CA07  NL219
Al01      4      0     2      3
BBR60     0      1     0      0
CA07      2      0     3      3
NL219     3      0     3      4

[4 rows x 4 columns]
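
The question asks for probabilities, while the matrix above holds raw counts. One way to get from one to the other, offered here as a sketch and not as part of the original answer, is to divide each row of the count matrix by its diagonal entry, which is the total number of hits in that column. The example rebuilds the snippet from the question inline so it runs without data.csv; the names hits, counts, totals and cond_prob are illustrative, and reading "probability" as P(text in column b | text in column a) is an assumption about what the asker wants.

```python
import numpy as np
import pandas as pd

# The snippet from the question, built inline (None marks an empty cell).
df = pd.DataFrame(
    {
        "Al01": ["MP", None, "NP", None, "PB1", None, "NP"],
        "BBR60": [None, None, None, "NP", None, None, None],
        "CA07": ["MP", None, "NP", None, None, "NP", None],
        "NL219": ["MP", None, "NP", None, "PB1", "NP", None],
    },
    index=["AAEAMEVAT", "AAFEDLRLL", "AAGAAVKGV", "ADRGLLRDI",
           "AEIMKICST", "AFDERRAGK", "AFDERRAGK"],
)

# 1 where a cell has text, 0 where it is empty.
hits = df.notna().astype(int)

# Co-occurrence counts via plain NumPy, so the repeated row label
# (AFDERRAGK) never takes part in any label alignment.
x = hits.to_numpy()
counts = pd.DataFrame(x.T @ x, index=hits.columns, columns=hits.columns)

# Assumed interpretation: P(text in b | text in a) = counts[a, b] / counts[a, a].
totals = pd.Series(np.diag(counts.to_numpy()), index=counts.index)
cond_prob = counts.div(totals, axis=0)

print(counts)
print(cond_prob.round(2))
```

For the snippet, the Al01 row of cond_prob is 1.0, 0.0, 0.5, 0.75: of the 4 rows with text in Al01, none have text in BBR60, 2 have text in CA07, and 3 have text in NL219. A column with no hits at all would give 0/0, so its row comes out as NaN.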
