Table of pairwise frequency counts in Python


Question

I'm completely new to Python; most of my work has been done in R. I would like to know how to solve this problem in Python. For a clear statement of the problem and a solution in R, see the linked question "How to calculate a table of pairwise counts from long-form data frame".

Here is the dataset:

id  featureCode
5   PPLC
5   PCLI
6   PPLC
6   PCLI
7   PPL
7   PPLC
7   PCLI
8   PPLC
9   PPLC
10  PPLC

This is what I want:

      PPLC  PCLI  PPL
PPLC  0     3     1
PCLI  3     0     1
PPL   1     1     0

I'd like to count the number of times each feature code appears under the same id as each other feature code (the "pairwise counts" of the title). For example, PPLC and PCLI appear together under ids 5, 6 and 7, so their count is 3. I hope this makes sense. Any help would be appreciated. Thanks.

Recommended answer

This can be done by building a dictionary and using collections.Counter for the analysis. However, I will show the simplest approach, using plain dictionaries and loops. The code could of course be made shorter; I am deliberately showing the expanded version. My Python installation does not have pandas available, so I am using only basic Python.

# Assume you have a list of (id, featureCode) tuples called lst.
# The sample data from the question is used here.
lst = [(5, "PPLC"), (5, "PCLI"), (6, "PPLC"), (6, "PCLI"), (7, "PPL"),
       (7, "PPLC"), (7, "PCLI"), (8, "PPLC"), (9, "PPLC"), (10, "PPLC")]
lst.sort()  # sort by id so that rows with the same id are adjacent
mydict = {}

def count_pairs(tags):
    # Add 1 to the count of every ordered pair of different tags
    for elem1 in tags:
        for elem2 in tags:
            if elem1 != elem2:
                if elem1 not in mydict:
                    mydict[elem1] = {}
                if elem2 not in mydict[elem1]:
                    mydict[elem1][elem2] = 0
                mydict[elem1][elem2] += 1

current_id = None  # avoid shadowing the built-in id()
tags = []
for row in lst:
    if row[0] == current_id:
        # Same id as the previous row: pick up the current entry
        tags.append(row[1])
    else:
        # This is a new id: count the pairs among the previous id's tags
        count_pairs(tags)
        # Reset the indicators for the next group
        current_id = row[0]
        tags = [row[1]]  # a one-element list, not the bare string

# The tags of the last id have to be processed as well
count_pairs(tags)

# At this point, mydict holds the full pairwise counts
for tag in mydict:
    print(tag, mydict[tag])
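
The answer mentions that collections and Counter could do the same job without the explicit bookkeeping. A minimal sketch of that variant follows; it is not the answerer's code, and the names rows, codes_by_id and pair_counts are chosen here for illustration. It assumes that a feature code repeated under the same id should be counted once, which is why the codes are grouped into sets.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Sample rows from the question as (id, featureCode) tuples.
rows = [(5, "PPLC"), (5, "PCLI"), (6, "PPLC"), (6, "PCLI"), (7, "PPL"),
        (7, "PPLC"), (7, "PCLI"), (8, "PPLC"), (9, "PPLC"), (10, "PPLC")]

# Group the distinct feature codes seen under each id.
codes_by_id = defaultdict(set)
for row_id, code in rows:
    codes_by_id[row_id].add(code)

# Count every ordered pair of different codes that share an id.
pair_counts = Counter()
for codes in codes_by_id.values():
    pair_counts.update(permutations(sorted(codes), 2))

print(pair_counts[("PPLC", "PCLI")])  # ids 5, 6 and 7 contain both codes
print(pair_counts[("PPLC", "PPLC")])  # a Counter returns 0 for missing keys
```

Because both orderings of each pair are counted, the result is symmetric, and the keys are flat (row, column) tuples rather than nested dictionaries.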

This gives each tag together with its counts. For the sample data, the entry for PPLC holds PCLI: 3 and PPL: 1. You can format the output by looping over the final dictionary and printing the keys and counts as needed.
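
One possible way to do that formatting is sketched below. The helper format_table and the variable names are hypothetical, and the counts literal simply restates what the nested dictionary contains for the sample data, so that the block runs on its own. Pairs that never occur, including the diagonal, are printed as 0.

```python
# Nested counts in the shape produced by the answer's loop for the sample data.
counts = {
    "PPLC": {"PCLI": 3, "PPL": 1},
    "PCLI": {"PPLC": 3, "PPL": 1},
    "PPL": {"PPLC": 1, "PCLI": 1},
}


def format_table(counts, labels):
    """Return the pairwise counts as aligned text, one row per label."""
    width = max(len(label) for label in labels) + 2
    # Header row: an empty corner cell followed by the column labels.
    lines = ["".join(cell.ljust(width) for cell in [""] + labels).rstrip()]
    for row in labels:
        # Missing pairs (and the diagonal) default to 0.
        cells = [str(counts.get(row, {}).get(col, 0)) for col in labels]
        lines.append("".join(cell.ljust(width) for cell in [row] + cells).rstrip())
    return "\n".join(lines)


labels = ["PPLC", "PCLI", "PPL"]  # column order taken from the question
table = format_table(counts, labels)
print(table)
```

Passing the labels explicitly keeps the row and column order under the caller's control, since the dictionary's own key order depends on the order in which pairs were first seen.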
