R: Frequency of all column combinations

Question

I have a character vector of strings of equal length, like this:

example.list <- c('BBCD','ABBC','ADDB','ACBB')

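Then I want to obtain the frequency of occurrence of specific letters at specific positions. First I convert this to an indicator matrix with one column per letter and position (the matrix below has a fifth row, corresponding to ABBD, which is not in example.list above):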

     A1 B1 C1 D1 A2 B2 C2 D2 A3 B3 C3 D3 A4 B4 C4 D4
[1,]  0  1  0  0  0  1  0  0  0  0  1  0  0  0  0  1
[2,]  1  0  0  0  0  1  0  0  0  1  0  0  0  0  1  0
[3,]  1  0  0  0  0  0  0  1  0  0  0  1  0  1  0  0
[4,]  1  0  0  0  0  0  1  0  0  1  0  0  0  1  0  0
[5,]  1  0  0  0  0  1  0  0  0  1  0  0  0  0  0  1
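
The question does not show how the indicator matrix was built. A minimal sketch of one way to do it follows, assuming the alphabet is A to D and that the fifth row corresponds to the string "ABBD" (decoded from the matrix, not given in example.list). The names strings, chars, labels and onehot are illustrative, not from the original post.

```r
# Assumption: the fifth string is decoded from row 5 of the matrix above.
strings <- c("BBCD", "ABBC", "ADDB", "ACBB", "ABBD")

# One row per string, one column per position.
chars <- do.call(rbind, strsplit(strings, ""))
alphabet <- c("A", "B", "C", "D")   # assumed alphabet
n <- ncol(chars)

# Column labels in the order A1 B1 C1 D1 A2 B2 ...
labels <- as.vector(outer(alphabet, seq_len(n), paste0))

# For each label, flag the strings that have that letter at that position.
onehot <- sapply(labels, function(lab) {
  letter <- substr(lab, 1, 1)
  pos <- as.integer(substr(lab, 2, nchar(lab)))
  as.integer(chars[, pos] == letter)
})
```

Unlike a table of observed values, this keeps all-zero columns such as C1 and D1, because the labels are generated from the full alphabet rather than from the data.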

Now I want to obtain the frequency of each column combination, that is, the number of rows in which both columns are 1. Some examples:

A1 : B2 = 2
A1 : B3 = 3
B1 : B2 = 1
.. etc

Recommended answer

Split the strings into a list, s, of vectors of single characters. Set n to their common length and create a character matrix v from s, with one column per string and entries such as B1 (letter followed by position). Then use xtabs to tabulate v into m1, an indicator table with one row per string and one column per observed letter-position label, and use combn to compute m2, the co-occurrence count for every pair of columns of m1.

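s <- strsplit(example.list, "")   # list of single-character vectors
n <- lengths(s)[1]                # common string length
v <- sapply(s, paste0, 1:n)       # labels such as "B1"; one column per string
# m1: one row per string, one column per observed label
m1 <- xtabs(~., data.frame(colv = c(col(v)), v = c(v)))
# m2: for each pair of columns, the number of strings containing both labels
m2 <- combn(1:ncol(m1), 2, function(ix) sum(m1[, ix[1]] * m1[, ix[2]]))
names(m2) <- combn(colnames(m1), 2, paste, collapse = "")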

giving:

> m1
    v
colv A1 B1 B2 B3 B4 C2 C3 C4 D2 D3 D4
   1  0  1  1  0  0  0  1  0  0  0  1
   2  1  0  1  1  0  0  0  1  0  0  0
   3  1  0  0  0  1  0  0  0  1  1  0
   4  1  0  0  1  1  1  0  0  0  0  0

> m2
A1B1 A1B2 A1B3 A1B4 A1C2 A1C3 A1C4 A1D2 A1D3 A1D4 B1B2 B1B3 B1B4 B1C2 B1C3 B1C4 
   0    1    2    2    1    0    1    1    1    0    1    0    0    0    1    0 
B1D2 B1D3 B1D4 B2B3 B2B4 B2C2 B2C3 B2C4 B2D2 B2D3 B2D4 B3B4 B3C2 B3C3 B3C4 B3D2 
   0    0    1    1    0    0    1    1    0    0    1    1    1    0    1    0 
B3D3 B3D4 B4C2 B4C3 B4C4 B4D2 B4D3 B4D4 C2C3 C2C4 C2D2 C2D3 C2D4 C3C4 C3D2 C3D3 
   0    0    1    0    0    1    1    0    0    0    0    0    0    0    0    0 
C3D4 C4D2 C4D3 C4D4 D2D3 D2D4 D3D4 
   1    0    0    0    1    0    0 
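
As a possible alternative to the combn step (not part of the original answer), the same pairwise counts can be read off the cross-product of m1, since entry [i, j] of t(m1) %*% m1 is the number of rows in which columns i and j are both 1. The sketch below rebuilds m1 so it runs on its own. The names co, idx and pairs are illustrative.

```r
example.list <- c("BBCD", "ABBC", "ADDB", "ACBB")
s <- strsplit(example.list, "")
n <- lengths(s)[1]
v <- sapply(s, paste0, 1:n)
m1 <- xtabs(~., data.frame(colv = c(col(v)), v = c(v)))

# crossprod(m1) computes t(m1) %*% m1: a symmetric co-occurrence matrix.
# The diagonal holds the single-column frequencies.
co <- crossprod(m1)
co["A1", "B3"]   # 2, matching A1B3 in m2

# Optional: the upper triangle as a long table of pairs
# (the row order differs from the order produced by combn).
idx <- which(upper.tri(co), arr.ind = TRUE)
pairs <- data.frame(first  = rownames(co)[idx[, 1]],
                    second = colnames(co)[idx[, 2]],
                    count  = co[idx])
```

The matrix form makes lookups by name straightforward, while the named vector m2 is more compact when only the pair counts are needed.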
