R: Frequency of all column combinations
Question
I have a list of strings of equal length, like this:
example.list <- c('BBCD','ABBC','ADDB','ACBB')
Then I want to obtain the frequency of occurrence of specific letters at specific positions. First I convert the list to a 0/1 matrix with one column per letter/position combination:
     A1 B1 C1 D1 A2 B2 C2 D2 A3 B3 C3 D3 A4 B4 C4 D4
[1,]  0  1  0  0  0  1  0  0  0  0  1  0  0  0  0  1
[2,]  1  0  0  0  0  1  0  0  0  1  0  0  0  0  1  0
[3,]  1  0  0  0  0  0  0  1  0  0  0  1  0  1  0  0
[4,]  1  0  0  0  0  0  1  0  0  1  0  0  0  1  0  0
[5,]  1  0  0  0  0  1  0  0  0  1  0  0  0  0  0  1
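The question does not show how the conversion to this matrix is done. One possible way is sketched below, assuming the alphabet is exactly A to D and all strings share the length of the first one (the names alphabet, col.names and onehot are hypothetical). Applied to the four strings in example.list it produces the first four rows shown above; the fifth row (ABBD) does not correspond to any string in example.list.

```r
example.list <- c('BBCD','ABBC','ADDB','ACBB')

alphabet <- c("A", "B", "C", "D")   # assumed alphabet; adjust for other data
n <- nchar(example.list[1])          # assumes every string has this length

# column names in the question's order: A1 B1 C1 D1 A2 B2 ... D4
col.names <- c(outer(alphabet, seq_len(n), paste0))

# one row per string: 1 where that letter sits at that position, 0 elsewhere
onehot <- t(sapply(strsplit(example.list, ""), function(ch) {
  as.integer(col.names %in% paste0(ch, seq_len(n)))
}))
colnames(onehot) <- col.names
onehot
```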
Now I want the frequency of each pair of columns, i.e. the number of rows in which both columns are 1. Some examples:
A1 : B2 = 2
A1 : B3 = 3
B1 : B2 = 1
and so on.
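Since the count for a pair of columns is the number of rows where both are 1, all pairs can be obtained at once from the 0/1 matrix with base R's crossprod(x), which computes t(x) %*% x. This is a substitute technique, not the method used in the answer; the sketch below types in the five-row matrix exactly as printed above (x, cols and co are hypothetical names) and reproduces the three example counts.

```r
# the 0/1 matrix exactly as shown in the question (5 rows, 16 letter/position columns)
cols <- c(outer(c("A", "B", "C", "D"), 1:4, paste0))  # A1 B1 C1 D1 A2 ... D4
x <- matrix(c(
  0,1,0,0, 0,1,0,0, 0,0,1,0, 0,0,0,1,
  1,0,0,0, 0,1,0,0, 0,1,0,0, 0,0,1,0,
  1,0,0,0, 0,0,0,1, 0,0,0,1, 0,1,0,0,
  1,0,0,0, 0,0,1,0, 0,1,0,0, 0,1,0,0,
  1,0,0,0, 0,1,0,0, 0,1,0,0, 0,0,0,1
), nrow = 5, byrow = TRUE, dimnames = list(NULL, cols))

# entry [i, j] counts the rows in which columns i and j are both 1
co <- crossprod(x)
co["A1", "B2"]  # 2
co["A1", "B3"]  # 3
co["B1", "B2"]  # 1
```

The diagonal holds the single-column counts, e.g. co["A1", "A1"] is the number of rows with A at position 1.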
Answer
Split the strings into a list s of single-character vectors. Set n to their common length and create a matrix v from s whose columns contain position-tagged letters such as B1. Then use xtabs to create the counts, giving m1, and combn to get the counts of pairs in m2.
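# split each string into single characters
s <- strsplit(example.list, "")
# common length of the strings
n <- lengths(s)[1]
# n x (number of strings) matrix; column j holds string j's letters tagged with their positions, e.g. "B1"
v <- sapply(s, paste0, 1:n)
# one row per string (colv), one column per letter/position that actually occurs (v)
m1 <- xtabs(~., data.frame(colv = c(col(v)), v = c(v)))
# for each pair of columns of m1, count the strings that contain both
m2 <- combn(1:ncol(m1), 2, function(ix) sum(m1[, ix[1]] * m1[, ix[2]]))
# label each pair, e.g. "A1B2"
names(m2) <- combn(colnames(m1), 2, paste, collapse = "")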
This gives:
> m1
    v
colv A1 B1 B2 B3 B4 C2 C3 C4 D2 D3 D4
   1  0  1  1  0  0  0  1  0  0  0  1
   2  1  0  1  1  0  0  0  1  0  0  0
   3  1  0  0  0  1  0  0  0  1  1  0
   4  1  0  0  1  1  1  0  0  0  0  0
> m2
A1B1 A1B2 A1B3 A1B4 A1C2 A1C3 A1C4 A1D2 A1D3 A1D4 B1B2 B1B3 B1B4 B1C2 B1C3 B1C4
   0    1    2    2    1    0    1    1    1    0    1    0    0    0    1    0
B1D2 B1D3 B1D4 B2B3 B2B4 B2C2 B2C3 B2C4 B2D2 B2D3 B2D4 B3B4 B3C2 B3C3 B3C4 B3D2
   0    0    1    1    0    0    1    1    0    0    1    1    1    0    1    0
B3D3 B3D4 B4C2 B4C3 B4C4 B4D2 B4D3 B4D4 C2C3 C2C4 C2D2 C2D3 C2D4 C3C4 C3D2 C3D3
   0    0    1    0    0    1    1    0    0    0    0    0    0    0    0    0
C3D4 C4D2 C4D3 C4D4 D2D3 D2D4 D3D4
   1    0    0    0    1    0    0
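As an alternative to the combn loop, the pair counts can also be read from crossprod of m1, since each entry of t(m1) %*% m1 sums the products of two columns. The sketch below is a variation on the answer, not part of it; it assumes the xtabs class can be dropped with unclass() before the matrix product, and m1.mat, cp and m2.alt are hypothetical names. It relies on combn enumerating pairs as (1,2), (1,3), ..., (2,3), ..., which is the order in which R extracts the lower triangle of the symmetric cp column by column.

```r
example.list <- c('BBCD','ABBC','ADDB','ACBB')

# same preparation as in the answer
s <- strsplit(example.list, "")
n <- lengths(s)[1]
v <- sapply(s, paste0, 1:n)
m1 <- xtabs(~., data.frame(colv = c(col(v)), v = c(v)))

# drop the xtabs/table class so crossprod works on a plain integer matrix
m1.mat <- unclass(m1)

# all pairwise co-occurrence counts in one matrix product (11 x 11, symmetric)
cp <- crossprod(m1.mat)

# lower triangle, column by column, matches combn's pair order
m2.alt <- cp[lower.tri(cp)]
names(m2.alt) <- combn(colnames(cp), 2, paste, collapse = "")
m2.alt
```

Because cp keeps the full table, a single pair can also be looked up directly, e.g. cp["A1", "B3"].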