R: element frequency and column names

Question

I have a dataframe that has four columns A, B, C and D:

A    B    C    D
a    a    b    c
b    c    x    e
c    d    y    a
d              z
e
f

I would like to get the frequency of every element and the list of columns in which it appears, ordered by frequency ranking. The output would be something like this:

  Ranking  frequency column 
a    1         3      A, B, D
c    1         3      A, B, D
b    2         2      A, C
d    2         2      A, B
e    2         2      A, D
f  .....

I would appreciate any help. Thank you!

Recommended answer

Something like this:

Data

df <- read.table(header=TRUE, text='A    B    C    D
a    a    b    c
b    c    x    e
c    d    y    a
d   NA    NA     z
e  NA NA NA
f NA NA NA',stringsAsFactors=FALSE)

Solution

#find unique elements
elements <- unique(unlist(sapply(df, unique)))

#use a lapply to find the info you need
df2 <- data.frame(do.call(rbind,
        lapply(elements, function(x) {
          #find the rows and columns of the elements
          a <- which(df == x, arr.ind=TRUE)
          #find column names of the elements found (once per column)
          b <- unique(names(df)[a[,2]])
          #find frequency
          c <- nrow(a)
          #produce output
          c(x, c, paste(b, collapse=','))
})))

#remove NAs
df2 <- na.omit(df2)
#change column names
colnames(df2) <- c('element','frequency', 'columns')
#convert the counts back to numbers (c() above coerced them to character)
df2$frequency <- as.numeric(as.character(df2$frequency))
#order according to frequency
df2 <- df2[order(df2$frequency, decreasing=TRUE),]
#create the ranking column
df2$ranking <- as.numeric(factor(df2$frequency,levels=unique(df2$frequency)))

Output:

> df2
   element frequency columns ranking
1        a         3   A,B,D       1
3        c         3   A,B,D       1
2        b         2     A,C       2
4        d         2     A,B       2
5        e         2     A,D       2
6        f         1       A       3
8        x         1       C       3
9        y         1       C       3
10       z         1       D       3
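
For comparison, the same table can probably be built without looping over the elements. The following is only a sketch, not part of the original answer: it assumes the same data as above and reshapes it to long form with base R's stack(). The names long, freq, cols and res are made up for illustration.

```r
# Same example data as in the answer
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "A B C D
a a b c
b c x e
c d y a
d NA NA z
e NA NA NA
f NA NA NA")

# Long form: one row per cell, with the value and its source column
long <- na.omit(stack(df))   # columns: values, ind

# Count occurrences per element and collect the columns it appears in
freq <- tapply(long$ind, long$values, length)
cols <- tapply(as.character(long$ind), long$values,
               function(v) paste(unique(v), collapse = ","))

res <- data.frame(element   = names(freq),
                  frequency = as.integer(freq),
                  columns   = as.character(cols),
                  stringsAsFactors = FALSE)

# Highest frequency first; ties broken alphabetically by element
res <- res[order(-res$frequency, res$element), ]

# Dense ranking: equal frequencies share a rank
res$ranking <- match(res$frequency,
                     sort(unique(res$frequency), decreasing = TRUE))
res
```

Keeping frequency as an integer column from the start means the ordering step compares numbers rather than strings.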

If you want the element column as row names and the ranking column first, you can also do:

row.names(df2) <- df2$element
df2$element <- NULL
df2 <- df2[c('ranking','frequency','columns')]

Output:

> df2
  ranking frequency columns
a       1         3   A,B,D
c       1         3   A,B,D
b       2         2     A,C
d       2         2     A,B
e       2         2     A,D
f       3         1       A
x       3         1       C
y       3         1       C
z       3         1       D
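
One caveat about the ordering step in the solution: the counts pass through c(x, c, paste(...)), which coerces them to character. With single-digit counts, as in this example, that makes no visible difference, but strings are compared character by character, so a count of 10 would sort below 2. A small illustration with made-up counts (freq_chr and freq_num are hypothetical names):

```r
# Made-up counts stored as strings, as they are after c(x, c, ...)
freq_chr <- c("2", "10", "3")

# Character sort: compared character by character, so "10" ends up last
sorted_chr <- freq_chr[order(freq_chr, decreasing = TRUE)]

# Numeric sort: convert first, then order
freq_num <- as.integer(freq_chr)
sorted_num <- freq_num[order(freq_num, decreasing = TRUE)]

sorted_chr
sorted_num
```

Converting the frequency column to numbers before calling order() avoids the problem on larger data.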
