R: Spread key-value pairs when keys are in different columns and return value frequency

Problem description

I have searched around but could not find a particular answer to my question.

Suppose I have a data frame df:

df = data.frame(id = c(10, 11, 12, 13, 14),
                V1 = c('blue', 'blue', 'blue', NA, NA),
                V2 = c('blue', 'yellow', NA, 'yellow', 'green'),
                V3 = c('yellow', NA, NA, NA, 'blue'))

I want to use the distinct values of V1-V3 as column headers, and fill each row with the number of times each value occurs in that row.

Desired output:

desired = data.frame(id = c(10, 11, 12, 13, 14),
                     blue = c(2, 1, 1, 0, 1),
                     yellow = c(1, 1, 0, 1, 0),
                     green = c(0, 0, 0, 0, 1))

There is probably a really cool way to do this with tidyr::spread and dplyr::summarise. However, I don't know how to spread the V* columns when the keys I want to spread by are all over the place in different columns and include NAs.

Thanks for any help!

Solution

Using melt and dcast from package reshape2:

library(reshape2)
dcast(melt(df, id.vars = "id", na.rm = TRUE), id ~ value, fun.aggregate = length)

  id blue green yellow
1 10    2     0      1
2 11    1     0      1
3 12    1     0      0
4 13    0     0      1
5 14    1     1      0
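
The one-liner above can be unpacked into its two steps, which may make the intermediate long format easier to see. This is a minimal sketch that assumes reshape2 is installed; the names long and wide are illustrative only, and fun.aggregate = length is spelled out rather than left to dcast's default.

```r
library(reshape2)

df <- data.frame(id = c(10, 11, 12, 13, 14),
                 V1 = c("blue", "blue", "blue", NA, NA),
                 V2 = c("blue", "yellow", NA, "yellow", "green"),
                 V3 = c("yellow", NA, NA, NA, "blue"),
                 stringsAsFactors = FALSE)

# Step 1: long format, one row per (id, variable, value); NA cells are dropped
long <- melt(df, id.vars = "id", na.rm = TRUE)

# Step 2: one column per distinct value, counting the rows found for each id
wide <- dcast(long, id ~ value, fun.aggregate = length)

print(wide)
```

Because the counting happens in dcast, the V1-V3 column a value came from no longer matters once the data is in long format.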

As suggested by David Arenburg, it is simpler to use recast, a wrapper around melt and dcast:

recast(df, id ~ value, id.var = "id")[, 1:4]   # recast cannot pass na.rm to melt, so [, 1:4] drops the extra NA column

  id blue green yellow
1 10    2     0      1
2 11    1     0      1
3 12    1     0      0
4 13    0     0      1
5 14    1     1      0
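
If adding a package is not an option, the same counts can probably be obtained with base R alone. The following is a sketch of a different technique (stacking the columns and cross-tabulating with table), not part of the original answer; value_cols, long, counts and result are illustrative names.

```r
df <- data.frame(id = c(10, 11, 12, 13, 14),
                 V1 = c("blue", "blue", "blue", NA, NA),
                 V2 = c("blue", "yellow", NA, "yellow", "green"),
                 V3 = c("yellow", NA, NA, NA, "blue"),
                 stringsAsFactors = FALSE)

value_cols <- c("V1", "V2", "V3")

# Stack the value columns on top of each other, repeating id once per column
long <- data.frame(id = rep(df$id, times = length(value_cols)),
                   value = unlist(df[value_cols], use.names = FALSE))

# Cross-tabulate id against value; table() leaves NA out by default
counts <- table(long$id, long$value)

# Turn the contingency table back into a data frame with an id column
result <- cbind(id = as.numeric(rownames(counts)),
                as.data.frame.matrix(counts))
rownames(result) <- NULL

print(result)
```

As in the reshape2 output, the value columns come out in alphabetical order (blue, green, yellow) rather than in the order of the desired data frame; result[, c("id", "blue", "yellow", "green")] would reorder them.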
