Adding default values to item x group pairs that don't have a value (df %>% spread %>% gather seems strange)


Problem description

How can I do

df1 %>% spread(groupid, value, fill = 0) %>% gather(groupid, value, one, two)

in a more natural way?

Given the data frame

df1 <- data.frame(groupid = c("one","one","one","two","two","two", "one"),
                  value = c(3,2,1,2,3,1,22),
                  itemid = c(1:6, 6))

For many itemid and groupid pairs we have a value, but for some itemids there are groupids with no value. I want to add a default value for those cases. For example, there is no value for itemid 1 and groupid "two", so I want to add a row where this pair gets a default value.

The following tidyr code achieves this, but it feels like a strange way to do it (the default value added here is 0):

df1 %>% spread(groupid, value, fill = 0) %>% gather(groupid, value, one, two)
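To make the round trip less opaque, it can help to look at the intermediate wide table. The following is a minimal sketch, assuming dplyr and tidyr are attached and using the df1 defined above; the names wide and long are introduced here only for illustration.

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(groupid = c("one","one","one","two","two","two", "one"),
                  value = c(3,2,1,2,3,1,22),
                  itemid = c(1:6, 6))

# Step 1: one column per group; (itemid, groupid) cells with no value get 0.
wide <- df1 %>% spread(groupid, value, fill = 0)

# Step 2: back to long form; every itemid now has a row for every group.
long <- wide %>% gather(groupid, value, one, two)

print(wide)
print(long)
```

The fill happens in step 1, where each missing cell of the wide table has to hold something; step 2 only reshapes the result back into rows.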

I am looking for suggestions on how to do this in a more natural way.

Since I would likely be confused about the effect of the above code when looking at it again in a few weeks, I wrote a function wrapping it:

#' Add default values for missing groups
#' 
#' Given data about items where each item is identified by an id, and every
#' item can have a value in every group; add a default value for all groups
#' where an item doesn't have a value yet.
add_default_value <- function(data, id, group, value, default) {
  id = as.character(substitute(id))
  group = as.character(substitute(group))
  value = as.character(substitute(value))
  groups <- unique(as.character(data[[group]]))

  # spread checks that the columns outside of group and value uniquely
  # determine the row.  Here we check that that already is the case within
  # each group using only id.  I.e. there is no repeated (id, group).
  id_group_cts <- data %>% group_by_(id, group) %>% do(data.frame(.ct = nrow(.)))
  if (any(id_group_cts$.ct > 1)) {
    badline <- id_group_cts %>% filter(.ct > 1) %>% top_n(1, .ct)
    stop("There is at least one (", id, ", ", group, ")",
         " combination with two members: (",
         as.character(badline[[id]]), ", ", as.character(badline[[group]]), ")")
  }

  gather_(spread_(data, group, value, fill = default), group, value, groups)
}

One last note: the reason for wanting this is that my groups are ordered (week1, week2, ...), and I want every id to have a value in every group so that, after sorting the groups per id, I can use cumsum to get a weekly running total that is also shown in the weeks where the running total didn't increase.

Recommended answer

There is a new function, complete, in the development version of tidyr that does this.

df1 %>% complete(itemid, groupid, fill = list(value = 0))
##    itemid groupid value
## 1       1     one     3
## 2       1     two     0
## 3       2     one     2
## 4       2     two     0
## 5       3     one     1
## 6       3     two     0
## 7       4     one     0
## 8       4     two     2
## 9       5     one     0
## 10      5     two     3
## 11      6     one    22
## 12      6     two     1
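The running-total use case mentioned in the question could then be sketched roughly as follows. This is only an illustration, assuming dplyr and tidyr are attached; the names running and running_total are hypothetical, and the groups "one" and "two" stand in for the ordered weeks. It also assumes the group labels sort correctly as they are, which plain strings such as week10 and week2 would not.

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(groupid = c("one","one","one","two","two","two", "one"),
                  value = c(3,2,1,2,3,1,22),
                  itemid = c(1:6, 6))

running <- df1 %>%
  # Add the missing (itemid, groupid) pairs with a default value of 0.
  complete(itemid, groupid, fill = list(value = 0)) %>%
  group_by(itemid) %>%
  # Order the groups within each item before accumulating.
  arrange(groupid, .by_group = TRUE) %>%
  mutate(running_total = cumsum(value)) %>%
  ungroup()

print(running)
```

Because every item now has a row in every group, the running total is carried forward into groups where the item had no new value.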
