dplyr: lead() and lag() wrong when used with group_by()

Question

I want to find the lead() and lag() elements within each group, but I get some wrong results.

For example, the data looks like this:

library(dplyr)
df = data.frame(name=rep(c('Al','Jen'),3),
                score=rep(c(100, 80, 60),2))
df

Data:

  name score
1   Al   100
2  Jen    80
3   Al    60
4  Jen   100
5   Al    80
6  Jen    60

Now I try to find the lead() and lag() scores for each person. If I sort the data with arrange() first, I get the correct answer:

df %>%
  arrange(name) %>%
  group_by(name) %>%
  mutate(next.score = lead(score),
         before.score = lag(score) )

OUTPUT1:

Source: local data frame [6 x 4]
Groups: name

      name score next.score before.score
    1   Al   100         60           NA
    2   Al    60         80          100
    3   Al    80         NA           60
    4  Jen    80        100           NA
    5  Jen   100         60           80
    6  Jen    60         NA          100

Without arrange(), the result is wrong:

df %>%
  group_by(name) %>%
  mutate(next.score = lead(score),
         before.score = lag(score) )

OUTPUT2:

Source: local data frame [6 x 4]
Groups: name

  name score next.score before.score
1   Al   100         80           NA
2  Jen    80         60           NA
3   Al    60        100           80
4  Jen   100         80           60
5   Al    80         NA          100
6  Jen    60         NA           80

For example, in the 1st row, Al's next.score should be 60 (the score in the 3rd row, which is Al's next row), not 80.

Does anybody know why this happens? Why does arrange() affect the result (the values themselves, not just the row order)? Thanks~

Recommended answer

It seems you have to pass an additional argument to the lag() and lead() functions. When I run your code without arrange(), but with order_by added, everything seems to be OK.

df %>%
  group_by(name) %>%
  mutate(next.score = lead(score, order_by=name),
         before.score = lag(score, order_by=name))

Output:

  name score next.score before.score
1   Al   100         60           NA
2  Jen    80        100           NA
3   Al    60         80          100
4  Jen   100         60           80
5   Al    80         NA           60
6  Jen    60         NA          100
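
As a cross-check that does not depend on dplyr at all, the expected per-group values can be computed in base R. This is only a sketch: the helper names lead1 and lag1 are hypothetical and are not part of dplyr or of the original answer. It shifts each person's scores by one position in their existing row order, which is what the question expects from the grouped lead() and lag().

```r
# Same data as in the question
df <- data.frame(name = rep(c("Al", "Jen"), 3),
                 score = rep(c(100, 80, 60), 2))

# Hypothetical helpers: shift a vector by one position, padding with NA
lead1 <- function(x) c(x[-1], NA_real_)
lag1  <- function(x) c(NA_real_, x[-length(x)])

# ave() applies the function within each level of df$name and
# writes the results back in the original row order
df$next.score   <- ave(df$score, df$name, FUN = lead1)
df$before.score <- ave(df$score, df$name, FUN = lag1)

print(df)
```

The resulting columns should agree with the output shown above. Note also that within each group the name column is constant, so order_by=name leaves the order of the rows inside a group unchanged.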

My sessionInfo():

R version 3.1.1 (2014-07-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=Polish_Poland.1250  LC_CTYPE=Polish_Poland.1250        LC_MONETARY=Polish_Poland.1250
[4] LC_NUMERIC=C                   LC_TIME=Polish_Poland.1250    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_0.4.1

loaded via a namespace (and not attached):
[1] assertthat_0.1  DBI_0.3.1       lazyeval_0.1.10 magrittr_1.5                parallel_3.1.1  Rcpp_0.11.5    
[7] tools_3.1.1 
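
Since the session above uses dplyr 0.4.1, it may be worth re-running the grouped call on whatever dplyr version you have installed and comparing it against the expected values. The following is a minimal sketch under that assumption (a dplyr installation is required; the variable names res, expected_next and expected_before are illustrative only). It calls dplyr::lead() and dplyr::lag() with an explicit namespace, without arrange() and without order_by, so that no other attached package can mask them.

```r
library(dplyr)

df <- data.frame(name = rep(c("Al", "Jen"), 3),
                 score = rep(c(100, 80, 60), 2))

# Grouped lead/lag in the original row order, no arrange(), no order_by
res <- df %>%
  group_by(name) %>%
  mutate(next.score = dplyr::lead(score),
         before.score = dplyr::lag(score)) %>%
  ungroup()

# Expected per-person values, taken from the answer's output above
expected_next   <- c(60, 100, 80, 60, NA, NA)
expected_before <- c(NA, NA, 100, 80, 60, 100)

print(packageVersion("dplyr"))
print(isTRUE(all.equal(res$next.score, expected_next)))
print(isTRUE(all.equal(res$before.score, expected_before)))
```

If either comparison prints FALSE on your installation, passing order_by as in the answer, or sorting with arrange() first as in the question, are the two workarounds this thread reports.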
