dplyr function returning false NAs

Question

I have the same question as the one posted here; that question remains unanswered, and I am running into the same problem.

I have attached a sample of my data here.

The version of R that I am using is 3.4.2 and the version of dplyr is 0.7.4.

To get everyone up to speed... After importing the data, I do these edits:

#specify which species are predators (pp = 1) and prey (pp = 0)
d1 = d1 %>%
    group_by(sps) %>% #grouped by species
    mutate(pp=ifelse(sps %in% c("MUXX", "MUVI","MEME"), 1,0)) #mutate to specify predators as 1 and prey as 0

d1$datetime=strftime(paste(d1$date,d1$time),'%Y-%m-%d %H:%M',usetz=FALSE) #converting the date/time into a new format 

head(d1) #visualize the first few lines of the data

d2 = d1 %>% filter(km %in% c("80")) #restricting the observations to just one location (km 80)

Now for where the problems arise (the NAs):

d2 = d2 %>% mutate(prev = dplyr::lag(pp)) 
#when I look at the output I see the lag function isn't working (shown below)

> d2
# A tibble: 209 x 10
# Groups:   sps [10]
  ID       date    km culv.id   type    sps   time    pp         datetime  prev
<int>     <fctr> <dbl>  <fctr> <fctr> <fctr> <fctr> <dbl>            <chr> <dbl>
1     1 2012-06-19    80       A    DCC  MICRO   2:19     0 2012-06-19 02:19    NA
2     2 2012-06-21    80       A    DCC   MUXX  23:23     1 2012-06-21 23:23    NA
3     3 2012-07-15    80       A    DCC   MAMO  11:38     0 2012-07-15 11:38    NA
4     4 2012-07-20    80       A    DCC  MICRO  22:19     0 2012-07-20 22:19     0
5     5 2012-07-29    80       A    DCC  MICRO  23:03     0 2012-07-29 23:03     0
6     8 2012-08-07    80       A    DCC   PRLO   2:04     0 2012-08-07 02:04    NA
7     9 2012-08-08    80       A    DCC  MICRO  23:56     0 2012-08-08 23:56     0
8    10 2012-08-09    80       A    DCC   PRLO  23:06     0 2012-08-09 23:06     0
9    11 2012-08-13    80       A    DCC  MICRO   0:04     0 2012-08-13 00:04     0
10   12 2012-08-13    80       A    DCC  MICRO   0:46     0 2012-08-13 00:46     0

Might anyone have any suggestions for why the lag function isn't working?

Recommended answer

In one of your previous operations you specified group_by(sps). That grouping stays attached to your data frame until you call ungroup(). Some row-level operations are not affected by the grouping, but aggregate functions, and functions that evaluate based on values from more than one row (such as lag()), are.

d2 <- d2 %>% ungroup() %>% mutate(prev = dplyr::lag(pp))
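
To make the effect of the grouping concrete, here is a minimal sketch on made-up data. The `toy` tibble and its values are hypothetical and stand in for the asker's `d2`; only the column names `sps` and `pp` come from the question.

```r
library(dplyr)

# Hypothetical toy data standing in for d2: two species, in observation order
toy <- tibble(
  sps = c("MICRO", "MUXX", "MICRO", "MUXX", "MICRO"),
  pp  = c(0, 1, 0, 1, 0)
)

# With the grouping attached, lag() restarts inside every species,
# so the first row of each species has no previous value
grouped <- toy %>%
  group_by(sps) %>%
  mutate(prev = dplyr::lag(pp))

# After ungroup(), lag() runs over the whole column in row order
ungrouped <- grouped %>%
  ungroup() %>%
  mutate(prev = dplyr::lag(pp))

sum(is.na(grouped$prev))    # one NA per species
sum(is.na(ungrouped$prev))  # a single NA, in the first row
```

Note that the `pp` column in the question is computed with `ifelse()` and `%in%`, which work row by row, so, assuming nothing else in the analysis relies on the grouping, the `group_by(sps)` step could probably be dropped altogether.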

Also, as to what I noticed:

  1. I saw in your header # Groups: sps [10]
  2. The first instance of each sps value is NA, but the second instance of each is correctly 0

As a final edit, however: the first value of lag() will always be NA, because there is no previous value. The same is true with group_by(sps), except that you then get 10 NA values, one for the first instance of each factor level. If you want the lagged value within each group, you should not ungroup(); the function is working properly when it creates those NAs. You may replace those NAs with 0, or with another value if appropriate.
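
If the within-group lag is what you want, the leading NAs can be filled in either while lagging or afterwards. The following is a sketch only, again using a hypothetical `toy` tibble. Whether 0 is a sensible fill value depends on the analysis and is an assumption here.

```r
library(dplyr)

# Hypothetical toy data standing in for d2
toy <- tibble(
  sps = c("MICRO", "MUXX", "MICRO", "MUXX", "MICRO"),
  pp  = c(0, 1, 0, 1, 0)
)

# Option 1: give lag() a default for positions that have no previous value
filled <- toy %>%
  group_by(sps) %>%
  mutate(prev = dplyr::lag(pp, default = 0)) %>%
  ungroup()

# Option 2: lag first, then replace the resulting NAs with coalesce()
filled2 <- toy %>%
  group_by(sps) %>%
  mutate(prev = coalesce(dplyr::lag(pp), 0)) %>%
  ungroup()

filled
```

Both options end with ungroup() so that later steps do not inherit the grouping by accident.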
