"Adding missing grouping variables" message in dplyr in R

Question

Part of my script used to run fine, but recently it has started printing an odd message, and after that many of my other functions no longer work properly. I am trying to select the 8th and 23rd positions in a ranked list of values for each site, to find the 25th and 75th percentile values for each day of the year at each site over 30 years. My approach was as follows (adapted for the four-row dataset; for my full 30-year dataset, slice(3) would normally be slice(23)):

library("dplyr")

# Sample data (dput output): one station, January 1st in four different years
mydata <- structure(list(station_number = structure(c(1L, 1L, 1L, 1L), .Label = "01AD002", class = "factor"),
    year = 1981:1984, month = c(1L, 1L, 1L, 1L), day = c(1L, 1L, 1L, 1L),
    value = c(113, 8.329999924, 15.60000038, 149)),
    .Names = c("station_number", "year", "month", "day", "value"),
    class = "data.frame", row.names = c(NA, -4L))

# Rank values from highest to lowest within each month/day/station group
# and keep the 3rd one (the 23rd for the full 30-year dataset)
qu25 <- mydata %>%
  group_by(month, day, station_number) %>%
  arrange(desc(value)) %>%
  slice(3) %>%
  select(value)
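The positions used above (3 for the four-year sample, 8 and 23 for 30 years) are consistent with a simple rule: after sorting from highest to lowest, the p-th percentile sits at position ceiling((1 - p) * n). The sketch below encodes that rule in a hypothetical helper, desc_position(), which is not part of dplyr. It is an assumption that reproduces the positions in the question, not the interpolation that R's quantile() uses by default.

```r
# Hypothetical helper (not part of dplyr): position of the p-th percentile
# in a vector of n values sorted from highest to lowest
desc_position <- function(p, n) {
  ceiling((1 - p) * n)
}

desc_position(0.25, 30)  # 23
desc_position(0.75, 30)  # 8
desc_position(0.25, 4)   # 3
```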

Before, I would be left with a table that had one value per site describing the 25th percentile (because arrange(desc(value)) orders the values from highest to lowest). However, now when I run these lines, I get a message:

Adding missing grouping variables: `month`, `day`, `station_number`

This message doesn't make sense to me, because the grouping variables are clearly present in my table. And again, this was working fine until recently. I have tried:


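  • detaching plyr (detach(package:plyr)), since I loaded it before dplyr

  • dplyr::group_by, i.e. putting the namespace prefix directly in the group_by line

  • uninstalling and reinstalling dplyr, although that was for a different problem I was having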

Any idea why I might be receiving this message and why it may have stopped working?

Thanks for your help.

Update: Added a dput example with one site and values for January 1st across multiple years. The hope is that the positional value is returned for each group; for instance, slice(3) should return the 15.6 value for this smaller subset.
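One way to get one positional value per group without the message, sketched here as an alternative rather than the asker's original method, is to compute it inside summarise(). This assumes dplyr 1.0.0 or later (for the .groups argument) and rebuilds the sample data with data.frame() instead of the dput output; pos is a hypothetical variable holding the position.

```r
library(dplyr)

# Same sample as the dput above, rebuilt with data.frame()
mydata <- data.frame(
  station_number = factor(rep("01AD002", 4)),
  year  = 1981:1984,
  month = rep(1L, 4),
  day   = rep(1L, 4),
  value = c(113, 8.329999924, 15.60000038, 149)
)

# Hypothetical position: 3 for this 4-year sample, 23 for a 30-year record
pos <- 3

# summarise() returns one row per group and keeps the grouping columns,
# so select() is not needed and no grouping-variable message is printed
qu25 <- mydata %>%
  group_by(month, day, station_number) %>%
  summarise(value = sort(value, decreasing = TRUE)[pos], .groups = "drop")

qu25
```

One design difference from slice(): if a group has fewer than pos values, the indexing returns NA for that group instead of dropping it.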

Answer

For consistency's sake, grouping variables defined earlier are always kept in a grouped data frame, so they are added back when select(value) is executed. Calling ungroup() before select() resolves it:

qu25 <- mydata %>% 
  group_by(month, day, station_number) %>%
  arrange(desc(value)) %>% 
  slice(2) %>% 
  ungroup() %>%
  select(value)

The requested result, without the message:

> mydata %>% 
+   group_by(month, day, station_number) %>%
+   arrange(desc(value)) %>% 
+   slice(2) %>% 
+   ungroup() %>%
+   select(value)
# A tibble: 1 x 1
  value
  <dbl>
1   113
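The message is informational: select() on a grouped data frame keeps the grouping columns. If you actually want those columns (for example, to know which site and day each value belongs to), an alternative sketch is to select them explicitly; if you only want the numbers, pull() returns a plain vector. This reuses the question's sample data, rebuilt with data.frame(), and the answer's slice(2).

```r
library(dplyr)

mydata <- data.frame(
  station_number = factor(rep("01AD002", 4)),
  year  = 1981:1984,
  month = rep(1L, 4),
  day   = rep(1L, 4),
  value = c(113, 8.329999924, 15.60000038, 149)
)

ranked <- mydata %>%
  group_by(month, day, station_number) %>%
  arrange(desc(value)) %>%
  slice(2)

# Keep the grouping columns on purpose: nothing is missing, so no message,
# and each value stays labelled with its month, day and station
with_keys <- ranked %>%
  select(month, day, station_number, value)

# Extract only the selected values as a plain numeric vector
just_values <- ranked %>%
  pull(value)

with_keys
just_values
```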
