"Adding missing grouping variables" message in dplyr in R

Question

Part of my script used to run fine, but recently it has started producing an odd message, after which many of my other functions no longer work properly. I am trying to select the 8th and 23rd positions in a ranked list of values for each site, to find the 25th and 75th percentile values for each day of the year at each site over 30 years. My approach was as follows (adapted for the four-row dataset; with my full 30-year dataset, slice(3) would usually be slice(23)):

library(dplyr)

mydata <- structure(list(station_number = structure(c(1L, 1L, 1L, 1L), .Label = "01AD002", class = "factor"),
    year = 1981:1984, month = c(1L, 1L, 1L, 1L), day = c(1L, 1L, 1L, 1L),
    value = c(113, 8.329999924, 15.60000038, 149)),
    .Names = c("station_number", "year", "month", "day", "value"),
    class = "data.frame", row.names = c(NA, -4L))

# 3rd-highest value per month/day/station (slice(23) on the full 30-year data)
qu25 <- mydata %>%
  group_by(month, day, station_number) %>%
  arrange(desc(value)) %>%
  slice(3) %>%
  select(value)
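For the full 30-year data, the positions 8 and 23 are consistent with the nearest-rank definition of a percentile, counted from the top because arrange(desc(value)) sorts from highest to lowest. A minimal sketch of that arithmetic, assuming the nearest-rank rule (the question does not say which percentile definition is intended); desc_rank_position() is a hypothetical helper, not a dplyr function:

```r
# Nearest-rank position of the p-th percentile among n values sorted in
# DESCENDING order, as arrange(desc(value)) produces.
# desc_rank_position() is a hypothetical helper (assumes the nearest-rank rule).
desc_rank_position <- function(p, n) {
  ascending_rank <- ceiling(p * n)  # nearest-rank rule, ascending order
  n + 1 - ascending_rank            # flip to a position counted from the top
}

desc_rank_position(0.25, 30)  # 23
desc_rank_position(0.75, 30)  # 8
```

Other percentile definitions, such as the interpolating default of quantile(), can give slightly different values than picking a single ranked position.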

Previously, this left me with a table containing one value per site describing the 25th percentile (since arrange(desc(value)) orders the values from highest to lowest). Now, however, when I run these lines I get a message:

Adding missing grouping variables: `month`, `day`, `station_number`

This message doesn't make sense to me, because the grouping variables are clearly present in my table, and, again, this worked fine until recently. I have tried:

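  • detach("package:plyr"), because I had loaded plyr before dplyr
  • dplyr::group_by, calling it explicitly in the group_by line
  • uninstalling and reinstalling dplyr, though that was for another problem I was having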

Any idea why I might be receiving this message and why it may have stopped working?

Thanks for your help.

Update: I added a dput example for one site, with January 1st values for several years. The goal is for the value at the given position to be returned once the data are grouped; for instance, slice(3) should return 15.6 for this small subset.

Recommended answer

For consistency's sake, grouping variables defined earlier are always kept, so they are added back when select(value) is executed; the message only reports this and is not an error. Calling ungroup() before select() resolves it:

qu25 <- mydata %>% 
  group_by(month, day, station_number) %>%
  arrange(desc(value)) %>% 
  slice(2) %>% 
  ungroup() %>%
  select(value)

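The requested result, now without the message (slice(2) returns the second-highest value, 113; slice(3) would return 15.6, as expected in the question):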

> mydata %>% 
+   group_by(month, day, station_number) %>%
+   arrange(desc(value)) %>% 
+   slice(2) %>% 
+   ungroup() %>%
+   select(value)
# A tibble: 1 x 1
  value
  <dbl>
1   113
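A minimal sketch, rebuilt from the question's sample data, that reproduces where the message comes from and shows ways to avoid it: ungroup() before select(), pull() to get a plain vector, or summarise() to collapse each group to one row. The positions 2 and 3 in summarise() are hypothetical stand-ins for the 8th and 23rd positions on 30 years of data, and the .groups argument assumes dplyr 1.0.0 or later:

```r
library(dplyr)

# Sample data from the question's update (one station, 1 January, 1981-1984)
mydata <- data.frame(
  station_number = factor(rep("01AD002", 4)),
  year  = 1981:1984,
  month = rep(1L, 4),
  day   = rep(1L, 4),
  value = c(113, 8.329999924, 15.60000038, 149)
)

# Third-highest value per month/day/station; the result is still grouped
third <- mydata %>%
  group_by(month, day, station_number) %>%
  arrange(desc(value)) %>%
  slice(3)

# select() on grouped data re-adds the grouping columns and emits
# "Adding missing grouping variables: `month`, `day`, `station_number`"
with_groups <- third %>% select(value)

# Dropping the groups first keeps only the requested column (value 15.6)
value_only <- third %>% ungroup() %>% select(value)

# pull() returns the column as a plain numeric vector instead of a tibble
value_vec <- third %>% pull(value)

# summarise() collapses each group to one row; .groups = "drop"
# (assumes dplyr >= 1.0.0) returns an ungrouped tibble.
# Positions 2 and 3 are hypothetical stand-ins for 8 and 23.
both <- mydata %>%
  group_by(month, day, station_number) %>%
  summarise(
    upper = nth(sort(value, decreasing = TRUE), 2),
    lower = nth(sort(value, decreasing = TRUE), 3),
    .groups = "drop"
  )
```

Sorting inside summarise() keeps each group's ranking independent of any earlier arrange() call, so both positions come out of a single pass with the grouping columns retained for identification.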
