Using the result of summarise (dplyr) to mutate the original dataframe

Published 2020/5/4 7:17:12. Tags: r, dplyr, posixct, lubridate


Problem description

I have a rather big dataframe with a column of POSIXct datetimes (~10 years of hourly data). I would like to flag all the rows whose day falls in a daylight saving period. For example, if daylight saving time starts at '2000-04-02 03:00:00' (DOY = 93), I would like the two preceding hours of DOY 93 to be flagged as well. Although I am new to dplyr, I would like to use this package as much as possible and avoid for-loops wherever I can.

For example:

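library(lubridate)
# Hourly POSIXct sequence in a time zone that observes daylight saving time
sd = ymd('2000-01-01',tz="America/Denver")
ed = ymd('2005-12-31',tz="America/Denver")
span = data.frame(date=seq(from=sd,to=ed, by="hour"))
span$YEAR = year(span$date)  # calendar year
span$DOY = yday(span$date)   # day of year (1-366)
span$DLS = dst(span$date)    # TRUE if the hour falls in daylight saving time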
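Looking at the changeover day itself shows why `DLS` alone is not enough for a per-day flag. A minimal sketch, assuming the `span` construction above and a system time-zone database that knows America/Denver: on 2000-04-02 the clock jumps from 01:59 MST to 03:00 MDT, so the hourly sequence has only 23 rows that day, and the first two of them (00:00 and 01:00) are not in daylight saving time. `day93` is a hypothetical name used only for this illustration.

```r
library(lubridate)

# Rebuild the hourly series from the question
sd <- ymd("2000-01-01", tz = "America/Denver")
ed <- ymd("2005-12-31", tz = "America/Denver")
span <- data.frame(date = seq(from = sd, to = ed, by = "hour"))
span$YEAR <- year(span$date)
span$DOY  <- yday(span$date)
span$DLS  <- dst(span$date)

# day93 (illustrative name): all hourly rows of 2000-04-02, i.e. DOY 93
day93 <- span[span$YEAR == 2000 & span$DOY == 93, ]
nrow(day93)      # number of hours on the changeover day (02:00 does not exist)
head(day93, 4)   # 00:00 and 01:00 still have DLS == FALSE
sum(!day93$DLS)  # hours of DOY 93 that dst() does not flag
```

Those unflagged hours are exactly the "two preceding hours of DOY 93" that the question wants to include.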

To find the days of the year on which daylight saving time applies, I use dplyr:

library(dplyr)
# First and last day of each year that contain daylight saving hours
limits = span %>% group_by(YEAR) %>% summarise(minDOY=min(DOY[DLS]),maxDOY=max(DOY[DLS]))

which gives:

      YEAR minDOY maxDOY
    1 2000     93    303
    2 2001     91    301
    3 2002     97    300
    4 2003     96    299
    5 2004     95    305
    6 2005     93    303

Now I would like to 'pipe' these results back into the span dataframe without using an inefficient for-loop.

With the help of @aosmith, the problem can be tackled with just two commands (avoiding the inner_join used in 'Solution 2'):

limits = span %>% group_by(YEAR) %>% mutate(minDOY=min(DOY[DLS]),maxDOY=max(DOY[DLS]),CHECK=FALSE)

limits$CHECK[(limits$DOY >= limits$minDOY) & (limits$DOY <= limits$maxDOY) ] = TRUE
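The grouped `mutate()` can also compute the flag directly, so neither the helper columns nor the index assignment are needed. A minimal sketch, assuming dplyr 1.0 or later (the code above predates it): `min(DOY[DLS])` and `max(DOY[DLS])` are evaluated once per `YEAR` group and recycled to every row of that group. `flagged` is a hypothetical name used only for illustration.

```r
library(dplyr)
library(lubridate)

# Rebuild the hourly series from the question
sd <- ymd("2000-01-01", tz = "America/Denver")
ed <- ymd("2005-12-31", tz = "America/Denver")
span <- data.frame(date = seq(from = sd, to = ed, by = "hour"))
span$YEAR <- year(span$date)
span$DOY  <- yday(span$date)
span$DLS  <- dst(span$date)

# flagged (illustrative name): span plus a per-row CHECK column
flagged <- span %>%
  group_by(YEAR) %>%
  # per-year first/last DST day, broadcast to every row of that year
  mutate(CHECK = DOY >= min(DOY[DLS]) & DOY <= max(DOY[DLS])) %>%
  ungroup()

# Every hour of DOY 93 in 2000 is flagged, including 00:00 and 01:00
flagged %>% filter(YEAR == 2000, DOY == 93) %>% count(CHECK)
```

Calling `ungroup()` at the end is a precaution so that later verbs do not silently keep operating per year.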

SOLUTION 2

With the help of @beetroot and @matthew-plourde, the problem has been solved: an inner join between the summary and span was missing:

limits = span %>% group_by(YEAR) %>% summarise(minDOY=min(DOY[DLS]),maxDOY=max(DOY[DLS])) %>% inner_join(span, by='YEAR')

Then I just added a new column (CHECK) and set it to TRUE for the daylight saving days:

limits$CHECK = FALSE
limits$CHECK[(limits$DOY >= limits$minDOY) & (limits$DOY <= limits$maxDOY) ] = TRUE
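The two assignment lines can also be folded into the pipeline with one more `mutate()`, because the comparison is already vectorized over the joined `DOY`, `minDOY` and `maxDOY` columns. A minimal sketch, assuming dplyr 1.0 or later; it builds the flag both ways so the results can be compared. `check_old` is a hypothetical name used only for that comparison.

```r
library(dplyr)
library(lubridate)

# Rebuild the hourly series from the question
sd <- ymd("2000-01-01", tz = "America/Denver")
ed <- ymd("2005-12-31", tz = "America/Denver")
span <- data.frame(date = seq(from = sd, to = ed, by = "hour"))
span$YEAR <- year(span$date)
span$DOY  <- yday(span$date)
span$DLS  <- dst(span$date)

limits <- span %>%
  group_by(YEAR) %>%
  summarise(minDOY = min(DOY[DLS]), maxDOY = max(DOY[DLS])) %>%
  inner_join(span, by = "YEAR") %>%
  # TRUE for every hour whose day lies within that year's DST days
  mutate(CHECK = DOY >= minDOY & DOY <= maxDOY)

# check_old (illustrative name): the same flag built with subset assignment
check_old <- rep(FALSE, nrow(limits))
check_old[(limits$DOY >= limits$minDOY) & (limits$DOY <= limits$maxDOY)] <- TRUE
identical(limits$CHECK, check_old)
```

Because `summarise()` yields one row per `YEAR`, this is a one-to-many join and `limits` keeps exactly one row per hour of `span`.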

Recommended answer

As @beetroot points out in the comments, you can accomplish this with a join:

limits = span %>% 
   group_by(YEAR) %>% 
   summarise(minDOY=min(DOY[DLS]),maxDOY=max(DOY[DLS])) %>%
   inner_join(span, by='YEAR')
#    YEAR minDOY maxDOY                date DOY   DLS
# 1  2000     93    303 2000-01-01 00:00:00   1 FALSE
# 2  2000     93    303 2000-01-01 01:00:00   1 FALSE
# 3  2000     93    303 2000-01-01 02:00:00   1 FALSE
# 4  2000     93    303 2000-01-01 03:00:00   1 FALSE
# 5  2000     93    303 2000-01-01 04:00:00   1 FALSE
# 6  2000     93    303 2000-01-01 05:00:00   1 FALSE
# 7  2000     93    303 2000-01-01 06:00:00   1 FALSE
# 8  2000     93    303 2000-01-01 07:00:00   1 FALSE
# 9  2000     93    303 2000-01-01 08:00:00   1 FALSE
# 10 2000     93    303 2000-01-01 09:00:00   1 FALSE
