In R, is it possible to include the same row in multiple groups, or is there another workaround?


Problem description


I've measured N2O flux from soil at multiple time points during the day (not equally spaced). I'm trying to calculate the total N2O flux from soil for a subset of days by finding the area under the curve for each day. I know how to do this using only the measurements from a given day; however, I'd like to include the last measurement of the previous day and the first measurement of the following day to improve the estimate of the curve.

Here's an example to give a more concrete idea:

library(MESS)
library(lubridate)
library(dplyr)

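Generate a reproducible example:

# Use date-times for both endpoints: in recent lubridate versions ymd() returns a
# Date, and seq() with a POSIXct start expects a POSIXct end.
datetime <- seq(ymd_hm('2015-04-07 11:20'), ymd_hm('2015-04-13 00:00'), by = 'hours')
dat <- data.frame(datetime, day = day(datetime), Flux = rnorm(n = length(datetime), mean = 400, sd = 20))

# One usability flag per day, joined onto every measurement of that day.
useDate <- data.frame(day = c(7:12), DateGood = c("No", "Yes", "Yes", "No", "Yes", "No"))
dat <- left_join(dat, useDate, by = "day")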

Some days are "bad" (too many missing measurements; DateGood is "No") and some are "good" (usable; DateGood is "Yes"). The goal is to keep all measurements (rows) that occurred on a good day, as well as the last measurement from the day before and the first measurement from the day after.

out <- dat %>%
    mutate(lagDateGood = lag(DateGood),
           leadDateGood = lead(DateGood)) %>%
    filter(lagDateGood != "No" | leadDateGood != "No")
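To see which rows this filter keeps, here is a small package-free sketch on made-up flags. The vectors and the helpers lag1 and lead1 are assumptions for illustration only; the helpers stand in for dplyr's lag() and lead() with their default offset of one.

```r
# Hypothetical toy data: one flag per measurement, in time order.
day      <- c(7, 7, 8, 8, 8, 9, 9, 10, 10)
DateGood <- c("No", "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No", "No")

# Base-R stand-ins for dplyr::lag() and dplyr::lead() with n = 1.
lag1  <- function(x) c(NA, head(x, -1))
lead1 <- function(x) c(tail(x, -1), NA)

# Same condition as the filter() call; which() drops NA results,
# just as filter() drops rows whose condition evaluates to NA.
keep <- which(lag1(DateGood) != "No" | lead1(DateGood) != "No")
kept <- data.frame(day = day[keep], DateGood = DateGood[keep])
print(kept)
```

Rows 2 through 8 survive: every row of the good days 8 and 9, plus the last row of day 7 and the first row of day 10. A bad day next to a good day therefore still contributes one or two rows, which matters as soon as the result is grouped by day.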

Now I need to calculate the area under the curve. This attempt is not correct:

out2 <- out %>%
    group_by(day) %>%
    mutate(hourOfday = hour(datetime) + minute(datetime)/60) %>%
    summarize(auc = auc(x = hourOfday, y = Flux, from = 0, to = 24, type = "spline"))

The trouble is that this does not include the measurements at the end of the previous day and the start of the following day when calculating the AUC. Also, I get an estimate of flux for day 10, which is a "bad" day.

I think the crux of my question has to do with groups. Some measurements need to be in multiple groups (for example the last measurement on day 8 would be used in estimating AUC for day 8 and day 9). Do you have suggestions for how I could form new groups? Or might there be a completely different way to achieve the goal?

Solution

Here's another way. More in line with the suggestions of @Alex Brown.

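# Another way
# Copy the last measurement of each day into the following day's group.
last <- out %>%
    group_by(day) %>%
    filter(datetime == max(datetime)) %>%
    ungroup() %>%
    mutate(day = day + 1)

# Copy the first measurement of each day into the previous day's group.
first <- out %>%
    group_by(day) %>%
    filter(datetime == min(datetime)) %>%
    ungroup() %>%
    mutate(day = day - 1)

d <- rbind(out, last, first) %>%
    group_by(day) %>%
    arrange(datetime)

# Count the rows in each day's group.
n_measures_per_day <- d %>%
    summarize(n = n())

# Drop groups with four or fewer rows.
d <- left_join(d, n_measures_per_day, by = "day") %>%
    filter(n > 4)

# The first row of a group can be borrowed from the previous day, so the third
# row is used to find midnight of the group's own day. Time is expressed
# explicitly in minutes since midnight to match to = 1440 (plain subtraction
# lets R choose the units).
TotalFluxDF <- d %>%
    mutate(timeAtMidnight = floor_date(datetime[3], "day"),
           time = as.numeric(difftime(datetime, timeAtMidnight, units = "mins"))) %>%
    summarize(auc = auc(x = time, y = Flux, from = 0, to = 1440, type = "spline"))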

TotalFluxDF

Source: local data frame [3 x 2]

    day      auc
  (dbl)    (dbl)
1     8 585230.2
2     9 579017.3
3    11 563689.7
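
As a rough sanity check on the approach, the sketch below repeats the idea without any packages. Boundary rows are copied into the neighbouring day's group, time is expressed in minutes since that day's midnight, and the area is taken with the trapezoid rule on linearly interpolated values. That is a swapped-in simplification, not the spline used by auc() from MESS. The toy data, the constant flux of 400, and the helper auc_trapz are assumptions for illustration only.

```r
# Hypothetical toy data: hourly readings at 20 past the hour, constant flux.
datetime <- seq(as.POSIXct("2015-04-07 11:20", tz = "UTC"),
                as.POSIXct("2015-04-10 11:20", tz = "UTC"), by = "hour")
dat <- data.frame(datetime,
                  day  = as.integer(format(datetime, "%d")),
                  Flux = 400)

# Copy the last row of each day into the next day and the first row of
# each day into the previous day, so those rows sit in two groups.
t_num <- as.numeric(dat$datetime)
last  <- dat[t_num == ave(t_num, dat$day, FUN = max), ]
first <- dat[t_num == ave(t_num, dat$day, FUN = min), ]
last$day  <- last$day + 1L
first$day <- first$day - 1L
d <- rbind(dat, last, first, make.row.names = FALSE)
d <- d[order(d$day, d$datetime), ]

# Trapezoid-rule area between `from` and `to`, with linear interpolation
# at the two interval ends.
auc_trapz <- function(x, y, from = 0, to = 1440) {
  grid <- sort(unique(c(from, x[x > from & x < to], to)))
  yy   <- approx(x, y, xout = grid)$y
  sum(diff(grid) * (head(yy, -1) + tail(yy, -1)) / 2)
}

# Integrate only the days that are bracketed on both sides of 0..1440 minutes.
total_flux <- sapply(split(d, d$day), function(g) {
  # The toy data lives in April 2015, so midnight follows from the day number.
  midnight <- as.POSIXct("2015-04-01", tz = "UTC") + (g$day[1] - 1) * 86400
  minutes  <- as.numeric(difftime(g$datetime, midnight, units = "mins"))
  if (min(minutes) > 0 || max(minutes) < 1440) return(NA_real_)
  auc_trapz(minutes, g$Flux)
})
print(total_flux)
```

With a constant flux of 400, the two fully bracketed days (8 and 9) each integrate to 400 × 1440 = 576000, and the partial days come back as NA. That is the same order of magnitude as the spline results above, which suggests those results are in flux units times minutes.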
