R finding the first value in a data frame that falls within a given threshold


Problem description

I am a fairly new user and I need your help with a task that I am stuck on. If my question has been asked or answered before, I would be grateful if you could point me to the relevant page.

I have the following data set (lbnp_br) which is optical density (OD) measured over time (in seconds):

 time   OD
1891    -244.6
1891.5  -244.4
1892    -242
1892.5  -242
1893    -241.1
1893.5  -242.4
1894    -245.2
1894.5  -249.6
1895    -253.9
1895.5  -254.5
1896    -251.9
1896.5  -246.7
1897    -242.4
1897.5  -234.6
1898    -225.5

I need to find out how responsive the study device is by measuring how long it takes to reach the threshold for optical density.

For this I have calculated the coefficient of variation (CV) of OD and I am using mean OD (-252.9098) +/- 2*CV to define a response threshold. For the above data the threshold is set as (mean OD + 2*CV = -252.9917), and (mean OD - 2*CV = -252.8278).
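
The threshold construction described above can be sketched in base R as follows. This is only an illustration: the vector `od` holds the 15 values listed above rather than the full recording, so the results will not match the mean and thresholds quoted in the question. The names `od`, `mean_od`, `cv`, `plus`, `minus`, `lower`, and `upper` are made up for the example, and it assumes CV means standard deviation divided by the mean.

```r
# The 15 OD values listed above (a snippet, not the full recording)
od <- c(-244.6, -244.4, -242, -242, -241.1, -242.4, -245.2, -249.6,
        -253.9, -254.5, -251.9, -246.7, -242.4, -234.6, -225.5)

mean_od <- mean(od)
cv <- sd(od) / mean_od   # assumed definition; negative because the mean is negative

# mean +/- 2*CV as described in the question
plus  <- mean_od + 2 * cv
minus <- mean_od - 2 * cv

# Because cv < 0, "plus" is the smaller value, so sort the bounds explicitly
lower <- min(plus, minus)
upper <- max(plus, minus)
```

Since the mean OD is negative, the CV is negative as well, so mean + 2*CV ends up below mean - 2*CV. That ordering agrees with the two values quoted in the question.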

I now need to calculate the time in seconds from the start (1891 seconds) to the first OD value that exceeds the +/- threshold values. For example, for the above data frame the threshold is exceeded at 1895 seconds, corresponding to an OD of -253.9.
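
For the rows shown, the lookup described above can be written directly with `which()`. This is a minimal sketch, assuming the data frame has columns named `time` and `OD` and is already ordered by time. `lower` is the -252.9917 value quoted above, and `hit` and `elapsed` are illustrative names.

```r
lbnp_br <- data.frame(
  time = seq(1891, 1898, by = 0.5),
  OD   = c(-244.6, -244.4, -242, -242, -241.1, -242.4, -245.2, -249.6,
           -253.9, -254.5, -251.9, -246.7, -242.4, -234.6, -225.5)
)
lower <- -252.9917

# Index of the first row whose OD falls below the lower threshold
hit <- which(lbnp_br$OD < lower)[1]

lbnp_br[hit, ]                                   # the row at time 1895, OD -253.9
elapsed <- lbnp_br$time[hit] - lbnp_br$time[1]   # 4 seconds after the start
```

If no value crosses the threshold, `which(...)[1]` is `NA` and the row lookup returns a row of `NA`s. Note that the example only matches the question's 1895 result on the lower side: every listed value before 1895 already lies above -252.8278, so a test against the upper value would flag the very first row.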

I now have to repeat this 3 times for each study subject, with 17 subjects overall. I am therefore looking for a function where I can define the data frame and the threshold values, and which returns the first OD value that exceeds the defined thresholds (all_threshold$sup_2_minus) and (all_threshold$sup_2_plus), together with its corresponding time.

I have tried subset() as advised elsewhere:

subset(lbnp_br, lbnp_br$OD < all_threshold$sup_2_minus & lbnp_br$OD > all_threshold$sup_2_plus)  

However, this doesn't return what I am looking for.

and also

ifelse(lbnp_br$OD > all_threshold$sup_2_plus & lbnp_br$OD < all_threshold$sup_2_minus, lbnp_br$OD, NA)

which returns NA and doesn't specify the exact value of OD and the time.
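
A likely reason both attempts come back empty is that `&` keeps only values lying between the two thresholds, and none of the listed rows do. The sketch below uses the two threshold values quoted earlier, stored in a stand-in list called `all_threshold` because the real object is not shown. Which value belongs to `sup_2_plus` and which to `sup_2_minus` is an assumption based on the wording of the question.

```r
lbnp_br <- data.frame(
  time = seq(1891, 1898, by = 0.5),
  OD   = c(-244.6, -244.4, -242, -242, -241.1, -242.4, -245.2, -249.6,
           -253.9, -254.5, -251.9, -246.7, -242.4, -234.6, -225.5)
)

# Stand-in for the real all_threshold object (assumed mapping of the quoted values)
all_threshold <- list(sup_2_plus = -252.9917, sup_2_minus = -252.8278)

# `&` asks for values between the two thresholds
inside <- lbnp_br$OD > all_threshold$sup_2_plus &
          lbnp_br$OD < all_threshold$sup_2_minus

# `|` asks for values beyond either threshold
outside <- lbnp_br$OD < all_threshold$sup_2_plus |
           lbnp_br$OD > all_threshold$sup_2_minus

sum(inside)    # 0: no listed value lies between the two thresholds
sum(outside)   # 15: every listed value lies outside that narrow band
```

With zero matching rows, `subset()` returns an empty data frame and `ifelse()` returns `NA` for every element, which fits the behaviour described above.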

Solution

Using the above code, I added a few extra conditions to get exactly what I was looking for and here it is for anyone who may need something similar:

library(dplyr)

# First row (ordered by time) where OD is above ("upper") or below ("lower")
# the threshold. Returns a single row of NAs if the threshold is never crossed.
# Assumes the data frame has columns named `time` and `OD`.
first_crossing <- function(df, threshold, side, label) {
  hit <- df %>% arrange(time)
  hit <- if (side == "upper") {
    filter(hit, OD > threshold)
  } else {
    filter(hit, OD < threshold)
  }
  hit <- hit %>%
    slice(1) %>%
    select(time, OD) %>%
    as.data.frame()

  if (nrow(hit) == 0) {
    hit <- data.frame(time = NA, OD = NA)
  }

  # e.g. "time_hdt_upper" and "OD_hdt_upper"
  colnames(hit) <- paste0(c("time_", "OD_"), label, "_", side)
  hit
}

# df, df2, df3 are the hdt, lbnp and hut recordings; odd-numbered thresholds
# are upper bounds and even-numbered thresholds are lower bounds.
find_time <- function(df, df2, df3, threshold_1, threshold_2, threshold_3,
                      threshold_4, threshold_5, threshold_6) {
  data.frame(
    first_crossing(df,  threshold_1, "upper", "hdt"),
    first_crossing(df,  threshold_2, "lower", "hdt"),
    first_crossing(df2, threshold_3, "upper", "lbnp"),
    first_crossing(df2, threshold_4, "lower", "lbnp"),
    first_crossing(df3, threshold_5, "upper", "hut"),
    first_crossing(df3, threshold_6, "lower", "hut")
  )
}

which gives

find_time_threshold <- find_time(hdt_br, lbnp_br, hut_br, all_threshold$base_plus, all_threshold$base_minus, all_threshold$sup_2_plus, all_threshold$sup_2_minus, all_threshold$sup_3_plus, all_threshold$sup_3_minus)
> find_time_threshold

  time_hdt_upper OD_hdt_upper time_hdt_lower OD_hdt_lower time_lbnp_upper OD_lbnp_upper time_lbnp_lower
1          596.5        123.3            506         91.3              NA            NA            1706
  OD_lbnp_lower time_hut_upper OD_hut_upper time_hut_lower OD_hut_lower
1        -27.89         3186.5       -82.98           2909       -211.7
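
Because the same lookup has to be repeated for 17 subjects and three conditions, one option is to apply a small helper over a list of data frames. The sketch below is not the function above but a base R variant. The helper name `first_crossing_base`, the list `subjects`, the thresholds, and the toy data are all hypothetical.

```r
# Hypothetical helper: first row below `lower` and first row above `upper`
first_crossing_base <- function(df, lower, upper) {
  df <- df[order(df$time), ]
  lo <- which(df$OD < lower)[1]   # NA if the lower threshold is never crossed
  hi <- which(df$OD > upper)[1]   # NA if the upper threshold is never crossed
  data.frame(time_lower = df$time[lo], OD_lower = df$OD[lo],
             time_upper = df$time[hi], OD_upper = df$OD[hi])
}

# Toy data standing in for per-subject recordings
subjects <- list(
  s01 = data.frame(time = c(0, 0.5, 1, 1.5), OD = c(10, 12, 15, 9)),
  s02 = data.frame(time = c(0, 0.5, 1, 1.5), OD = c(10, 10.5, 11, 10.2))
)

# One row per subject, with NA where a threshold is never crossed
result <- do.call(rbind, lapply(subjects, first_crossing_base,
                                lower = 9.5, upper = 14))
result
```

Keeping one row per subject makes it easy to bind the three conditions side by side afterwards. In practice each subject would probably need its own pair of thresholds rather than the shared values used here.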
