Replace loops with apply family functions (or dplyr), using logical functions in R

Problem description

I have created this representative data frame and assign condition categories to it using a for loop.

library(dplyr)  # for %>% and mutate()

df <- data.frame(Date=c("08/29/2011", "08/29/2011", "08/30/2011", "08/30/2011", "08/30/2011", "08/29/2012", "08/29/2012", "01/15/2012", "08/29/2012"),
             Time=c("09:45", "10:00", "13:00", "13:30", "10:14", "9:09", "11:23", "17:06", "12:20"),
             Diff = c(0.2, 4.3, 6.5, 15.0, 16.5, 31, 30.2, 21.9, 1.9))

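# First pass: label rows with Diff <= 3, mark everything else "TBD"
df1 <- df %>%
  mutate(Accuracy = ifelse(Diff <= 3, "Excellent", "TBD"))

# Second pass: overwrite "TBD" row by row according to the Diff thresholds
for (i in seq_len(nrow(df1))) {
  if (df1$Diff[i] > 3  && df1$Diff[i] <= 10) df1$Accuracy[i] <- "Good"
  if (df1$Diff[i] > 10 && df1$Diff[i] <= 15) df1$Accuracy[i] <- "Fair"
  if (df1$Diff[i] > 15 && df1$Diff[i] <= 30) df1$Accuracy[i] <- "Poor"
  if (df1$Diff[i] > 30)                      df1$Accuracy[i] <- "Unacceptable"
}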

My actual dataset is very large, and what I have read suggests that for loops are usually not the most efficient way to code in R. I believe I can do the same thing by creating a logical vector for each condition, where an element is TRUE when that condition is met for the corresponding row. Then I can assign the values by subsetting, for example df1$Accuracy[Good] <- "Good". However, I cannot figure out how to create the logical vectors using the apply family functions or dplyr functions. (Any solution that avoids for loops is also welcome.) If a for loop is the better way to go, that would also be helpful to know.
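The logical-vector idea works directly, because comparison operators in R are vectorized: df1$Diff > 3 & df1$Diff <= 10 yields one TRUE/FALSE per row without any loop or apply call. A minimal sketch, assuming the example data frame from the question; the vector names (Excellent, Good, ...) are illustrative. Note the single & here: && compares only a single pair of values and is meant for scalar conditions like the ones inside the loop.

```r
df <- data.frame(Date = c("08/29/2011", "08/29/2011", "08/30/2011", "08/30/2011", "08/30/2011",
                          "08/29/2012", "08/29/2012", "01/15/2012", "08/29/2012"),
                 Time = c("09:45", "10:00", "13:00", "13:30", "10:14", "9:09", "11:23", "17:06", "12:20"),
                 Diff = c(0.2, 4.3, 6.5, 15.0, 16.5, 31, 30.2, 21.9, 1.9))

df1 <- df
df1$Accuracy <- "TBD"  # default label, recycled to every row

# One logical vector per category; each comparison is evaluated for all rows at once
Excellent    <- df1$Diff <= 3
Good         <- df1$Diff > 3  & df1$Diff <= 10
Fair         <- df1$Diff > 10 & df1$Diff <= 15
Poor         <- df1$Diff > 15 & df1$Diff <= 30
Unacceptable <- df1$Diff > 30

# Assign labels by logical subsetting: only rows where the vector is TRUE change
df1$Accuracy[Excellent]    <- "Excellent"
df1$Accuracy[Good]         <- "Good"
df1$Accuracy[Fair]         <- "Fair"
df1$Accuracy[Poor]         <- "Poor"
df1$Accuracy[Unacceptable] <- "Unacceptable"

df1$Accuracy
```

Because each assignment touches the whole column in one call, the number of R-level operations no longer grows with the number of rows, which is where the gain over the loop generally comes from.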

Here are my failed attempts. They return spurious NAs or incorrect logical vectors. One of the many things I do not understand is how lapply knows whether to go over columns or rows.

Good<-apply(df1, 1, function(x) ifelse(df1$Diff[x]>3&& df1$Diff[x]<=10, TRUE, FALSE)) #logical, TRUE where condition is true 
Good<-unlist(lapply(df1$Diff,  function(x) {(ifelse(df1$Diff[x]>3&& df1$Diff[x]<=10, TRUE, FALSE))}))
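Both attempts fail for a related reason. lapply (like sapply and vapply) walks over the elements of whatever it is given: a data frame is a list of columns, so lapply(df1, f) visits columns, while lapply(df1$Diff, f) visits the individual Diff values. Inside the function, x is therefore the value itself (0.2, 4.3, ...), not a row index, so df1$Diff[x] looks up a position given by that value, which picks the wrong element (df1$Diff[4.3] is the 4th value) or falls off the end and gives NA (df1$Diff[31]). apply(df1, 1, f) does iterate over rows, but it first converts the data frame to a matrix; with mixed column types that is a character matrix, so x is a character vector, indexing the unnamed Diff column with it gives NAs, and && then receives more than one value, which recent versions of R reject with an error. A sketch of working versions, using the question's data (object names are illustrative):

```r
df <- data.frame(Date = c("08/29/2011", "08/29/2011", "08/30/2011", "08/30/2011", "08/30/2011",
                          "08/29/2012", "08/29/2012", "01/15/2012", "08/29/2012"),
                 Time = c("09:45", "10:00", "13:00", "13:30", "10:14", "9:09", "11:23", "17:06", "12:20"),
                 Diff = c(0.2, 4.3, 6.5, 15.0, 16.5, 31, 30.2, 21.9, 1.9))

# A data frame is a list of columns, so sapply visits one column at a time
col_classes <- sapply(df, class)

# Given a vector, sapply visits each value: x is the value, not a row index
Good_sapply <- sapply(df$Diff, function(x) x > 3 && x <= 10)

# vapply does the same but also checks that every result is a single logical
Good_vapply <- vapply(df$Diff, function(x) x > 3 && x <= 10, logical(1))

# To work with row indices instead, iterate over seq_along() explicitly
Good_index <- sapply(seq_along(df$Diff), function(i) df$Diff[i] > 3 && df$Diff[i] <= 10)

# The vectorized comparison gives the same answer without any apply call (note & not &&)
Good_vec <- df$Diff > 3 & df$Diff <= 10
```

All four Good_* vectors are identical; the last one is the idiomatic choice for a large data set.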

Update: Nested ifelse statements work, but any suggestions on how to use apply are still welcome.

df1 <- df %>%
  mutate(Accuracy = ifelse(Diff <= 3, "Excellent",
                    ifelse(Diff > 3 & Diff <= 10, "Good",
                    ifelse(Diff > 10 & Diff <= 15, "Fair",
                    ifelse(Diff > 15 & Diff <= 30, "Poor",
                    ifelse(Diff > 30, "Unpublishable", "TBD"))))))
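Since the update still asks how apply could be used: one option is to write a small function that classifies a single Diff value and map it over the column with vapply. This is a sketch; classify_diff is a hypothetical helper name, and the thresholds and the "Unpublishable" label follow the nested ifelse above. Keep in mind that the apply family still calls an R function once per element, so it is generally not faster than an equivalent for loop; the real speed-up comes from vectorized code such as ifelse or case_when.

```r
df <- data.frame(Date = c("08/29/2011", "08/29/2011", "08/30/2011", "08/30/2011", "08/30/2011",
                          "08/29/2012", "08/29/2012", "01/15/2012", "08/29/2012"),
                 Time = c("09:45", "10:00", "13:00", "13:30", "10:14", "9:09", "11:23", "17:06", "12:20"),
                 Diff = c(0.2, 4.3, 6.5, 15.0, 16.5, 31, 30.2, 21.9, 1.9))

# Hypothetical helper: label one Diff value (same thresholds as the loop)
classify_diff <- function(d) {
  if (is.na(d)) return(NA_character_)
  if (d <= 3)  return("Excellent")
  if (d <= 10) return("Good")
  if (d <= 15) return("Fair")
  if (d <= 30) return("Poor")
  "Unpublishable"
}

df1 <- df
# vapply calls classify_diff once per value and guarantees a character result
df1$Accuracy <- vapply(df1$Diff, classify_diff, character(1))

df1
```

The ordered early returns mean each check only needs an upper bound, the same idea case_when relies on.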

Recommended answer

You can use case_when from dplyr:

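df1 <- df %>%
  mutate(Accuracy = case_when(
    # Conditions are checked in order; each row gets the label of the first TRUE one
    Diff <= 3  ~ "Excellent",
    Diff <= 10 ~ "Good",
    Diff <= 15 ~ "Fair",
    Diff <= 30 ~ "Poor",
    Diff > 30  ~ "Unpublishable",
    TRUE       ~ "TBD"   # fallback; only reached when Diff is NA
  ))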

 df1
        Date  Time Diff      Accuracy
1 08/29/2011 09:45  0.2     Excellent
2 08/29/2011 10:00  4.3          Good
3 08/30/2011 13:00  6.5          Good
4 08/30/2011 13:30 15.0          Fair
5 08/30/2011 10:14 16.5          Poor
6 08/29/2012  9:09 31.0 Unpublishable
7 08/29/2012 11:23 30.2 Unpublishable
8 01/15/2012 17:06 21.9          Poor
9 08/29/2012 12:20  1.9     Excellent
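For comparison, base R's cut() can produce the same labels without dplyr. This is an alternative technique, not part of the answer above: cut() bins a numeric vector into intervals that are closed on the right by default, here (-Inf, 3], (3, 10], (10, 15], (15, 30] and (30, Inf], which match the thresholds used above. It returns a factor, so the sketch converts it to character.

```r
df <- data.frame(Date = c("08/29/2011", "08/29/2011", "08/30/2011", "08/30/2011", "08/30/2011",
                          "08/29/2012", "08/29/2012", "01/15/2012", "08/29/2012"),
                 Time = c("09:45", "10:00", "13:00", "13:30", "10:14", "9:09", "11:23", "17:06", "12:20"),
                 Diff = c(0.2, 4.3, 6.5, 15.0, 16.5, 31, 30.2, 21.9, 1.9))

df1 <- df
df1$Accuracy <- as.character(cut(df1$Diff,
                                 breaks = c(-Inf, 3, 10, 15, 30, Inf),
                                 labels = c("Excellent", "Good", "Fair", "Poor", "Unpublishable"),
                                 right  = TRUE))  # intervals are (lower, upper]
df1
```

One difference from the case_when version: a missing Diff stays NA here instead of becoming "TBD".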
