Conditional labeling in rows

Views: 54 Published: 2020/6/3 20:18:06 Tags: r algorithm dplyr


Problem description

I would like to label rows based on a condition evaluated on other rows.

Basically, for each row whose label is NA, I want to look at the rows with a non-NA label and use their value and sd_value columns to decide whether the NA row should take one of their labels; if none applies, it should stay NA. I hope this explanation is straightforward.

So let's say we have

df <- data.frame(value = c(0.5,1,0.6,1.2), sd_value=c(0.1,0.5,0.2,0.8),
             label = c("good", "bad",NA,NA))


> df
  value sd_value label
1   0.5      0.1  good
2   1.0      0.5   bad
3   0.6      0.2  <NA>
4   1.2      0.8  <NA>

To label row 3, for example, I need to take that row's value and check whether it lies within value ± 2*sd_value of the 'good' row or of the 'bad' row. If it does, the row is labelled 'good' or 'bad' accordingly.
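The check described above can be sketched as a small predicate. This is only an illustration: `in_band` and its arguments `centre`, `sd`, and `k` are hypothetical names that do not appear in the original post.

```r
# Hypothetical helper (not from the original post): TRUE when x lies within
# centre +/- k * sd, bounds included.
in_band <- function(x, centre, sd, k = 2) {
  x >= centre - k * sd & x <= centre + k * sd
}

# Row 3 (value 0.6) against the 'good' row (value 0.5, sd_value 0.1)
in_band(0.6, centre = 0.5, sd = 0.1)   # TRUE: 0.6 is inside [0.3, 0.7]

# Row 4 (value 1.2) against the same 'good' row
in_band(1.2, centre = 0.5, sd = 0.1)   # FALSE: 1.2 is outside [0.3, 0.7]
```

Values that sit exactly on a bound are compared in floating point, so a small tolerance may be worth adding if boundary cases matter.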

Expected output

> df
  value sd_value label
1   0.5      0.1  good
2   1.0      0.5   bad
3   0.6      0.2  good   # because 0.6 lies within value ± 2*sd_value of the 1st row
4   1.2      0.8   bad   # because 1.2 lies within value ± 2*sd_value of the 2nd row

To generalise the question, let's say we have data like this:

df <- data.frame(value = c(0.5, 1,8, 1.2, 2.4,0.4,6,2,5.7, 9),   
                 sd_value=c(0.1, 0.1,1, 0.2,0.2,0.1,0.4,0.2,0.1,0.1),
                 label = c("good",NA,"beautiful","bad", NA,NA,"ugly","dirty",NA,NA))


> df
   value sd_value     label
1    0.5      0.1      good
2    1.0      0.1      <NA>
3    8.0      1.0 beautiful
4    1.2      0.2       bad
5    2.4      0.2      <NA>
6    0.4      0.1      <NA>
7    6.0      0.4      ugly
8    2.0      0.2     dirty
9    5.7      0.1      <NA>
10   9.0      0.1      <NA>

Based on the conditions, the expected output should look like this:

> df
   value sd_value     label
1    0.5      0.1      good #original label
2    1.0      0.1      bad
3    8.0      1.0      beautiful #original label
4    1.2      0.2      bad
5    2.4      0.2      dirty
6    0.4      0.1      good
7    6.0      0.4      ugly #original label
8    2.0      0.2      dirty #original label
9    5.7      0.1      ugly 
10   9.0      0.1      beautiful

The labels are assigned according to value ± 2 * sd_value of the non-NA rows.


Recommended answer
We can subset the 'value's of the NA rows and compare them with the 'value' and 'sd_value' of the row labelled 'good'. The resulting logical vector ('i2') is converted to 'good'/'bad', either by numeric indexing or with ifelse, and the output is assigned back to the column using the index ('i1').
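i1 <- is.na(df$label)  # rows whose label is missing
# TRUE when the value is below the upper bound (value + 2 * sd_value) of the 'good' row (row 1)
i2 <- df$value[i1] < abs(df$value[1] + 2 * df$sd_value[1])
# FALSE + 1 = 1 picks "bad", TRUE + 1 = 2 picks "good"
df$label[i1] <- c("bad", "good")[(i2 + 1)]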
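The answer mentions ifelse as an alternative to numeric indexing but does not show it. A minimal sketch of that variant follows, assuming the first toy data set from the question with 'label' stored as character; the name `upper` is introduced here for illustration only.

```r
# Same toy data as in the question; stringsAsFactors = FALSE keeps 'label' as character
df <- data.frame(value = c(0.5, 1, 0.6, 1.2),
                 sd_value = c(0.1, 0.5, 0.2, 0.8),
                 label = c("good", "bad", NA, NA),
                 stringsAsFactors = FALSE)

i1 <- is.na(df$label)                      # rows to fill in
upper <- df$value[1] + 2 * df$sd_value[1]  # upper bound of the 'good' row (0.7)

# Below the bound -> "good", otherwise -> "bad"
df$label[i1] <- ifelse(df$value[i1] < upper, "good", "bad")
df$label
# "good" "bad" "good" "bad"
```

Both variants encode the same two-way decision; ifelse just makes the mapping from the condition to the label explicit.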

 
 
 
 
 
It can be wrapped in a function:
f1 <- function(data, lblCol, valCol, sdCol){
     # use the 'data' argument rather than the global 'df'
     i1 <- is.na(data[[lblCol]])
     gd <- which(data[[lblCol]] == "good")
     i2 <- data[[valCol]][i1] < abs(data[[valCol]][gd] + 2 * data[[sdCol]][gd])
     data[[lblCol]][i1] <- c("bad", "good")[(i2 + 1)]
     data
  }

f1(df, "label", "value", "sd_value")
#  value sd_value label
#1   0.5      0.1  good
#2   1.0      0.5   bad
#3   0.6      0.2  good
#4   1.2      0.8   bad

 
 
 
Update

With the updated dataset, we extract the rows where 'label' is non-NA, compute the upper bound value + 2 * sd_value for each, arrange them in ascending order of that bound, and use the bounds as breaks in cut to bin 'value' and get the correct 'label'.
library(dplyr) 
df1 <- df %>% 
      filter(!is.na(label)) %>% 
      transmute(label, v1 = value + 2 * sd_value) %>%
      arrange(v1)
df %>% 
    mutate(label = cut(value, breaks = c(-Inf, df1$v1), labels = df1$label)) 
#   value sd_value     label
#1    0.5      0.1      good
#2    1.0      0.1       bad
#3    8.0      1.0 beautiful
#4    1.2      0.2       bad
#5    2.4      0.2     dirty
#6    0.4      0.1      good
#7    6.0      0.4      ugly
#8    2.0      0.2     dirty
#9    5.7      0.1      ugly
#10   9.0      0.1 beautiful
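To see why the bounds have to be in ascending order and paired with their labels, here is a small sketch of how cut maps values to labels. The vectors `upper` and `labels` are illustrative names holding three of the bounds computed above.

```r
upper  <- c(0.7, 1.6, 2.4)           # ascending upper bounds (value + 2 * sd_value)
labels <- c("good", "bad", "dirty")  # labels in the same order as the bounds

# Intervals are (-Inf, 0.7], (0.7, 1.6], (1.6, 2.4]; each value gets the label of its interval
res <- cut(c(0.4, 1.0, 2.0), breaks = c(-Inf, upper), labels = labels)
as.character(res)
# "good" "bad" "dirty"

# A value above the last break falls in no interval and becomes NA
cut(3, breaks = c(-Inf, upper), labels = labels)
```

Note that cut only looks at these upper bounds, so any value below the largest bound receives a label even if it falls outside every value ± 2 * sd_value band.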

 
 
 
 
 
Or with base R:
df1 <- transform(na.omit(df), v1 = value + 2 * sd_value)[3:4]
df1 <- df1[order(df1$v1), ]  # sort by upper bound so the labels line up with the sorted breaks
df$label <- cut(df$value,  breaks = c(-Inf, df1$v1), labels = df1$label)
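
The cut-based approach compares each value only with the upper bounds, so it does not test the lower side of the ± 2 * sd_value band. As a hedged alternative, not part of the original answer, the two-sided rule from the question could be sketched as follows. `label_by_band`, `k`, and `tol` are hypothetical names, and picking the nearest labelled row when several bands match is an assumption.

```r
df <- data.frame(value = c(0.5, 1, 8, 1.2, 2.4, 0.4, 6, 2, 5.7, 9),
                 sd_value = c(0.1, 0.1, 1, 0.2, 0.2, 0.1, 0.4, 0.2, 0.1, 0.1),
                 label = c("good", NA, "beautiful", "bad", NA, NA,
                           "ugly", "dirty", NA, NA),
                 stringsAsFactors = FALSE)

# Hypothetical helper: give each NA row the label of the nearest labelled row
# whose band value +/- k * sd_value contains it; leave NA when no band matches.
label_by_band <- function(data, k = 2, tol = 1e-8) {
  ref <- data[!is.na(data$label), ]             # labelled reference rows
  for (i in which(is.na(data$label))) {
    d <- abs(data$value[i] - ref$value)         # distance to each reference value
    hit <- which(d <= k * ref$sd_value + tol)   # bands containing this value
    if (length(hit) > 0) {
      data$label[i] <- ref$label[hit[which.min(d[hit])]]
    }
  }
  data
}

out <- label_by_band(df)
out$label
```

On the updated data set this gives the same labels as the expected output in the question. The small tolerance `tol` is there so that values sitting exactly on a band edge, such as 2.4 for 'dirty', are not lost to floating-point rounding.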

