Joining feature matrix and target vector in R with complicated logic

Problem description

I have a feature vector like this:

   rest_id qtr cooking cleaning eating jumping
1      123   1   FALSE     TRUE  FALSE   FALSE
2      123   2   FALSE     TRUE  FALSE   FALSE
3      123   3   FALSE     TRUE  FALSE   FALSE
4      123   4   FALSE     TRUE  FALSE   FALSE
5      435   1   FALSE     TRUE  FALSE   FALSE
6      435   2   FALSE     TRUE  FALSE   FALSE
7      435   3   FALSE     TRUE  FALSE   FALSE
8      435   4   FALSE     TRUE  FALSE   FALSE
9      437   1   FALSE     TRUE  FALSE   FALSE
10     437   2   FALSE     TRUE  FALSE   FALSE
11     437   3   FALSE     TRUE  FALSE    TRUE
12     437   4   FALSE     TRUE  FALSE   FALSE
13     439   2   FALSE     TRUE   TRUE   FALSE

And a target vector like this:

   rest_id qtr target
1      123   1   TRUE
2      123   2  FALSE
3      123   3  FALSE
4      123   4   TRUE
5      123   5   TRUE
6      435   1   TRUE
7      435   2   TRUE
8      435   3   TRUE
9      435   4  FALSE
10     435   5  FALSE
11     437   1   TRUE
12     437   2   TRUE
13     437   3   TRUE
14     437   4  FALSE
15     439   3  FALSE

I want to join these two together such that:

  • Feature Q1 -> Target Q1Q2

  • Feature Q2 -> Target Q2Q3

  • Feature Q3 -> Target Q3Q4

  • Feature Q4 -> Target Q4Q5

For example, if the feature observation is in quarter 1, we check quarters 1 and 2 of the target vector for that rest_id: if both are TRUE the target becomes TRUE, if both are FALSE the target becomes FALSE, and if one is TRUE and the other is FALSE the target becomes TRUE.
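Read this way, the rule is a logical OR of the two quarters. A minimal sketch of the three cases, using made-up vectors `current` and `following` (these names are illustrative and do not appear in the question):

```r
# Illustrative values only: target of quarter q and of quarter q + 1
current   <- c(TRUE, FALSE, TRUE, FALSE)
following <- c(TRUE, FALSE, FALSE, TRUE)

# Element-wise OR: TRUE whenever at least one of the two quarters is TRUE
combined <- current | following
combined
# TRUE FALSE TRUE TRUE
```

This covers only the case where both quarters are present; `|` returns NA for `FALSE | NA`, so missing quarters need separate handling.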

The expected output looks like this:

rest_id  qtr cooking cleaning eating jumping target
123      1   FALSE   TRUE     FALSE  FALSE   TRUE
123      2   FALSE   TRUE     FALSE  FALSE   FALSE
123      3   FALSE   TRUE     FALSE  FALSE   TRUE
123      4   FALSE   TRUE     FALSE  FALSE   TRUE
435      1   FALSE   TRUE     FALSE  FALSE   TRUE
435      2   FALSE   TRUE     FALSE  FALSE   TRUE
435      3   FALSE   TRUE     FALSE  FALSE   TRUE
435      4   FALSE   TRUE     FALSE  FALSE   FALSE
437      1   FALSE   TRUE     FALSE  FALSE   TRUE
437      2   FALSE   TRUE     FALSE  FALSE   TRUE
437      3   FALSE   TRUE     FALSE  FALSE   TRUE
437      4   FALSE   TRUE     FALSE  FALSE   FALSE

I can't do this with just a regular join in R because of the complicated logic I mentioned. What is the easiest way to do this?

Thanks!

There are some cases where the target doesn't exist for a quarter. I added an example where the rest_id is 437. If, for example, the feature vector instance is Q4, we check Q4 and Q5. Q5 doesn't exist, so we just use Q4. If neither exists, the result should be NA.
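The missing-quarter rule can be sketched as a small scalar helper. The name `combine_quarters` is hypothetical, and a missing quarter is assumed to be represented as NA. Note that `any(c(NA, NA), na.rm = TRUE)` returns FALSE rather than NA, which is why the all-missing case is checked explicitly:

```r
# Hypothetical helper: combine the target of quarter q with that of quarter q + 1.
# A quarter that does not exist is assumed to be passed in as NA.
combine_quarters <- function(current, following) {
  vals <- c(current, following)
  vals <- vals[!is.na(vals)]      # keep only the quarters that exist
  if (length(vals) == 0) {
    NA                            # neither quarter exists
  } else {
    any(vals)                     # TRUE if at least one existing quarter is TRUE
  }
}

combine_quarters(FALSE, NA)  # only Q4 exists, so its value is used: FALSE
combine_quarters(TRUE, NA)   # TRUE
combine_quarters(NA, NA)     # NA
```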

Recommended answer

I think this is what you want:

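library(dplyr)
library(tidyr)  # complete() comes from tidyr, not dplyr

dat %>% 
  # add the missing (qtr, rest_id) combinations, with target = NA
  complete(qtr, rest_id) %>%
  group_by(rest_id) %>%
  # OR each quarter's target with the next quarter's, ignoring missing quarters
  mutate(target = as.logical(pmax(target, lead(target), na.rm = TRUE))) %>%
  right_join(dat2, by = c("rest_id", "qtr")) %>%
  relocate(target, .after = last_col()) %>%
  arrange(rest_id)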

# A tibble: 13 x 7
# Groups:   rest_id [4]
     qtr rest_id cooking cleaning eating jumping target
   <int>   <int> <lgl>   <lgl>    <lgl>  <lgl>   <lgl> 
 1     1     123 FALSE   TRUE     FALSE  FALSE   TRUE  
 2     2     123 FALSE   TRUE     FALSE  FALSE   FALSE 
 3     3     123 FALSE   TRUE     FALSE  FALSE   TRUE  
 4     4     123 FALSE   TRUE     FALSE  FALSE   TRUE  
 5     1     435 FALSE   TRUE     FALSE  FALSE   TRUE  
 6     2     435 FALSE   TRUE     FALSE  FALSE   TRUE  
 7     3     435 FALSE   TRUE     FALSE  FALSE   TRUE  
 8     4     435 FALSE   TRUE     FALSE  FALSE   FALSE 
 9     1     437 FALSE   TRUE     FALSE  FALSE   TRUE  
10     2     437 FALSE   TRUE     FALSE  FALSE   TRUE  
11     3     437 FALSE   TRUE     FALSE  TRUE    TRUE  
12     4     437 FALSE   TRUE     FALSE  FALSE   FALSE 
13     2     439 FALSE   TRUE     TRUE   FALSE   FALSE 
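To see why `pmax()` combined with `lead()` gives the OR of a quarter and the following one, here is a sketch on the targets of rest_id 123 alone. It uses base R only: `c(target[-1], NA)` stands in for `dplyr::lead(target)` on a single vector, and the names `next_qtr` and `combined` are illustrative.

```r
# Targets of rest_id 123 for quarters 1 to 5, taken from the data below
target <- c(TRUE, FALSE, FALSE, TRUE, TRUE)

# Shift by one position so each element holds the next quarter's target;
# the last quarter has no successor, hence NA
next_qtr <- c(target[-1], NA)

# pmax() treats TRUE/FALSE as 1/0, so the parallel maximum acts as an OR;
# na.rm = TRUE drops a missing side, and the result is NA only if both are NA
combined <- as.logical(pmax(target, next_qtr, na.rm = TRUE))
combined
# TRUE FALSE TRUE TRUE TRUE
```

The first four values match the target column for rest_id 123 in the output above; the fifth belongs to quarter 5, which has no feature row and is dropped by the `right_join()`.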

Data:

dat <- structure(list(rest_id = c(123L, 123L, 123L, 123L, 123L, 435L, 
435L, 435L, 435L, 435L, 437L, 437L, 437L, 437L, 439L), qtr = c(1L, 
2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 3L), target = c(TRUE, 
FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, 
TRUE, TRUE, FALSE, FALSE)), class = "data.frame", row.names = c(NA, 
-15L))

dat2 <- structure(list(rest_id = c(123L, 123L, 123L, 123L, 435L, 435L, 
435L, 435L, 437L, 437L, 437L, 437L, 439L), qtr = c(1L, 2L, 3L, 4L, 
1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L,2L), cooking = c(FALSE, FALSE, FALSE, 
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE
), cleaning = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, 
TRUE, TRUE, TRUE, TRUE, TRUE), eating = c(FALSE, FALSE, FALSE, FALSE, 
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE), jumping = c(FALSE, 
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE,
FALSE, FALSE)), class = "data.frame", row.names = c(NA, -13L))
