Complicated feature matrix and target vector join in R

Question

I asked a similar question here: Joining feature matrix and target vector in R with complicated logic

But I've created a new question because my earlier prompt was confusing and unclear.

I have a feature vector like this:

   rest_id qtr cooking cleaning eating jumping
1      123   1   FALSE     TRUE  FALSE   FALSE
2      123   2   FALSE     TRUE  FALSE   FALSE
3      123   3   FALSE     TRUE  FALSE   FALSE
4      123   4   FALSE     TRUE  FALSE   FALSE
5      435   1   FALSE     TRUE  FALSE   FALSE
6      435   2   FALSE     TRUE  FALSE   FALSE
7      435   3   FALSE     TRUE  FALSE   FALSE
8      435   4   FALSE     TRUE  FALSE   FALSE
9      437   1   FALSE     TRUE  FALSE   FALSE
10     437   2   FALSE     TRUE  FALSE   FALSE
11     437   3   FALSE     TRUE  FALSE    TRUE
12     437   4   FALSE     TRUE  FALSE   FALSE
13     439   2   FALSE     TRUE   TRUE   FALSE
14     508   1   FALSE     TRUE   TRUE   FALSE
15     508   2   FALSE     TRUE   TRUE   FALSE
16     234   2   FALSE     TRUE   TRUE   FALSE

And a target vector like this:

   rest_id qtr target
1      123   1   TRUE
2      123   2  FALSE
3      123   3  FALSE
4      123   4   TRUE
5      123   5   TRUE
6      435   1   TRUE
7      435   2   TRUE
8      435   3   TRUE
9      435   4  FALSE
10     435   5  FALSE
11     437   1   TRUE
12     437   2   TRUE
13     437   3   TRUE
14     437   4  FALSE
15     439   3  FALSE
16     508   3  FALSE
17     508   5  FALSE
18     234   3   TRUE

I want to join these two together such that

  • Feature Q1 -> Target Q1Q2
  • Feature Q2 -> Target Q2Q3
  • Feature Q3 -> Target Q3Q4
  • Feature Q4 -> Target Q4Q5

For example, if the feature observation is in quarter 1, we check quarters 1 and 2 of the target vector for that rest_id: if both are TRUE the target becomes TRUE, if both are FALSE the target becomes FALSE, and if one is TRUE and the other is FALSE the target becomes TRUE. The same logic applies to Q2, Q3, and Q4.

However, some quarters are missing from the target vector. If we are looking at quarter 1 in the feature vector, we check the target for the same rest_id for Q1 and Q2. Three cases can happen:

  • Q1 is missing and Q2 is not missing ---> take the target value for Q2
  • Q2 is missing and Q1 is not missing ---> take the target value for Q1
  • Q1 and Q2 are both missing ---> should be N/A
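
The rule for one pair of quarters can be sketched as a small base R helper. This is only an illustration: the name combine_quarters is made up here, and NA stands for a missing quarter.

```r
# Hypothetical helper: combine the target of the current quarter (a)
# with the target of the next quarter (b). NA means the quarter is missing.
combine_quarters <- function(a, b) {
  ifelse(is.na(a), b,           # current quarter missing: take the next one (may also be NA)
         ifelse(is.na(b), a,    # next quarter missing: take the current one
                a | b))         # both present: TRUE if either is TRUE
}

combine_quarters(TRUE,  FALSE)  # one TRUE, one FALSE
combine_quarters(FALSE, FALSE)  # both FALSE
combine_quarters(NA,    FALSE)  # current quarter missing
combine_quarters(TRUE,  NA)     # next quarter missing
combine_quarters(NA,    NA)     # both missing
```

Because ifelse() is vectorised, the same helper would also work on whole columns of current and next-quarter targets.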

The expected output looks like this:

rest_id  qtr cooking cleaning eating jumping target
123      1   FALSE   TRUE     FALSE  FALSE   TRUE
123      2   FALSE   TRUE     FALSE  FALSE   FALSE
123      3   FALSE   TRUE     FALSE  FALSE   TRUE
123      4   FALSE   TRUE     FALSE  FALSE   TRUE
435      1   FALSE   TRUE     FALSE  FALSE   TRUE
435      2   FALSE   TRUE     FALSE  FALSE   TRUE
435      3   FALSE   TRUE     FALSE  FALSE   TRUE
435      4   FALSE   TRUE     FALSE  FALSE   FALSE
437      1   FALSE   TRUE     FALSE  FALSE   TRUE
437      2   FALSE   TRUE     FALSE  FALSE   TRUE
437      3   FALSE   TRUE     FALSE  TRUE    TRUE
437      4   FALSE   TRUE     FALSE  FALSE   FALSE
439      2   FALSE   TRUE     TRUE   FALSE   FALSE
508      1   FALSE   TRUE     TRUE   FALSE   N/A
508      2   FALSE   TRUE     TRUE   FALSE   FALSE
234      2   FALSE   TRUE     TRUE   FALSE   TRUE

I can't do this with just a regular join in R because of the complicated logic I mentioned. What is the easiest way to do this?

Thanks!

Recommended answer

A tidyverse way (since the question is tagged with it):

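library(tidyverse)

# Build the full rest_id x qtr grid so that missing quarters show up as NA targets
expand_grid(rest_id = unique(feature_vector$rest_id), qtr = 1:5) %>%
  arrange(rest_id, qtr) %>%
  left_join(target_vector, by = c("rest_id", "qtr")) %>%
  group_by(rest_id) %>%
  # Put the next quarter's target on the same row as the current quarter's
  mutate(lead_target = lead(target)) %>%
  # If only one of the two quarters is present, use it; otherwise OR them
  mutate(aimed_target = case_when(!is.na(target) & is.na(lead_target) ~ target,
                                  is.na(target) & !is.na(lead_target) ~ lead_target,
                                  TRUE ~ target | lead_target)) %>%
  ungroup() %>%
  # Keep only the rows that exist in the feature table
  right_join(feature_vector, by = c("rest_id", "qtr")) %>%
  select(rest_id, qtr, cooking, cleaning, eating, jumping, aimed_target) %>%
  rename(target = aimed_target)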
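
The pipeline above assumes two data frames named feature_vector and target_vector already exist. One way to rebuild them from the tables in the question, as a sketch in base R (the values are copied from the question; the rep() shorthand is just a convenience):

```r
# Feature table from the question (16 rows)
feature_vector <- data.frame(
  rest_id  = c(rep(c(123, 435, 437), each = 4), 439, 508, 508, 234),
  qtr      = c(rep(1:4, times = 3), 2, 1, 2, 2),
  cooking  = FALSE,                                   # FALSE in every row
  cleaning = TRUE,                                    # TRUE in every row
  eating   = c(rep(FALSE, 12), rep(TRUE, 4)),         # TRUE for rows 13 to 16
  jumping  = c(rep(FALSE, 10), TRUE, rep(FALSE, 5))   # TRUE only for 437, Q3
)

# Target table from the question (18 rows)
target_vector <- data.frame(
  rest_id = c(rep(123, 5), rep(435, 5), rep(437, 4), 439, 508, 508, 234),
  qtr     = c(1:5, 1:5, 1:4, 3, 3, 5, 3),
  target  = c(TRUE, FALSE, FALSE, TRUE, TRUE,    # 123
              TRUE, TRUE, TRUE, FALSE, FALSE,    # 435
              TRUE, TRUE, TRUE, FALSE,           # 437
              FALSE, FALSE, FALSE, TRUE)         # 439, 508, 508, 234
)
```

With these defined, the pipeline can be run as is. Note that its result is sorted by rest_id, so the row order differs from the expected output in the question.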

  1. First I create every combination of the rest_ids in the feature vector and qtr from 1 to 5 using expand_grid(). Then I use arrange() to sort the grid (this is redundant if rest_id is already sorted in the first place).

  2. Then I use left_join() to join target_vector to that grid. These first two steps ensure that every missing rest_id and qtr combination gets an NA value in the target column.

  3. I create the column lead_target because you always need both the current quarter's and the next quarter's target value; lead() puts both on one row. Before that I use group_by() so that lead() only operates within the same rest_id.

  4. aimed_target is created using the logic you specified. I use case_when() as a replacement for multiple ifelse() calls. The operator | is "or", in case you wonder.

  5. The rest of the code is straightforward: I join back onto the feature vector, drop the helper columns, and rename at the end.
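
To see what the intermediate columns look like, the steps can be traced for a single restaurant. The sketch below uses rest_id 508, whose target rows exist only for Q3 and Q5; the names target_508 and steps_508 are made up for this illustration. It also shows why case_when() is needed rather than a plain |: in R, FALSE | NA is NA, so a present FALSE would otherwise be lost.

```r
library(dplyr)
library(tidyr)

# Target rows for rest_id 508 only, copied from the question
target_508 <- data.frame(rest_id = 508, qtr = c(3, 5), target = c(FALSE, FALSE))

steps_508 <- expand_grid(rest_id = 508, qtr = 1:5) %>%
  left_join(target_508, by = c("rest_id", "qtr")) %>%   # missing quarters become NA
  group_by(rest_id) %>%
  mutate(lead_target = lead(target)) %>%                # next quarter's target
  mutate(aimed_target = case_when(
    !is.na(target) & is.na(lead_target) ~ target,       # only current quarter present
    is.na(target) & !is.na(lead_target) ~ lead_target,  # only next quarter present
    TRUE ~ target | lead_target                         # both present, or both NA
  )) %>%
  ungroup()

# target is NA for Q1, Q2 and Q4; lead_target is NA for Q1, Q3 and Q5
print(steps_508)
```

For Q1 both target and lead_target are NA, so aimed_target stays NA; for Q2 only the next quarter is present, so aimed_target takes its value, FALSE. Both agree with the expected output in the question.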
