R - building new variables from sequenced data

Views: 68 | Posted: 2020/10/15 20:42:11 | Tags: r, dataframe, data.table, sequence


Question

This is an update / follow-up to this question. The answer given there doesn't meet the new requirements.

I am looking for an efficient way (data.table?) to construct two new measures for each ID.

Measure 1 and Measure 2 need to meet the following conditions:

Condition 1: Find a sequence of three rows for which:

the first count > 0,
the second count > 1, and
the third count == 1.
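As a minimal sketch of Condition 1 on a single ID, the check can be written with plain vector indexing. The `count` values are those of ID 1 in the example data; the variable names `i` and `hit` are illustrative only.

```r
# Counts for ID 1 from the example data
count <- c(2, 1, 3, 1, 1, 2, 3)
n <- length(count)

# Candidate start positions: every row that still has two rows after it
i <- seq_len(max(n - 2, 0))

# TRUE where rows i, i + 1 and i + 2 satisfy the three conditions
hit <- count[i] > 0 & count[i + 1] > 1 & count[i + 2] == 1

# Position of the first row of each matching sequence
which(hit)
# [1] 2
```

Here the only match starts at row 2, so the sequence covers rows 2, 3 and 4.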

Condition 2 for Measure 1:

Take the elements in the product of the third row of the sequence that are:
in the product of the second row of the sequence, and
NOT in the stock of the first row of the sequence.

Condition 2 for Measure 2:

Take the elements in the product of the last row of the sequence that are:
NOT in the product of the second row of the sequence, and
NOT in the stock of the first row of the sequence.
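Both versions of Condition 2 are set operations on the comma-separated elements, so they can be sketched with base R's `intersect()`, `setdiff()` and `union()`. The helper `split_items()` is a hypothetical name introduced for this illustration, and the inputs are rows 2 to 4 of ID 1 in the example data.

```r
# Hypothetical helper: split a comma-separated string into its elements
split_items <- function(x) strsplit(x, ",", fixed = TRUE)[[1]]

# Rows 2-4 of ID 1 in the example data
stock1   <- split_items("A,B")    # stock of the first row of the sequence
product2 <- split_items("C")      # product of the second row
product3 <- split_items("A,C,E")  # product of the third row

# Measure 1: in product3 AND in product2, but NOT in stock1
measure1 <- setdiff(intersect(product3, product2), stock1)

# Measure 2: in product3, NOT in product2 and NOT in stock1
measure2 <- setdiff(product3, union(product2, stock1))

measure1
# [1] "C"
measure2
# [1] "E"
```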

Data:

df2 <- data.frame(ID = c(1,1,1,1,1,1,1,2,2,2,3,3,3,3),
              seqs = c(1,2,3,4,5,6,7,1,2,3,1,2,3,4),
              count = c(2,1,3,1,1,2,3,1,2,1,3,1,4,1),
              product = c("A", "B", "C", "A,C,E", "A,B", "A,B,C", "D", "A", "B", "A", "A", "A,B,C", "D", "D"),
              stock = c("A", "A,B", "A,B,C", "A,B,C,E", "A,B,C,E", "A,B,C,E", "A,B,C,D,E", "A", "A,B", "A,B", "A", "A,B,C", "A,B,C,D", "A,B,C,D"))

> df2
   ID seqs count product     stock
1   1    1     2       A         A
2   1    2     1       B       A,B
3   1    3     3       C     A,B,C
4   1    4     1   A,C,E   A,B,C,E
5   1    5     1     A,B   A,B,C,E
6   1    6     2   A,B,C   A,B,C,E
7   1    7     3       D A,B,C,D,E
8   2    1     1       A         A
9   2    2     2       B       A,B
10  2    3     1       A       A,B
11  3    1     3       A         A
12  3    2     1   A,B,C     A,B,C
13  3    3     4       D   A,B,C,D
14  3    4     1       D   A,B,C,D

The desired output looks like this:

   ID seq1 seq2 seq3 measure1   measure2
1:  1    2    3    4   C         E 
2:  2    1    2    3    
3:  3    2    3    4   D

How would you code this?

Recommended answer

A few things you need to know to be able to do this:

the shift function, to compare values across rows within your groups
the separate_rows function, to split your strings and get a normalised view of the data.
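A small sketch of what these two functions do, on toy data that is not part of the question (the table `toy` and the column `count.next` are made up for illustration). It assumes the data.table and tidyr packages are installed.

```r
library(data.table)
library(tidyr)

# Toy data (hypothetical, not from the question)
toy <- data.table(ID = c(1, 1, 1, 2, 2),
                  count = c(2, 3, 1, 5, 4),
                  product = c("A", "B", "A,B", "C", "D"))

# Lead values are taken within each ID, so they never cross into the next group
toy[, count.next := shift(count, type = "lead"), by = ID]
toy$count.next
# [1]  3  1 NA  4 NA

# One output row per comma-separated element of `product`
long <- separate_rows(toy, product, sep = ",")
nrow(long)
# [1] 6
```

The `by = ID` argument matters: without it, the last rows of one ID would be compared with the first rows of the next.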

library(data.table)
library(tidyr)

dt <- data.table(ID = c(1,1,1,1,1,1,1,2,2,2,3,3,3,3),
                  seqs = c(1,2,3,4,5,6,7,1,2,3,1,2,3,4),
                  count = c(2,1,3,1,1,2,3,1,2,1,3,1,4,1),
                  product = c("A", "B", "C", "A,C,E", "A,B", "A,B,C", "D", "A", "B", "A", "A", "A,B,C", "D", "D"),
                  stock = c("A", "A,B", "A,B,C", "A,B,C,E", "A,B,C,E", "A,B,C,E", "A,B,C,D,E", "A", "A,B", "A,B", "A", "A,B,C", "A,B,C,D", "A,B,C,D"))

# Look ahead one and two rows; by = ID keeps a sequence from spanning two IDs
dt[, count.2 := shift(count, type = "lead"), by = ID]
dt[, count.3 := shift(count, n = 2, type = "lead"), by = ID]

dt[, product.2 := shift(product, type = "lead"), by = ID]
dt[, product.3 := shift(product, n = 2, type = "lead"), by = ID]

# Keep rows that start a matching three-row sequence, then the first match per ID
dt <- dt[count > 0 & count.2 > 1 & count.3 == 1]
dt <- unique(dt, by = "ID")

# Split the comma-separated strings into one row per element.
# as.data.table() guards against separate_rows() returning another class.
dt.measure <- separate_rows(dt, product.3, sep = ",")
dt.measure <- separate_rows(dt.measure, stock, sep = ",")
dt.measure <- as.data.table(separate_rows(dt.measure, product, sep = ","))

# product.2 is compared as a whole string, so this assumes it holds a single element
dt.measure[, measure.1 := (product.3 == product.2 & product.3 != stock)]
dt.measure[, measure.2 := (product.3 != product.2 & product.3 != stock)]

# Collapse back to one row per ID; groups with no match yield NA
# (max() may warn that there are no non-missing values)
res <- dt.measure[, 
  .(
    measure.1 = max(ifelse(measure.1, product.3, NA_character_), na.rm = TRUE), 
    measure.2 = max(ifelse(measure.2, product.3, NA_character_), na.rm = TRUE)
  ),
  ID
]

dt <- merge(dt, res, by = "ID")
dt[, .(ID, measure.1, measure.2)]
# ID measure.1 measure.2
# 1:  1         C         E
# 2:  2      <NA>      <NA>
# 3:  3         D      <NA>
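The code above compares `product.3` with `product.2` and `stock` row by row. That gives the expected result on this data, but it tests equality against individual rows rather than set membership. As a hedged alternative, not the answer's own method, the sketch below does the membership tests with `intersect()` and `setdiff()` and also returns the three sequence numbers from the desired output. The helper `collapse_or_na()` and the column names `seq2`, `count2`, `product2` and so on are introduced here for illustration.

```r
library(data.table)

dt <- data.table(
  ID = c(1,1,1,1,1,1,1,2,2,2,3,3,3,3),
  seqs = c(1,2,3,4,5,6,7,1,2,3,1,2,3,4),
  count = c(2,1,3,1,1,2,3,1,2,1,3,1,4,1),
  product = c("A", "B", "C", "A,C,E", "A,B", "A,B,C", "D",
              "A", "B", "A", "A", "A,B,C", "D", "D"),
  stock = c("A", "A,B", "A,B,C", "A,B,C,E", "A,B,C,E", "A,B,C,E", "A,B,C,D,E",
            "A", "A,B", "A,B", "A", "A,B,C", "A,B,C,D", "A,B,C,D")
)

# Hypothetical helper: join elements back into "A,B", or NA when nothing is left
collapse_or_na <- function(x) {
  if (length(x)) paste(x, collapse = ",") else NA_character_
}

# Values of the second and third row of each candidate sequence, within each ID
dt[, `:=`(
  seq2 = shift(seqs, 1, type = "lead"),
  seq3 = shift(seqs, 2, type = "lead"),
  count2 = shift(count, 1, type = "lead"),
  count3 = shift(count, 2, type = "lead"),
  product2 = shift(product, 1, type = "lead"),
  product3 = shift(product, 2, type = "lead")
), by = ID]

# Condition 1, keeping only the first matching sequence per ID
hits <- unique(dt[count > 0 & count2 > 1 & count3 == 1], by = "ID")

# Condition 2 as set operations on the split strings (one row per ID here)
res <- hits[, {
  s1 <- strsplit(stock, ",", fixed = TRUE)[[1]]
  p2 <- strsplit(product2, ",", fixed = TRUE)[[1]]
  p3 <- strsplit(product3, ",", fixed = TRUE)[[1]]
  .(seq1 = seqs, seq2 = seq2, seq3 = seq3,
    measure1 = collapse_or_na(setdiff(intersect(p3, p2), s1)),
    measure2 = collapse_or_na(setdiff(p3, union(p2, s1))))
}, by = ID]

res
```

On the example data this should give measure1 = C and measure2 = E for ID 1, NA for both measures for ID 2, and measure1 = D for ID 3. If several elements qualify, they are returned joined by commas, which is an assumption about the wanted format.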
