R - building new variables from sequenced data
Problem description
This is an update / follow-up on this question. The answer outlined there doesn't meet the new requirements.
I am looking for an efficient way (data.table?) to construct two new measures for each ID.
Measure 1 and Measure 2 need to meet the following conditions:
Condition 1: Find a sequence of three rows for which:
- the first count > 0,
- the second count > 1, and
- the third count == 1.
Condition 2 for Measure 1: takes the value of the elements in product of the third row of the sequence that are:
- in the product of the second row of the sequence, and
- NOT in the stock of the first row of the sequence.
Condition 2 for Measure 2: takes the value of the elements in product of the last row of the sequence that are:
- NOT in the product of the second row of the sequence, and
- NOT in the stock of the first row of the sequence.
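Both versions of condition 2 are set operations on the comma-separated strings. As a minimal sketch in base R, the snippet below checks them by hand for ID 1, whose qualifying sequence is seqs 2, 3, 4 in the data. The helper name `split_set` is made up for illustration and is not part of the question.

```r
# Hypothetical helper: split a comma-separated string into its elements.
split_set <- function(x) strsplit(x, ",", fixed = TRUE)[[1]]

# Values taken from ID 1, seqs 2 to 4.
stock_first    <- split_set("A,B")    # stock of the first row
product_second <- split_set("C")      # product of the second row
product_third  <- split_set("A,C,E")  # product of the third row

# Measure 1: in the third row's product, in the second row's product,
# and not in the first row's stock.
measure1 <- setdiff(intersect(product_third, product_second), stock_first)

# Measure 2: in the third row's product, not in the second row's product,
# and not in the first row's stock.
measure2 <- setdiff(setdiff(product_third, product_second), stock_first)

print(measure1)  # "C"
print(measure2)  # "E"
```

"A" is dropped from both measures because it is already in the stock of the first row.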
Data:
df2 <- data.frame(ID = c(1,1,1,1,1,1,1,2,2,2,3,3,3,3),
seqs = c(1,2,3,4,5,6,7,1,2,3,1,2,3,4),
count = c(2,1,3,1,1,2,3,1,2,1,3,1,4,1),
product = c("A", "B", "C", "A,C,E", "A,B", "A,B,C", "D", "A", "B", "A", "A", "A,B,C", "D", "D"),
stock = c("A", "A,B", "A,B,C", "A,B,C,E", "A,B,C,E", "A,B,C,E", "A,B,C,D,E", "A", "A,B", "A,B", "A", "A,B,C", "A,B,C,D", "A,B,C,D"))
> df2
   ID seqs count product     stock
1   1    1     2       A         A
2   1    2     1       B       A,B
3   1    3     3       C     A,B,C
4   1    4     1   A,C,E   A,B,C,E
5   1    5     1     A,B   A,B,C,E
6   1    6     2   A,B,C   A,B,C,E
7   1    7     3       D A,B,C,D,E
8   2    1     1       A         A
9   2    2     2       B       A,B
10  2    3     1       A       A,B
11  3    1     3       A         A
12  3    2     1   A,B,C     A,B,C
13  3    3     4       D   A,B,C,D
14  3    4     1       D   A,B,C,D
The desired output is as follows:
   ID seq1 seq2 seq3 measure1 measure2
1:  1    2    3    4        C        E
2:  2    1    2    3
3:  3    2    3    4        D
How would you code this?
Recommended answer
A few things you need to know to be able to do this:
- the shift function, to compare values within your groups
- the separate_rows function, to split your strings and get to a normalised view of the data.
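To show what a grouped lead does, here is a small sketch using data.table's `shift`. The `toy` table is made up for illustration and is not the data from the question.

```r
library(data.table)

# Toy data (invented for this example): two IDs with a few counts each.
toy <- data.table(ID = c(1, 1, 1, 2, 2), count = c(2, 1, 3, 1, 2))

# type = "lead" pulls values from later rows. by = ID restarts the shift
# in every group, so the last rows of an ID get NA instead of values
# that belong to the next ID.
toy[, count.2 := shift(count, n = 1, type = "lead"), by = ID]
toy[, count.3 := shift(count, n = 2, type = "lead"), by = ID]

print(toy)
```

With the lead columns on the same row, a three-row condition such as `count > 0 & count.2 > 1 & count.3 == 1` becomes an ordinary row filter.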
library(data.table)
dt <- data.table(ID = c(1,1,1,1,1,1,1,2,2,2,3,3,3,3),
seqs = c(1,2,3,4,5,6,7,1,2,3,1,2,3,4),
count = c(2,1,3,1,1,2,3,1,2,1,3,1,4,1),
product = c("A", "B", "C", "A,C,E", "A,B", "A,B,C", "D", "A", "B", "A", "A", "A,B,C", "D", "D"),
stock = c("A", "A,B", "A,B,C", "A,B,C,E", "A,B,C,E", "A,B,C,E", "A,B,C,D,E", "A", "A,B", "A,B", "A", "A,B,C", "A,B,C,D", "A,B,C,D"))
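# Look one and two rows ahead within each ID; grouping by ID keeps a
# sequence from spilling over into the next ID.
dt[, count.2 := shift(count, type = "lead"), by = ID]
dt[, count.3 := shift(count, n = 2, type = "lead"), by = ID]
dt[, product.2 := shift(product, type = "lead"), by = ID]
dt[, product.3 := shift(product, n = 2, type = "lead"), by = ID]
# Keep the rows that start a qualifying three-row sequence,
# then only the first such row per ID.
dt <- dt[count > 0 & count.2 > 1 & count.3 == 1]
dt <- unique(dt, by = "ID")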
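library(tidyr)
# Expand the comma-separated strings so that each row holds one element.
dt.measure <- separate_rows(dt, product.3, sep = ",")
dt.measure <- separate_rows(dt.measure, product.2, sep = ",")
dt.measure <- separate_rows(dt.measure, stock, sep = ",")
dt.measure <- as.data.table(dt.measure)
# Test membership per element of product.3 across all expanded rows.
# A row-wise != would flag "A" for ID 1 (because "A" != "B"), although "A" is in stock.
dt.measure[, in.product.2 := any(product.3 == product.2), by = .(ID, product.3)]
dt.measure[, in.stock := any(product.3 == stock), by = .(ID, product.3)]
dt.measure[, measure.1 := in.product.2 & !in.stock]
dt.measure[, measure.2 := !in.product.2 & !in.stock]
# max() keeps one matching element per ID; IDs without a match end up as NA.
res <- dt.measure[,
.(
measure.1 = max(ifelse(measure.1, product.3, NA_character_), na.rm = TRUE),
measure.2 = max(ifelse(measure.2, product.3, NA_character_), na.rm = TRUE)
),
ID
]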
dt <- merge(dt, res, by = "ID")
dt[, .(ID, measure.1, measure.2)]
# ID measure.1 measure.2
# 1: 1 C E
# 2: 2 <NA> <NA>
# 3: 3 D <NA>
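The same result can also be reached with base R set operations instead of expanding rows. The following is a sketch only, not the method of the answer above. The helpers `split_set` and `join_set` are hypothetical names, and the sketch assumes that seqs increases by 1 within each ID, so that the second and third rows of a sequence are seqs + 1 and seqs + 2.

```r
library(data.table)

dt <- data.table(
  ID = c(1,1,1,1,1,1,1,2,2,2,3,3,3,3),
  seqs = c(1,2,3,4,5,6,7,1,2,3,1,2,3,4),
  count = c(2,1,3,1,1,2,3,1,2,1,3,1,4,1),
  product = c("A", "B", "C", "A,C,E", "A,B", "A,B,C", "D",
              "A", "B", "A", "A", "A,B,C", "D", "D"),
  stock = c("A", "A,B", "A,B,C", "A,B,C,E", "A,B,C,E", "A,B,C,E", "A,B,C,D,E",
            "A", "A,B", "A,B", "A", "A,B,C", "A,B,C,D", "A,B,C,D"))

# Hypothetical helpers: string -> set of elements, and back (NA if empty).
split_set <- function(x) strsplit(x, ",", fixed = TRUE)[[1]]
join_set <- function(x) if (length(x) > 0) paste(x, collapse = ",") else NA_character_

# Lead values one and two rows ahead, within each ID.
dt[, c("count.2", "count.3") := shift(count, n = 1:2, type = "lead"), by = ID]
dt[, c("product.2", "product.3") := shift(product, n = 1:2, type = "lead"), by = ID]

# First qualifying three-row sequence per ID.
hits <- unique(dt[count > 0 & count.2 > 1 & count.3 == 1], by = "ID")

# One row per ID, so each string can be split and compared as a set.
res <- hits[, {
  p3 <- split_set(product.3)  # product of the third row
  p2 <- split_set(product.2)  # product of the second row
  s1 <- split_set(stock)      # stock of the first row
  .(seq1 = seqs, seq2 = seqs + 1, seq3 = seqs + 2,
    measure1 = join_set(setdiff(intersect(p3, p2), s1)),
    measure2 = join_set(setdiff(setdiff(p3, p2), s1)))
}, by = ID]

print(res)
```

Unlike `max()`, `join_set` keeps every matching element, so an ID with several matches would get a comma-separated string rather than a single value.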