数据表中的重塑拼图 [英] A reshape puzzle in data.table
问题描述
data.table
中的另一个重塑问题。种子(1234)
DT< - data.table(x = rep(c(1,2,3),each = 4),y = c(A,B),v = sample (1:100,12))
#xyv
#1:1 A 12
#2:1 B 62
...
#11:3 A 63
#12:3 B 49
我想累加一个 x
和 v
由 y
:
行数总是保持不变,当 y == A
SUM。*。A
递增,当 y == B
时相同。 (如通常 y
可能有很多因素,在本例中为2)
#SUM.xA SUM.xB SUM.vA SUM.vB
#1:1 NA 12 NA
#2:1 1 12 62
...
#11: 12 9 318 289
#12:12 12 318 338
解决方案显然过于复杂
#第一步是创建cumsum列
colNames< - c(x v); newColNames < - paste0(SUM。,colNames)
DT [,newColNames:= lapply(.SD,cumsum),by = y,.SDcols = colNames,with = F];
#我们需要重塑每个SUM。*到SUM。*。{yvalue}
DT [,N:=。I]; setattr(DT,sorted,N)
g< - function(DT,SD){
cols< - c('N',grep('SUM' colnames(SD),value = T));
Yval< - unique(SD [,y]);
merge(DT,SD [,cols,with = F],suffixe = c('',paste0('。',Yval)),all.x = T);
}
DT < - 减少(f = g,init = DT,x = split(DT,DT $ y));
locf = function(x){
ind = which(!is.na(x))
if(is.na(x [1]))ind = c (1,ind)
rep(x [ind],times = diff(c(ind,length(x)+1))
newColNames& grep('SUM',colnames(DT),value = T);
DT < - DT [,(newColNames):= lapply(.SD,locf),.SDcols = newColNames]
尝试:
cumsum0 < (x){x- cumsum(x); ifelse(x == 0,NA,x)}
DT2 <-DT [,{SUM。 lapply(data.table(model.matrix(〜SUM.:x + SUM.:v + 0)),cumsum0)}]
setnames(DT2,sub((。):(。) \\2.\\1,名称(DT2)))
简化:
1)如果使用 0
代替 NA
是确定的,那么可以通过省略定义 cumsum0
的第一行并将下一行替换为 cumsum0
code> cumsum 。
2)第二行的结果具有以下名称:
>名称(DT2)
[1]SUM.A:xSUM.B:xSUM.A:vSUM.B:v
所以如果这足够了,最后一行可以被删除,因为它的唯一目的是使名称与问题中的完全一样。
结果(没有简化)是:
DT2
pre>
SUM.xA SUM.xB SUM.vA SUM.vB
1:1 NA 12 NA
2:1 1 12 62
3:2 1 72 62
4:2 2 72 123
5:4 2 155 123
6:4 4 155 220
7:6 4 156 220
8:6 6 156 242
9:9 6 255 242
10:9 9 255 289
11:12 9 318 289
12:12 12 318 338
Yet another reshape problem in
data.table
set.seed(1234) DT <- data.table(x=rep(c(1,2,3),each=4), y=c("A","B"), v=sample(1:100,12)) # x y v # 1: 1 A 12 # 2: 1 B 62 ... #11: 3 A 63 #12: 3 B 49
I would like to do a cumulative sum of
x
andv
byy
but the result to be presented as: The number of lines always stays the same, and wheny==A
theSUM.*.A
is incremented, same wheny==B
. (As usualy
could have many factors, 2 in this example)# SUM.x.A SUM.x.B SUM.v.A SUM.v.B # 1: 1 NA 12 NA # 2: 1 1 12 62 ... #11: 12 9 318 289 #12: 12 12 318 338
EDIT: Here is my poor solution clearly overly complicated
#first step is to create cumsum columns colNames <- c("x","v"); newColNames <- paste0("SUM.",colNames) DT[, newColNames:=lapply(.SD,cumsum) ,by=y, .SDcols=colNames, with=F]; #now we need to reshape each SUM.* to SUM.*.{yvalue} DT[,N:=.I]; setattr(DT,"sorted","N") g <- function(DT,SD){ cols <- c('N',grep('SUM',colnames(SD), value=T)); Yval <- unique(SD[,y]); merge(DT, SD[,cols, with=F], suffixe=c('',paste0('.',Yval)), all.x=T); } DT <- Reduce(f=g,init=DT,x=split(DT,DT$y)); locf = function(x) { ind = which(!is.na(x)) if(is.na(x[1])) ind = c(1,ind) rep(x[ind], times = diff( c(ind, length(x) + 1) )) } newColNames <- grep('SUM',colnames(DT),value=T); DT <- DT[, (newColNames):=lapply(.SD, locf), .SDcols=newColNames]
解决方案Try this:
cumsum0 <- function(x) { x <- cumsum(x); ifelse(x == 0, NA, x) } DT2 <- DT[, {SUM.<-y; lapply(data.table(model.matrix(~ SUM.:x + SUM.:v + 0)), cumsum0)}] setnames(DT2, sub("(.):(.)", "\\2.\\1", names(DT2)))
Simplifications:
1) If using
0
in place ofNA
is ok then it can be simplified by omitting the first line which definescumsum0
and replacingcumsum0
in the next line withcumsum
.2) The result of the second line has these names:
> names(DT2) [1] "SUM.A:x" "SUM.B:x" "SUM.A:v" "SUM.B:v"
so if that is sufficient the last line can be dropped since its only purpose is to make the names exactly the same as in the question.
The result (without the simplifications) is:
> DT2 SUM.x.A SUM.x.B SUM.v.A SUM.v.B 1: 1 NA 12 NA 2: 1 1 12 62 3: 2 1 72 62 4: 2 2 72 123 5: 4 2 155 123 6: 4 4 155 220 7: 6 4 156 220 8: 6 6 156 242 9: 9 6 255 242 10: 9 9 255 289 11: 12 9 318 289 12: 12 12 318 338
这篇关于数据表中的重塑拼图的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!