数据表中的重塑拼图 [英] A reshape puzzle in data.table

查看:98
本文介绍了数据表中的重塑拼图的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

data.table

 中的另一个重塑问题。种子(1234)
DT< - data.table(x = rep(c(1,2,3),each = 4),y = c(A,B),v = sample (1:100,12))
#xyv
#1:1 A 12
#2:1 B 62
...
#11:3 A 63
#12:3 B 49

我想累加一个 x v y
行数总是保持不变,当 y == A SUM。*。A 递增,当 y == B 时相同。 (如通常 y 可能有很多因素,在本例中为2)

 #SUM.xA SUM.xB SUM.vA SUM.vB 
#1:1 NA 12 NA
#2:1 1 12 62
...
#11: 12 9 318 289
#12:12 12 318 338

解决方案显然过于复杂

 #第一步是创建cumsum列
colNames< - c(x v); newColNames < - paste0(SUM。,colNames)
DT [,newColNames:= lapply(.SD,cumsum),by = y,.SDcols = colNames,with = F];
#我们需要重塑每个SUM。*到SUM。*。{yvalue}
DT [,N:=。I]; setattr(DT,sorted,N)

g< - function(DT,SD){
cols< - c('N',grep('SUM' colnames(SD),value = T));
Yval< - unique(SD [,y]);
merge(DT,SD [,cols,with = F],suffixe = c('',paste0('。',Yval)),all.x = T);
}

DT < - 减少(f = g,init = DT,x = split(DT,DT $ y));

locf = function(x){
ind = which(!is.na(x))
if(is.na(x [1]))ind = c (1,ind)
rep(x [ind],times = diff(c(ind,length(x)+1))


newColNames& grep('SUM',colnames(DT),value = T);
DT < - DT [,(newColNames):= lapply(.SD,locf),.SDcols = newColNames]


解决方案

尝试:

  cumsum0 < (x){x- cumsum(x); ifelse(x == 0,NA,x)} 
DT2 <-DT [,{SUM。 lapply(data.table(model.matrix(〜SUM.:x + SUM.:v + 0)),cumsum0)}]
setnames(DT2,sub((。):(。) \\2.\\1,名称(DT2)))

简化:



1)如果使用 0 代替 NA 是确定的,那么可以通过省略定义 cumsum0 的第一行并将下一行替换为 cumsum0 code> cumsum 。



2)第二行的结果具有以下名称:

 >名称(DT2)
[1]SUM.A:xSUM.B:xSUM.A:vSUM.B:v

所以如果这足够了,最后一行可以被删除,因为它的唯一目的是使名称与问题中的完全一样。

结果(没有简化)是:

  DT2 
SUM.xA SUM.xB SUM.vA SUM.vB
1:1 NA 12 NA
2:1 1 12 62
3:2 1 72 62
4:2 2 72 123
5:4 2 155 123
6:4 4 155 220
7:6 4 156 220
8:6 6 156 242
9:9 6 255 242
10:9 9 255 289
11:12 9 318 289
12:12 12 318 338
pre>

Yet another reshape problem in data.table

set.seed(1234)
DT <- data.table(x=rep(c(1,2,3),each=4), y=c("A","B"), v=sample(1:100,12))
#    x y  v
# 1: 1 A 12
# 2: 1 B 62
...
#11: 3 A 63
#12: 3 B 49

I would like to do a cumulative sum of x and v by y but the result to be presented as: The number of lines always stays the same, and when y==A the SUM.*.A is incremented, same when y==B. (As usual y could have many factors, 2 in this example)

#     SUM.x.A SUM.x.B  SUM.v.A SUM.v.B
# 1:        1      NA       12      NA
# 2:        1       1       12      62
...
#11:       12       9       318     289
#12:       12      12       318     338

EDIT: Here is my poor solution clearly overly complicated

#first step is to create cumsum columns
colNames <- c("x","v"); newColNames <- paste0("SUM.",colNames)
DT[, newColNames:=lapply(.SD,cumsum) ,by=y, .SDcols=colNames, with=F];
#now we need to reshape each SUM.* to SUM.*.{yvalue}
DT[,N:=.I]; setattr(DT,"sorted","N")

g <- function(DT,SD){
  cols <- c('N',grep('SUM',colnames(SD), value=T));
  Yval <- unique(SD[,y]);
  merge(DT, SD[,cols, with=F], suffixe=c('',paste0('.',Yval)), all.x=T);    
}

DT <- Reduce(f=g,init=DT,x=split(DT,DT$y));

locf = function(x) {
  ind = which(!is.na(x))    
  if(is.na(x[1])) ind = c(1,ind)
  rep(x[ind], times = diff( c(ind, length(x) + 1) )) 
}

newColNames <- grep('SUM',colnames(DT),value=T);
DT <- DT[, (newColNames):=lapply(.SD, locf), .SDcols=newColNames]

解决方案

Try this:

cumsum0 <- function(x) { x <- cumsum(x); ifelse(x == 0, NA, x) }
DT2 <- DT[, {SUM.<-y; lapply(data.table(model.matrix(~ SUM.:x + SUM.:v + 0)), cumsum0)}]
setnames(DT2, sub("(.):(.)", "\\2.\\1", names(DT2)))

Simplifications:

1) If using 0 in place of NA is ok then it can be simplified by omitting the first line which defines cumsum0 and replacing cumsum0 in the next line with cumsum.

2) The result of the second line has these names:

> names(DT2)
[1] "SUM.A:x" "SUM.B:x" "SUM.A:v" "SUM.B:v"

so if that is sufficient the last line can be dropped since its only purpose is to make the names exactly the same as in the question.

The result (without the simplifications) is:

> DT2
    SUM.x.A SUM.x.B SUM.v.A SUM.v.B
 1:       1      NA      12      NA
 2:       1       1      12      62
 3:       2       1      72      62
 4:       2       2      72     123
 5:       4       2     155     123
 6:       4       4     155     220
 7:       6       4     156     220
 8:       6       6     156     242
 9:       9       6     255     242
10:       9       9     255     289
11:      12       9     318     289
12:      12      12     318     338

这篇关于数据表中的重塑拼图的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆