Create a new column with the names of the non-null columns

Question

My data set looks like this:

library(data.table)

df <- data.table(a = c(1,2,3,4,5),
                 b = c(1,0,2,5,1),
                 c = c(0,1,1,0,0),
                 d = c(1,0,0,2,2))

df
#    a b c d
# 1: 1 1 0 1
# 2: 2 0 1 0
# 3: 3 2 1 0
# 4: 4 5 0 2
# 5: 5 1 0 2

I want to create a new column that holds, for each row, the names of the non-null (here: non-zero) columns, joined with "_". The result should be:

df_result <- data.table(a = c(1,2,3,4,5),
                        z = c('b_d', 'c', 'b_c', 'b_d', 'b_d'))

df_result
#    a   z
# 1: 1 b_d
# 2: 2   c
# 3: 3 b_c
# 4: 4 b_d
# 5: 5 b_d


Recommended answer

One option is to reshape the data from 'wide' to 'long' format with melt. Then, grouped by 'a', we paste together the 'variable' elements that correspond to non-zero elements of 'value' (the non-zero test is supplied as a logical condition in 'i').

# wide -> long, keep only non-zero cells, paste the column names per 'a'
melt(df, id.vars = 'a')[value != 0,
      .(z = paste(variable, collapse = "_")), keyby = a]
#   a   z
#1: 1 b_d
#2: 2   c
#3: 3 b_c
#4: 4 b_d
#5: 5 b_d
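For readers without data.table, the same wide-to-long idea can be sketched in base R. This is a swapped-in technique rather than the answer's code: stack() stands in for melt() and tapply() for the grouped paste, and the names df_base, long, nz and res are illustrative only.

```r
# Toy data from the question, as a plain data.frame (no data.table needed)
df_base <- data.frame(a = c(1, 2, 3, 4, 5),
                      b = c(1, 0, 2, 5, 1),
                      c = c(0, 1, 1, 0, 0),
                      d = c(1, 0, 0, 2, 2))

# Wide -> long: stack() gives one row per cell, with columns
# `values` (the cell) and `ind` (a factor holding the source column name).
long <- stack(df_base[, -1])
# stack() walks column by column, so repeat the id column once per value column
long$a <- rep(df_base$a, times = ncol(df_base) - 1)

# Keep only non-zero cells (the same filter as value != 0 in the melt answer)
nz <- long[long$values != 0, ]

# For each `a`, paste the surviving column names in their original column order
z <- tapply(as.character(nz$ind), nz$a, paste, collapse = "_")
res <- data.frame(a = as.numeric(names(z)), z = as.vector(z),
                  stringsAsFactors = FALSE)
print(res)
```

As with the melt answer, a row whose b, c and d are all zero would disappear from the result, because none of its cells survive the non-zero filter.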

Or, instead of melting, we can group by 'a', unlist the Subset of Data.table (.SD) for each group, and paste the names of the columns that correspond to the non-zero elements ('i1').

# !!unlist(.SD) turns the row's b, c, d values into TRUE (non-zero) / FALSE (zero)
df[, .(z = {i1 <- !!unlist(.SD)
            paste(names(.SD)[i1], collapse = "_")}), by = a]
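The step doing the work here is !!unlist(.SD): negating twice turns each numeric value into a logical (FALSE for 0, TRUE otherwise), which then selects from names(.SD). A minimal base-R sketch of that per-row logic, assuming the value columns are held in a plain named matrix; row_names_nonzero() is a hypothetical helper, not part of data.table.

```r
# Hypothetical helper: the per-group body of the .SD answer, for a single row.
# `row` is a named numeric vector such as c(b = 1, c = 0, d = 1).
row_names_nonzero <- function(row) {
  i1 <- !!row                            # numeric -> logical: 0 is FALSE, anything else TRUE
  paste(names(row)[i1], collapse = "_")  # keep the names of the TRUE positions
}

row_names_nonzero(c(b = 1, c = 0, d = 1))  # "b_d"

# Apply it to every row of the question's value columns (b, c, d);
# apply() passes each row with the column names attached.
m <- cbind(b = c(1, 0, 2, 5, 1),
           c = c(0, 1, 1, 0, 0),
           d = c(1, 0, 0, 2, 2))
z <- apply(m, 1, row_names_nonzero)
```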

Benchmarks

set.seed(24)
df1 <- data.table(a = 1:1e6,
                  b = sample(0:5, 1e6, replace = TRUE),
                  c = sample(0:4, 1e6, replace = TRUE),
                  d = sample(0:3, 1e6, replace = TRUE))

akrun1 <- function() {
  melt(df1, id.vars = 'a')[value != 0,
      .(z = paste(variable, collapse = "_")), keyby = a]
}

akrun2 <- function() {
  df1[, {i1 <- !!unlist(.SD)
         paste(names(.SD)[i1], collapse = "_")}, by = a]
}

ronak <- function() {
  data.table(z = lapply(apply(df1, 1, function(x) which(x[-1] != 0)),
                        function(x) paste0(names(x), collapse = "_")))
}

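eddi <- function() {
  # For each column in b:d, take its name where the value is non-zero and NA
  # where it is zero, paste the columns row-wise with "_", then strip the NAs.
  # Note: this adds 'newcol' to df1 by reference.
  df1[, newcol := gsub("NA_|_NA|NA", "",
                       do.call(function(...) paste(..., sep = "_"),
                               Map(function(x, y) x[(y == 0) + 1], names(.SD), .SD))),
      .SDcols = b:d]
}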

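alexis = function(x) {
  ans = character(nrow(x))
  for (j in seq_along(x)) {
    i = x[[j]] > 0L   # rows where column j is positive (values here are never negative)
    ans[i] = paste(ans[i], names(x)[[j]], sep = "_")
  }
  return(gsub("^_", "", ans))   # drop the leading "_" added by the first paste
}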

system.time(akrun1())
#   user  system elapsed
#  22.04    0.15   22.36
system.time(akrun2())
#   user  system elapsed
#  26.33    0.00   26.41
system.time(ronak())
#   user  system elapsed
#  25.60    0.26   25.96

system.time(alexis(df1[, -1L, with = FALSE]))
#   user  system elapsed
#   1.92    0.06    2.09

system.time(eddi())
#   user  system elapsed
#   2.41    0.06    3.19
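Both fast entries, eddi() and alexis(), loop over the three value columns and do vectorised work across all rows, while the slower ones do per-row work (one group per row with by = a or keyby = a, since a is unique, or one apply() call per row). That is likely where most of the gap comes from. A base-R sketch of the column-wise pattern on the question's data; nonzero_names() is a hypothetical helper modelled on alexis(), but it tests != 0 rather than > 0, so negative values would also count as non-zero.

```r
# Hypothetical helper: build "b_d"-style labels column by column.
# Cost is one vectorised pass per value column, not one function call per row.
nonzero_names <- function(x) {
  ans <- character(nrow(x))            # start every row with ""
  for (j in seq_along(x)) {
    hit <- x[[j]] != 0                 # rows where column j is non-zero
    ans[hit] <- paste(ans[hit], names(x)[j], sep = "_")
  }
  sub("^_", "", ans)                   # drop the leading "_" added by the first paste
}

vals <- data.frame(b = c(1, 0, 2, 5, 1),
                   c = c(0, 1, 1, 0, 0),
                   d = c(1, 0, 0, 2, 2))
z <- nonzero_names(vals)
```

Unlike the melt answer, a row whose values are all zero is kept here, with an empty string in z.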