Split unique values into separate columns for multiple columns

Views: 49 | Posted: 2021/4/28 19:37:58 | Tags: r, machine-learning, data.table


Problem description

Each column of my data will be rescaled and put into bins from 0 to 100. The bin columns will be used as features for a model. To test each bin separately, I'd like to split each bin column into a separate column for each of its values. Each new column will hold either 0 or 1, depending on whether the value in the cell matches that column's bin. From something like this:

row values
  1     10
  2     20
  3     30
  4     40
  5     10
  6     30
  7     40

To this:

row values_10 values_20 values_30 values_40
  1         1         0         0         0
  2         0         1         0         0
  3         0         0         1         0
  4         0         0         0         1
  5         1         0         0         0
  6         0         0         1         0
  7         0         0         0         1

This brute-force approach does the job, but there must be a better (non-loop) way:

values <- c( 10,20,30,40,10,30,40)
dat <- data.frame(values)

columnNames <- unique(dat$values)

for( n in 1:length(columnNames) )
{
    dat[as.character(columnNames[n])]  <- 0
}

columnNames2 <- colnames(dat)

for( c in 2:ncol(dat))
{
    hdr <- columnNames2[c]

    for( r in 1:nrow(dat))
    {
        if( dat$values[r]==as.integer(hdr) )
            dat[r,c]=1
    }
}

Thanks very much!
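For comparison, here is a minimal loop-free sketch of the same transformation for a single column, using only base R. It is one possible approach, not necessarily the one given in the answers; the names `bins` and `onehot` are chosen here for illustration.

```r
# One-hot encode a single binned column without explicit loops (base R only).
values <- c(10, 20, 30, 40, 10, 30, 40)
dat <- data.frame(values)

# Bins in order of first appearance, as in the loop version above.
bins <- unique(dat$values)

# outer() compares every value with every bin, giving a logical
# nrow(dat) x length(bins) matrix; multiplying by 1L turns TRUE/FALSE into 1/0.
onehot <- outer(dat$values, bins, "==") * 1L
colnames(onehot) <- paste0("values_", bins)

onehot
```

The result is a plain integer matrix that contains only the bin columns; it can be wrapped in `as.data.frame()` if a data frame is needed.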

EDIT

These are all great answers, thank you everyone. The final object, whether a matrix, table, or data.table, will contain only the separate bin columns (no source columns). How can the solutions below be used for 2000+ source columns?
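One way to frame the many-column case is to encode each source column separately and bind the pieces together. The sketch below assumes base R only and no missing values in the data; `one_hot_column` is a hypothetical helper written for this illustration, not a function from any package or from the answers.

```r
# Hypothetical helper (illustration only): one-hot encode one column.
one_hot_column <- function(x, prefix) {
    bins <- sort(unique(x))
    m <- matrix(0L, nrow = length(x), ncol = length(bins),
                dimnames = list(NULL, paste0(prefix, "_", bins)))
    # match() gives each row's bin position; indexing with a two-column
    # (row, column) matrix sets exactly one cell per row, with no explicit loop.
    # Assumes x has no NA values (NA indices are not allowed in this assignment).
    m[cbind(seq_along(x), match(x, bins))] <- 1L
    m
}

df_in <- data.frame(val1 = c(10, 20, 30, 40, 10, 30, 40),
                    val2 = c(100, 200, 300, 400, 100, 300, 400))

# Encode every source column, then bind the pieces side by side.
# The result holds only the bin columns, not the source columns.
pieces <- Map(one_hot_column, df_in, names(df_in))
df_out <- do.call(cbind, unname(pieces))

df_out
```

The same call works unchanged for any number of source columns, since `Map` simply walks over the columns of the data frame. Note that the result is a dense matrix, so its size grows with the total number of distinct bins across all columns.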

EDIT2

Based on the answers to my follow-up question, below are implementations of each method for anyone coming to this question in the future.

# read in some data with multiple columns

df_in  <- read.table(text="row val1 val2
                  1     10     100
                  2     20     200
                  3     30     300
                  4     40     400
                  5     10     100
                  6     30     300
                  7     40     400", header=TRUE, stringsAsFactors=FALSE)

#   @Zelazny7 's method using a matrix

df_in$row <- NULL

col_names <- names(df_in)

for( c in 1:length(col_names)){

    uniq <- unlist(unique(df_in[col_names[c]]))

    m <- matrix(0, nrow(df_in), length(uniq), 
                dimnames = list(NULL, paste0(col_names[c], "_", uniq)))

    for (i in seq_along(df_in[[col_names[c]]])) {
        k <- match(df_in[[col_names[c]]][i], uniq, 0)
        m[i,k] <- 1
    }

    if( c==1 )
        df_out <- m
    else
        df_out <- cbind(df_out,m)
}


#   @P Lapointe 's method using 'table'
#   (uses df_in$row, so re-read df_in first if 'row' was dropped above)

col_names <- names(df_in)

for( c in 2:length(col_names)){

    m <- table(df_in$row,df_in[[col_names[c]]])    
    uniq <- colnames(m)    # table() sorts its columns, so take the bin values from m rather than unique()
    newNames <- toString(paste0(col_names[c],'_',uniq))

    if( c==2 ){
        df_out <- m
        hdrs <- newNames
    }
    else{
        df_out <- cbind(df_out,m)
        hdrs <- paste(hdrs,newNames,sep=", ")
    }
}

colnames(df_out) <- unlist(strsplit(hdrs, split=", "))


#   @bdemarest 's method using 'data.table'
#   read in data first

library(data.table)

df_in = fread("row val1 val2
            1     10     100
            2     20     200
            3     30     300
            4     40     400
            5     10     100
            6     30     300
            7     40     400")

df_in$count = 1L

col_names <- names(df_in)

for( c in 2:(length(col_names)-1)){    # parentheses needed: ':' binds tighter than '-'

    m = dcast(df_in, paste( 'row', '~', col_names[c]), value.var="count", fill=0L)

    uniq <- setdiff(names(m), 'row')    # dcast() sorts the new columns, so take the bin values from m rather than unique()
    newNames <- toString(paste0(col_names[c],'_',uniq))

    m$row <- NULL

    if( c==2 ){
        df_out <- m
        hdrs <- newNames
    }
    else if( c>2 ){
        df_out <- cbind(df_out,m)
        hdrs <- paste(hdrs,newNames,sep=", ")
    }
}

colnames(df_out) <- unlist(strsplit(hdrs, split=", "))

All answers were appropriate and usable, so the best answer was awarded to the quickest initial response. Thanks again for your help!
