Converting a column of type 'list' to multiple columns in a data frame


Question

I have a data frame with one column which is a list, like so:

>head(movies$genre_list)
[[1]]
[1] "drama"   "action"  "romance"
[[2]]
[1] "crime" "drama"
[[3]]
[1] "crime"   "drama"   "mystery"
[[4]]
[1] "thriller" "indie"  
[[5]]
[1] "thriller"
[[6]]
[1] "drama"  "family"

I want to convert this one column to multiple binary columns, one for each unique element across the lists (in this case, genres). I'm looking for an elegant solution that doesn't involve first finding out how many genres there are, then creating a column for each, and then checking each list element to populate the genre columns. I tried unlist, but it doesn't work with a vector of lists in the way I want.

Thanks!

Recommended answer

Here are a few approaches, using the following sample data:

movies <- data.frame(genre_list = I(list(
   c("drama",   "action",  "romance"),
   c("crime", "drama"),
   c("crime",   "drama",   "mystery"),
   c("thriller", "indie"),  
   c("thriller"),
   c("drama",  "family"))))
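
A note on the sample data: the I() wrapper is what keeps genre_list as a single list column. The sketch below uses base R only (the names genres and no_wrapper are just for illustration). It rebuilds the same data, checks its shape, and shows that data.frame() refuses the bare list.

```r
# Same sample data as above, built with I() so the list stays one column
genres <- list(
  c("drama", "action", "romance"),
  c("crime", "drama"),
  c("crime", "drama", "mystery"),
  c("thriller", "indie"),
  "thriller",
  c("drama", "family"))
movies <- data.frame(genre_list = I(genres))

dim(movies)                  # one row per movie, a single list column
lengths(movies$genre_list)   # number of genres stored in each row

# Without I(), data.frame() treats each list element as its own column
# and stops because the vectors have different lengths
no_wrapper <- tryCatch(data.frame(genre_list = genres), error = function(e) e)
inherits(no_wrapper, "error")
```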


Update, years later....

You can use the mtabulate function from "qdapTools" or the unexported charMat function from my "splitstackshape" package.

The syntax is:

library(qdapTools)
mtabulate(movies$genre_list)
#   action crime drama family indie mystery romance thriller
# 1      1     0     1      0     0       0       1        0
# 2      0     1     1      0     0       0       0        0
# 3      0     1     1      0     0       1       0        0
# 4      0     0     0      0     1       0       0        1
# 5      0     0     0      0     0       0       0        1
# 6      0     0     1      1     0       0       0        0

splitstackshape:::charMat(movies$genre_list, fill = 0)
#      action crime drama family indie mystery romance thriller
# [1,]      1     0     1      0     0       0       1        0
# [2,]      0     1     1      0     0       0       0        0
# [3,]      0     1     1      0     0       1       0        0
# [4,]      0     0     0      0     1       0       0        1
# [5,]      0     0     0      0     0       0       0        1
# [6,]      0     0     1      1     0       0       0        0

Update: two more direct approaches

Improved option 1: use table directly:

# Repeat each row index once per genre, then cross-tabulate against the flattened genres
table(rep(seq_len(nrow(movies)), lengths(movies$genre_list)),
      unlist(movies$genre_list, use.names = FALSE))
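
If you want the result as actual columns in the data frame rather than a table object, one possible follow-up is sketched below. It assumes every row has at least one genre (a row with an empty vector would be missing from the table, and the cbind would then fail). The names row_id, genre, tab, genre_df and movies_wide are illustrative only.

```r
movies <- data.frame(genre_list = I(list(
  c("drama", "action", "romance"),
  c("crime", "drama"),
  c("crime", "drama", "mystery"),
  c("thriller", "indie"),
  "thriller",
  c("drama", "family"))))

# Row index repeated once per genre, paired with the flattened genres
row_id <- rep(seq_len(nrow(movies)), lengths(movies$genre_list))
genre  <- unlist(movies$genre_list, use.names = FALSE)
tab    <- table(row_id, genre)

# as.data.frame.matrix() keeps the wide layout; plain as.data.frame()
# on a table would return the long (row_id, genre, Freq) form instead
genre_df <- as.data.frame.matrix(tab)

# Attach the indicator columns next to the original list column
movies_wide <- cbind(movies, genre_df)
names(movies_wide)
```

Note that table counts occurrences, so a genre listed twice in the same row would show up as 2 rather than 1.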

Improved option 2: use a for loop.

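x <- unique(unlist(movies$genre_list, use.names = FALSE))
# Start from an all-zero matrix with one column per genre
m <- matrix(0, ncol = length(x), nrow = nrow(movies), dimnames = list(NULL, x))
for (i in seq_len(nrow(m))) {
  m[i, movies$genre_list[[i]]] <- 1  # flag the genres present in row i
}
m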
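
The same idea can also be written without an explicit loop. This is only a sketch of an alternative, not part of the approaches above: genre_matrix is a hypothetical helper name, and it sorts the genres alphabetically, which the loop version does not.

```r
# Hypothetical helper: turn a list of character vectors into a 0/1 matrix
genre_matrix <- function(lst) {
  lev <- sort(unique(unlist(lst, use.names = FALSE)))
  # For each row, flag which of the known genres it contains
  flags <- vapply(lst, function(g) as.integer(lev %in% g),
                  integer(length(lev)))
  # vapply() returns the flags for each row as a column, so refill by row
  matrix(flags, nrow = length(lst), ncol = length(lev), byrow = TRUE,
         dimnames = list(NULL, lev))
}

movies <- data.frame(genre_list = I(list(
  c("drama", "action", "romance"),
  c("crime", "drama"),
  c("crime", "drama", "mystery"),
  c("thriller", "indie"),
  "thriller",
  c("drama", "family"))))

m2 <- genre_matrix(movies$genre_list)
m2
```

Because it uses %in%, this version always yields 0 or 1, even if a genre is repeated within a row.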


Below is the OLD answer

Convert the list to a list of tables (in turn converted to data.frames):

tables <- lapply(seq_along(movies$genre_list), function(x) {
  temp <- as.data.frame.table(table(movies$genre_list[[x]]))
  names(temp) <- c("Genre", paste("Record", x, sep = "_"))
  temp
})

Use Reduce to merge the resulting list. If I understand your end goal correctly, this results in the transposed form of the result you are interested in.

merged_tables <- Reduce(function(x, y) merge(x, y, all = TRUE), tables)
merged_tables
#      Genre Record_1 Record_2 Record_3 Record_4 Record_5 Record_6
# 1   action        1       NA       NA       NA       NA       NA
# 2    drama        1        1        1       NA       NA        1
# 3  romance        1       NA       NA       NA       NA       NA
# 4    crime       NA        1        1       NA       NA       NA
# 5  mystery       NA       NA        1       NA       NA       NA
# 6    indie       NA       NA       NA        1       NA       NA
# 7 thriller       NA       NA       NA        1        1       NA
# 8   family       NA       NA       NA       NA       NA        1

Transposing and converting NA to 0 is pretty straightforward. Just drop the first column and re-use it as the column names for the new data.frame:

movie_genres <- setNames(data.frame(t(merged_tables[-1])), merged_tables[[1]])
movie_genres[is.na(movie_genres)] <- 0
movie_genres
