r - data frame of lists into a data frame

Question

Background

I'm querying a MongoDB database to find a document:

library(rmongodb)
...
res <- mongo.find.one(m, n, q, f)  # <~~ returns BSON 
res <- mongo.bson.to.list(res)     # <~~ converts BSON to list

I'm then using this answer to try and convert it to a data frame:

df <- as.data.frame(t(sapply(res[[1]], '[', seq(max(sapply(res[[1]],length))))))

However, this gives me a data frame of lists (subsetted here for convenience):

Data

> dput(df)
structure(list(horse_id = list(17643L, 4997L, 20047L, 9914L, 
17086L, 12462L, 18490L, 17642L, 26545L, 27603L, 14635L, 13811L, 
27719L, 31585L, 9644L), start_1400m = list(14.76, 14.3, 14.48, 
15.11, 14.65, 14.63, 14.85, 14.54, 14.93, 14.5, 14.78, NULL, 
NULL, NULL, NULL), `1400m_1200m` = list(12.96, 12.47, 12.47, 
13.02, 12.65, 12.92, 13.11, 12.37, 13, 12.84, 12.79, NULL, 
NULL, NULL, NULL)), .Names = c("horse_id", "start_1400m", 
"1400m_1200m"), row.names = c(NA, 15L), class = "data.frame")

> head(df)
    horse_id start_1400m 1400m_1200m
1    17643       14.76       12.96
2     4997        14.3       12.47
3    20047       14.48       12.47
4     9914       15.11       13.02
5    17086       14.65       12.65
6    12462       14.63       12.92
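Why the data frame ends up holding lists can be seen on a toy input. This is a sketch, assuming (as the dput() output suggests) that mongo.bson.to.list() returns each field of res[[1]] as a list rather than an atomic vector; the objects atomic and nested are hypothetical. The '[' plus seq(max(...)) idiom pads shorter elements to a common length: for atomic vectors the padding is NA and sapply() simplifies to a numeric matrix, but for lists the padding is NULL and the result is a matrix of mode list, which as.data.frame() then appears to turn into list columns.

```r
# Atomic elements: out-of-range indices give NA, so sapply() can
# simplify to an ordinary numeric matrix.
atomic <- list(a = c(1, 2, 3), b = c(4, 5))
n <- max(sapply(atomic, length))          # longest element: 3
m_atomic <- sapply(atomic, '[', seq(n))   # numeric 3 x 2 matrix, b padded with NA

# List elements (assumed to be what the BSON conversion yields):
# out-of-range indices give NULL, and the result is a matrix of mode "list".
nested <- list(a = list(1, 2, 3), b = list(4, 5))
m_list <- sapply(nested, '[', seq(n))     # 3 x 2 list matrix, b padded with NULL
```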

Issue

I would like to melt the data with library(reshape2) and then plot it using ggplot2, but, as expected, I can't melt data.frames with non-atomic columns.

> melt(df, id.vars=c("horse_id"))
Error: Can't melt data.frames with non-atomic 'measure' columns

How can I convert this data to a 'standard' data frame (i.e. not a data frame of lists), or melt it as is?
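One direct way to do the conversion, sketched under the assumption that every non-NULL entry is a single value (as in the dput() above): turn each NULL into NA, then unlist() each list column. The helper name null_to_na is hypothetical, and the 3-row data frame is a hand-made subset of the question's data.

```r
# Hand-made 3-row subset of the question's data frame of lists;
# the third horse has no split times (NULL), as in rows 12-15.
df <- structure(list(horse_id    = list(17643L, 4997L, 13811L),
                     start_1400m = list(14.76, 14.3, NULL),
                     `1400m_1200m` = list(12.96, 12.47, NULL)),
                row.names = 1:3, class = "data.frame")

# Hypothetical helper: replace each NULL with NA, then flatten the list
# into an atomic vector. unlist() alone would silently drop the NULLs.
null_to_na <- function(col) {
  if (!is.list(col)) return(col)   # leave atomic columns untouched
  unlist(lapply(col, function(x) if (is.null(x)) NA else x))
}

# df[] <- ... replaces the columns in place, keeping names and row names.
df[] <- lapply(df, null_to_na)
str(df)
```

With only atomic columns left, melt(df, id.vars = "horse_id") should no longer hit the non-atomic error (reshape2 itself is not exercised in this sketch). If some entries can hold more than one value, unlist() would lengthen the column, so the length-1 assumption matters.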

Update

I hadn't properly considered the NULLs in the data. Using a combination of a comment in this question (replacing NULL with NA) and this answer (Convert List to DF with NULLs), I came up with:

d <- as.data.frame(do.call("rbind", df))

library(plyr)       #<~~ rbind.fill
library(reshape2)   #<~~ melt
d <- rbind.fill(lapply(d, function(f) {
  data.frame(Filter(Negate(is.null), f))
}))

names(d) <- sub("X","",names(d))      #<~~ clean the names
d <- melt(d, id.vars=c("horse_id"))   #<~~ melt for use in ggplot2

This replaces the NULLs with NAs and allows me to melt the data. However, I'm not yet fully au fait with what each step is doing, or whether this is the right approach.
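What the two key steps of the update do can be traced on two rows. This is a sketch with hypothetical rows mirroring rows 1 and 12 of the data: in the question's code, do.call("rbind", df) appears to produce one column of d per horse, so lapply(d, ...) visits one horse at a time, and the explicit row lists below stand in for that. fill_rbind() is a hypothetical base-R stand-in for plyr::rbind.fill(), shown only to make the NA padding visible; it is not plyr's implementation.

```r
# Hypothetical rows mirroring rows 1 and 12 of the question's data;
# in row 12 both split times are NULL.
row1  <- list(horse_id = 17643L, start_1400m = 14.76, `1400m_1200m` = 12.96)
row12 <- list(horse_id = 13811L, start_1400m = NULL,  `1400m_1200m` = NULL)

# Step 1: Filter(Negate(is.null), ...) drops the NULL fields, so each row
# becomes a one-row data frame holding only the fields it actually has.
# data.frame() also applies make.names(), turning "1400m_1200m" into
# "X1400m_1200m"; that prefix is what the later sub("X", "", ...) removes.
pieces <- lapply(list(row1, row12), function(f) {
  data.frame(Filter(Negate(is.null), f))
})

# Step 2: stack the pieces, adding NA for any column a piece lacks
# (hypothetical stand-in for plyr::rbind.fill()).
fill_rbind <- function(dfs) {
  cols <- unique(unlist(lapply(dfs, names)))
  padded <- lapply(dfs, function(d) {
    for (nm in setdiff(cols, names(d))) d[[nm]] <- NA
    d[cols]                            # same column order in every piece
  })
  do.call(rbind, padded)
}

d <- fill_rbind(pieces)
names(d) <- sub("X", "", names(d))     # "X1400m_1200m" -> "1400m_1200m"
d
```

Note that sub("X", "", ...) strips the first uppercase X from every name; sub("^X", "", ...) or data.frame(..., check.names = FALSE) would be narrower ways to keep the original names.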

Solution

It is normal for data.frames created from vectors or lists to have those objects represented as lists in dput() output, and that is not usually a problem, because the object still works as a data.frame.

For example:

> a = list(1, 2, 3)
> b = list(4, 5, 6)
> df = data.frame(a)
> df = rbind(b, df)
> df
   X1 X2 X3
1   4  5  6
2   1  2  3
> s = sum(df[,2])
> s
[1] 7
> str(df)
'data.frame':   2 obs. of  3 variables:
 $ X1: num  4 1
 $ X2: num  5 2
 $ X3: num  6 3
> dput(df)
structure(list(X1 = c(4, 1), X2 = c(5, 2), X3 = c(6, 3)), .Names = c("X1", 
"X2", "X3"), row.names = 1:2, class = "data.frame")
> 
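Whether a data frame "still works" depends on whether its columns are atomic (as str() reports above, num) or genuine lists (as in the question's dput()). A quick way to check, sketched on a hypothetical two-column data frame built the same way dput() describes one:

```r
# Hypothetical data frame: one atomic column (id) and one list column (x).
df_check <- structure(list(id = c(1, 2), x = list(1, 2)),
                      row.names = 1:2, class = "data.frame")

# Which columns are lists? Here: id = FALSE, x = TRUE.
is_list_col <- sapply(df_check, is.list)

# sum() accepts the atomic column but rejects the list column.
sum(df_check$id)                                   # 3
res <- tryCatch(sum(df_check$x), error = function(e) conditionMessage(e))

# unlist() flattens the list column so it can be summed.
sum(unlist(df_check$x))                            # 3
```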

