r - data frame of lists into a data frame
Background
I'm querying a mongodb database to find a document:
library(rmongodb)
...
res <- mongo.find.one(m, n, q, f) # <~~ returns BSON
res <- mongo.bson.to.list(res) # <~~ converts BSON to list
I'm then using this answer to try to convert it to a data frame:
df <- as.data.frame(t(sapply(res[[1]], '[', seq(max(sapply(res[[1]],length))))))
However, this gives me a data frame of lists (subsetted here for convenience):
Data
> dput(df)
structure(list(horse_id = list(17643L, 4997L, 20047L, 9914L,
17086L, 12462L, 18490L, 17642L, 26545L, 27603L, 14635L, 13811L,
27719L, 31585L, 9644L), start_1400m = list(14.76, 14.3, 14.48,
15.11, 14.65, 14.63, 14.85, 14.54, 14.93, 14.5, 14.78, NULL,
NULL, NULL, NULL), `1400m_1200m` = list(12.96, 12.47, 12.47,
13.02, 12.65, 12.92, 13.11, 12.37, 13, 12.84, 12.79, NULL,
NULL, NULL, NULL)), .Names = c("horse_id", "start_1400m",
"1400m_1200m"), row.names = c(NA, 15L), class = "data.frame")
> head(df)
  horse_id start_1400m 1400m_1200m
1    17643       14.76       12.96
2     4997        14.3       12.47
3    20047       14.48       12.47
4     9914       15.11       13.02
5    17086       14.65       12.65
6    12462       14.63       12.92
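To see where the list columns come from, the conversion can be reproduced on a small made-up stand-in for the BSON-derived list. The records below are hypothetical, shaped like the question's data, with the second record missing a field. Indexing each record with `[` and seq(n) pads short records with NULL, and sapply then returns a matrix of mode list, so as.data.frame() keeps every column as a list. This is a minimal sketch, not the asker's actual query result:

```r
# Hypothetical stand-in for mongo.bson.to.list(res): one outer element
# holding a list of records; the second record lacks start_1400m.
res <- list(list(
  list(horse_id = 17643L, start_1400m = 14.76),
  list(horse_id = 13811L)
))

n  <- max(sapply(res[[1]], length))   # length of the longest record
m  <- sapply(res[[1]], `[`, seq(n))   # pad each record to length n; missing fields become NULL
df <- as.data.frame(t(m))             # one row per record

str(df)                               # inspect the column types
sapply(df, is.list)                   # TRUE marks a non-atomic (list) column
```

Because a NULL cannot sit inside an atomic vector, the padded cells are what force the columns to stay lists.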
Issue
I would like to melt this data (library(reshape2)) and then plot it using ggplot2, but as expected I can't melt data.frames with non-atomic columns:
> melt(df, id.vars=c("horse_id"))
Error: Can't melt data.frames with non-atomic 'measure' columns
How can I convert this data to a 'standard' data frame (i.e. not a data frame of lists), or melt it as is?
Update
I hadn't properly considered the NULLs in the data. Using a combination of a comment in this question (replacing NULL with NA) and this answer (Convert List to DF with NULLs), I came up with:
d <- as.data.frame(do.call("rbind", df))
library(plyr)
d <- rbind.fill(lapply(d, function(f) {
data.frame(Filter(Negate(is.null), f))
}))
names(d) <- sub("X","",names(d)) #<~~ clean the names
d <- melt(d, id.vars=c("horse_id")) #<~~ melt for use in ggplot2
This replaces the NULLs with NAs and allows me to melt the data. However, I'm still not fully au fait with what each step is doing, or whether this is the right approach.
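For comparison, a base-R route to the same goal can be sketched as follows: replace each NULL cell with NA, then unlist every column so that it becomes an atomic vector. This is one possible approach rather than the method used above. It assumes every cell holds either NULL or a single value; to_atomic is a helper name invented here, and the three-row data frame is a cut-down copy of the question's data.

```r
# Cut-down copy of the question's data: list columns with NULLs in the last row.
df <- structure(
  list(horse_id      = list(17643L, 4997L, 13811L),
       start_1400m   = list(14.76, 14.3, NULL),
       `1400m_1200m` = list(12.96, 12.47, NULL)),
  row.names = c(NA, -3L), class = "data.frame")

# Hypothetical helper: swap NULL for NA, then flatten the list to a vector.
# Assumes each cell is NULL or a length-1 value.
to_atomic <- function(col) {
  unlist(lapply(col, function(x) if (is.null(x)) NA else x))
}

flat <- df
flat[] <- lapply(df, to_atomic)   # replace the columns in place, keeping the names

str(flat)                         # columns are now atomic vectors

# Melting is attempted only if reshape2 is installed.
if (requireNamespace("reshape2", quietly = TRUE)) {
  long <- reshape2::melt(flat, id.vars = "horse_id")
  print(long)
}
```

Assigning into flat[] keeps the original column names, including the non-syntactic 1400m_1200m, so no clean-up of an "X" prefix is needed afterwards.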
Answer
It is normal for data.frames created from vectors or lists to have those objects represented as lists in dput() output, and that is not usually a problem because the result still works as a data.frame.
For example:
> a = list(1, 2, 3)
> b = list(4, 5, 6)
> df = data.frame(a)
> df = rbind(b, df)
> df
  X1 X2 X3
1  4  5  6
2  1  2  3
> s = sum(df[,2])
> s
[1] 7
> str(df)
'data.frame': 2 obs. of 3 variables:
$ X1: num 4 1
$ X2: num 5 2
$ X3: num 6 3
> dput(df)
structure(list(X1 = c(4, 1), X2 = c(5, 2), X3 = c(6, 3)), .Names = c("X1",
"X2", "X3"), row.names = 1:2, class = "data.frame")
>
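One way to check whether a given data frame is affected is to test each column with is.list(). The sketch below contrasts a data frame like the one in the example above, whose columns are atomic, with one built the way the question's dput() output is, whose columns are lists. The object names are made up for illustration. Arithmetic such as sum() works on the former but raises an error on the latter.

```r
# Atomic columns, as in the example above.
atomic_df <- data.frame(X1 = c(4, 1), X2 = c(5, 2))

# List columns, built like the question's dput() output (names are illustrative).
list_df <- structure(
  list(X1 = list(4, 1), X2 = list(5, NULL)),
  row.names = c(NA, -2L), class = "data.frame")

vapply(atomic_df, is.list, logical(1))   # FALSE for every column
vapply(list_df, is.list, logical(1))     # TRUE for every column

total <- sum(atomic_df[, 2])             # works on an atomic column

# sum() on a list column fails; capture the error message instead of stopping.
res <- tryCatch(sum(list_df[, 2]), error = function(e) conditionMessage(e))
print(res)
```

If any column reports TRUE, it needs flattening before functions that expect atomic vectors, such as melt, will accept it.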