Using lists inside data.table columns

Question

In data.table it is possible to have columns of type list, and I'm trying to benefit from this feature for the first time. For each row of my table dt, I need to store several comments taken from an rApache web service. Each comment will have a username, datetime, and body item.

Instead of using long strings with some weird, unusual character to separate each message from the others (like |), and a ; to separate each item in a comment, I thought I'd use lists like this:

library(data.table)
dt <- data.table(id=1:2, comment=list(list(list(username="michele", date=Sys.time(), message="hello"),
                                           list(username="michele", date=Sys.time(), message="world")),
                                      list(list(username="michele", date=Sys.time(), message="hello"),
                                           list(username="michele", date=Sys.time(), message="world"))))

> dt
   id comment
1:  1  <list>
2:  2  <list>

to store all the comments added for one particular row (also because it will be easier to convert to JSON later on, when I need to send it back to the UI).

However, when I try to simulate how I will actually be filling my table in production (adding a single comment to a particular row), R either crashes or doesn't assign what I want and then crashes:

> library(data.table)
> dt <- data.table(id=1:2, comment=vector(mode="list", length=2))
> dt$comment
[[1]]
NULL

[[2]]
NULL

> dt[1L, comment := 1] # this works
> dt$comment
[[1]]
[1] 1

[[2]]
NULL

> set(dt, 1L, "comment", list(1, "a"))  # assigns only `1`; R crashes when I then try to print `dt`
Warning message:
In set(dt, 1L, "comment", list(1, "a")) :
  Supplied 2 items to be assigned to 1 items of column 'comment' (1 unused)

> dt[1L, comment := list(1, "a")]       # R crashes as soon as I run
> dt[1L, comment := list(list(1, "a"))] # either of these two

I know I'm trying to misuse data.table; e.g., the way the j argument has been designed allows this:

dt[1L, c("id", "comment") := list(1, "a")] # lists in RHS are seen as different columns! not parts of one

Question: So, is there a way to do the assignment I want? Or do I just have to take dt$comment out into a variable, modify it, and then re-assign the whole column every time I need to do an update?

Recommended answer

Using :=

dt = data.table(id = 1:2, comment = vector("list", 2L))

# assign value 1 to just the first row of 'comment'
dt[1L, comment := 1L]

# assign value of 1 and "a" to rows 1 and 2
dt[, comment := list(1, "a")]

# assign value of "a","b" to row 1, and 1 to row 2 for 'comment'
dt[, comment := list(c("a", "b"), 1)]

# assign list(1, "a") to just 1 row of 'comment'
dt[1L, comment := list(list(list(1, "a")))]

For the last case, you'll need one more list because data.table uses list(.) to look for values to assign to columns by reference.

Using set

dt = data.table(id = 1:2, comment = vector("list", 2L))

# assign value 1 to just the first row of 'comment'
set(dt, i=1L, j="comment", value=1L)

# assign value of 1 and "a" to rows 1 and 2
set(dt, j="comment", value=list(1, "a"))

# assign value of "a","b" to row 1, and 1 to row 2 for 'comment'
set(dt, j="comment", value=list(c("a", "b"), 1))

# assign list(1, "a") to just 1 row of 'comment'
set(dt, i=1L, j="comment", value=list(list(list(1, "a"))))
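
To tie this back to the use case in the question (adding a single comment to one particular row), here is a minimal sketch built on the same extra-list wrapping. Note that add_comment is a hypothetical helper written for illustration, not part of data.table, and the field names simply follow the question's example.

```r
library(data.table)

dt <- data.table(id = 1:2, comment = vector("list", 2L))

# Hypothetical helper (not a data.table function): append one comment
# to the list stored in the 'comment' cell of the given row.
add_comment <- function(dt, row, username, message, date = Sys.time()) {
  new_comment <- list(username = username, date = date, message = message)
  existing <- dt$comment[[row]]              # NULL, or a list of comments
  updated <- c(existing, list(new_comment))  # list of comments, one longer
  # Same wrapping as in the last case above: the extra list() layers make
  # the whole 'updated' list land in a single cell.
  set(dt, i = row, j = "comment", value = list(list(updated)))
  invisible(dt)
}

add_comment(dt, 1L, "michele", "hello")
add_comment(dt, 1L, "michele", "world")
```

Because set() assigns by reference, dt is updated in place and the whole column never needs to be copied out and re-assigned.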

HTH

I'm using the current development version, 1.9.3, but this should work fine on any other version.

> sessionInfo()
R version 3.0.3 (2014-03-06)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.9.3

loaded via a namespace (and not attached):
[1] plyr_1.8.0.99  reshape2_1.2.2 stringr_0.6.2  tools_3.0.3   
