Using lists inside data.table columns
Question
In data.table it is possible to have columns of type list, and I'm trying to benefit from this feature for the first time. I need to store, for each row of my table dt, several comments taken from an rApache web service. Each comment will have a username, datetime, and body item.
Instead of using long strings with some unusual character to separate each message from the others (like |), and a ; to separate each item in a comment, I thought I would use lists like this:
library(data.table)
dt <- data.table(id=1:2,
                 comment=list(
                   list(
                     list(username="michele", date=Sys.time(), message="hello"),
                     list(username="michele", date=Sys.time(), message="world")),
                   list(
                     list(username="michele", date=Sys.time(), message="hello"),
                     list(username="michele", date=Sys.time(), message="world"))))
> dt
id comment
1: 1 <list>
2: 2 <list>
This would store all the comments added for one particular row. It would also be easier to convert to JSON later on, when I need to send it back to the UI.
However, when I try to simulate how I will actually fill my table in production (adding a single comment to a particular row), R either crashes, or doesn't assign what I want and then crashes:
> library(data.table)
> dt <- data.table(id=1:2, comment=vector(mode="list", length=2))
> dt$comment
[[1]]
NULL
[[2]]
NULL
> dt[1L, comment := 1] # this works
> dt$comment
[[1]]
[1] 1
[[2]]
NULL
> set(dt, 1L, "comment", list(1, "a")) # assigns only `1`, and R crashes when I try to print `dt`
Warning message:
In set(dt, 1L, "comment", list(1, "a")) :
Supplied 2 items to be assigned to 1 items of column 'comment' (1 unused)
> dt[1L, comment := list(1, "a")] # R crashes as soon as I run
> dt[1L, comment := list(list(1, "a"))] # any of these two
I know I'm trying to misuse data.table here, e.g. the way the j argument has been designed allows this:
dt[1L, c("id", "comment") := list(1, "a")] # lists in RHS are seen as different columns! not parts of one
Question: So, is there a way to do the assignment I want? Or do I just have to take dt$comment out into a variable, modify it, and then re-assign the whole column every time I need to do an update?
Recommended answer
Using :=:
dt = data.table(id = 1:2, comment = vector("list", 2L))
# assign value 1 to just the first row of 'comment'
dt[1L, comment := 1L]
# assign value of 1 and "a" to rows 1 and 2
dt[, comment := list(1, "a")]
# assign value of "a","b" to row 1, and 1 to row 2 for 'comment'
dt[, comment := list(c("a", "b"), 1)]
# assign list(1, "a") to just 1 row of 'comment'
dt[1L, comment := list(list(list(1, "a")))]
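For the last case, you'll need one more list because data.table uses list(.) to look for values to assign to columns by reference. The outermost list is read as the set of columns being assigned, the next one is the (length-one) list column itself, and only the innermost list(1, "a") ends up stored in the row.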
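For the use case in the question (adding one comment at a time to a given row), the same nesting rule can be wrapped in a small helper. This is only a sketch: add_comment and its arguments are hypothetical names, not part of data.table, and it assumes the comment column is a list column whose cells are either NULL or a list of comments.

```r
library(data.table)

dt <- data.table(id = 1:2, comment = vector("list", 2L))

# Hypothetical helper (not a data.table function): append one comment
# to the list stored in row `row` of the 'comment' column, by reference.
add_comment <- function(dt, row, username, message) {
  new_comment <- list(username = username, date = Sys.time(), message = message)
  existing <- dt$comment[[row]]              # NULL, or a list of earlier comments
  updated <- c(existing, list(new_comment))  # list of comments with the new one appended
  # outer list(): columns being assigned; inner list(): the one-row list column
  dt[row, comment := list(list(updated))]
  invisible(dt)
}

add_comment(dt, 1L, "michele", "hello")
add_comment(dt, 1L, "michele", "world")

length(dt$comment[[1]])   # number of comments now stored in row 1
```

Each call reads the current cell, appends to it in plain R, and writes the whole cell back, so only the one row is touched rather than the entire column.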
Using set:
dt = data.table(id = 1:2, comment = vector("list", 2L))
# assign value 1 to just the first row of 'comment'
set(dt, i=1L, j="comment", value=1L)
# assign value of 1 and "a" to rows 1 and 2
set(dt, j="comment", value=list(1, "a"))
# assign value of "a","b" to row 1, and 1 to row 2 for 'comment'
set(dt, j="comment", value=list(c("a", "b"), 1))
# assign list(1, "a") to just 1 row of 'comment'
set(dt, i=1L, j="comment", value=list(list(list(1, "a"))))
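If many comments arrive and have to be written row by row, the set() form may be convenient inside a loop. The sketch below follows the same list(list(...)) pattern as the last set() call above; the incoming data is made up purely for illustration.

```r
library(data.table)

dt <- data.table(id = 1:3, comment = vector("list", 3L))

# Illustrative input only: each element names the target row and the comment fields.
incoming <- list(
  list(row = 1L, username = "michele", message = "hello"),
  list(row = 1L, username = "michele", message = "world"),
  list(row = 3L, username = "michele", message = "hi")
)

for (cm in incoming) {
  new_comment <- list(username = cm$username, date = Sys.time(), message = cm$message)
  # append to whatever is already stored in that row (NULL if nothing yet)
  updated <- c(dt$comment[[cm$row]], list(new_comment))
  # outer list(): columns being assigned; inner list(): the one-row list column
  set(dt, i = cm$row, j = "comment", value = list(list(updated)))
}

n_comments <- sapply(dt$comment, length)   # comments per row
n_comments
```

Rows that never receive a comment keep their NULL cell, so they report a length of zero.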
HTH
I'm using the current development version 1.9.3, but this should work fine on any other version.
> sessionInfo()
R version 3.0.3 (2014-03-06)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.9.3
loaded via a namespace (and not attached):
[1] plyr_1.8.0.99 reshape2_1.2.2 stringr_0.6.2 tools_3.0.3