Using lists inside data.table columns

Question

In data.table it is possible to have columns of type list, and I'm trying to benefit from this feature for the first time. For each row of my table dt, I need to store several comments fetched from an rApache web service. Each comment has a username, datetime, and body item.

Instead of using long strings with some weird, unusual character (like |) to separate each message from the others, and a ; to separate the items within a comment, I thought I would use lists like this:

library(data.table)
dt <- data.table(id=1:2,
        comment=list(list(
            list(username="michele", date=Sys.time(), message="hello"),
            list(username="michele", date=Sys.time(), message="world")),
          list(
            list(username="michele", date=Sys.time(), message="hello"),
            list(username="michele", date=Sys.time(), message="world"))))

> dt
   id comment
1:  1  <list>
2:  2  <list>

This stores all the comments added for one particular row. (It will also be easier to convert to JSON later on, when I need to send it back to the UI.)

However, when I try to simulate how I will actually fill my table in production (adding a single comment to a particular row), R either crashes, or doesn't assign what I want and then crashes:

> library(data.table)
> dt <- data.table(id=1:2, comment=vector(mode="list", length=2))
> dt$comment
[[1]]
NULL

[[2]]
NULL

> dt[1L, comment := 1] # this works
> dt$comment
[[1]]
[1] 1

[[2]]
NULL

> set(dt, 1L, "comment", list(1, "a"))  # assigns only `1`, and R crashes when I try to print `dt`
Warning message:
In set(dt, 1L, "comment", list(1, "a")) :
  Supplied 2 items to be assigned to 1 items of column 'comment' (1 unused)

> dt[1L, comment := list(1, "a")]       # R crashes as soon as I run
> dt[1L, comment := list(list(1, "a"))] # either of these two

I know I'm trying to misuse data.table; for example, the j argument is designed to allow this:

dt[1L, c("id", "comment") := list(1, "a")] # list items on the RHS are treated as separate columns, not as parts of one value

Question: So, is there a way to do the assignment I want? Or do I just have to take dt$comment out into a variable, modify it, and then re-assign the whole column every time I need to do an update?

Recommended answer

Using :=:

dt = data.table(id = 1:2, comment = vector("list", 2L))

# assign value 1 to just the first row of 'comment'
dt[1L, comment := 1L]

# assign value of 1 and "a" to rows 1 and 2
dt[, comment := list(1, "a")]

# assign value of "a","b" to row 1, and 1 to row 2 for 'comment'
dt[, comment := list(c("a", "b"), 1)]

# assign list(1, "a") to just 1 row of 'comment'
dt[1L, comment := list(list(list(1, "a")))]

For the last case, you need one more list() wrapper, because data.table uses the outermost list(.) to look for the values to assign to columns by reference.
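To see what the extra nesting does, one can inspect the cell after the assignment. This is only a minimal sketch, assuming a reasonably recent data.table; the variable name `cell` is introduced here for illustration and is not part of the original answer.

```r
library(data.table)

dt <- data.table(id = 1:2, comment = vector("list", 2L))

# Outer list():  the values for the column(s) on the left of :=
# Middle list(): one cell for the single row being assigned
# Inner list():  the value that should end up inside that cell
dt[1L, comment := list(list(list(1, "a")))]

cell <- dt$comment[[1L]]  # extract the cell stored for row 1
str(cell)                 # inspect its structure
```

If the assignment behaves as described in the answer, `cell` should be the two-element list `list(1, "a")`, and row 2 should be left untouched.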

Using set():

dt = data.table(id = 1:2, comment = vector("list", 2L))

# assign value 1 to just the first row of 'comment'
set(dt, i=1L, j="comment", value=1L)

# assign value of 1 and "a" to rows 1 and 2
set(dt, j="comment", value=list(1, "a"))

# assign value of "a","b" to row 1, and 1 to row 2 for 'comment'
set(dt, j="comment", value=list(c("a", "b"), 1))

# assign list(1, "a") to just 1 row of 'comment'
set(dt, i=1L, j="comment", value=list(list(list(1, "a"))))
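Applied to the use case in the question (appending one comment at a time to a given row), the same nesting could be wrapped in a small helper. The function `add_comment` below is hypothetical, not part of data.table, and is only a sketch of one possible approach under that assumption.

```r
library(data.table)

dt <- data.table(id = 1:2, comment = vector("list", 2L))

# Hypothetical helper: append one comment (itself a list) to the list of
# comments stored in a given row, updating dt by reference.
add_comment <- function(dt, row, comment) {
  current <- dt$comment[[row]]          # NULL, or a list of earlier comments
  updated <- c(current, list(comment))  # append the comment as one new element
  # Same nesting as the last set() call in the answer, so that the whole
  # list of comments lands in a single cell.
  set(dt, i = row, j = "comment", value = list(list(updated)))
  invisible(dt)
}

add_comment(dt, 1L, list(username = "michele", date = Sys.time(), message = "hello"))
add_comment(dt, 1L, list(username = "michele", date = Sys.time(), message = "world"))

length(dt$comment[[1L]])        # number of comments stored for row 1
dt$comment[[1L]][[2L]]$message  # body of the second comment
```

Only the cell for the chosen row is rewritten, so the whole column does not have to be pulled out and re-assigned on every update.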

HTH

I'm using the current development version, 1.9.3, but this should work fine on any other version.

> sessionInfo()
R version 3.0.3 (2014-03-06)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.9.3

loaded via a namespace (and not attached):
[1] plyr_1.8.0.99  reshape2_1.2.2 stringr_0.6.2  tools_3.0.3   
