How to delete a row by reference in data.table?

Question

My question is related to assignment by reference versus copying in data.table. I want to know if one can delete rows by reference, similar to

DT[ , someCol := NULL]

That is, I am wondering about something like

DT[someRow := NULL, ]

I guess there's a good reason for why this function doesn't exist, so maybe you could just point out a good alternative to the usual copying approach, as below. In particular, going with my favourite from example(data.table),

DT = data.table(x = rep(c("a", "b", "c"), each = 3), y = c(1, 3, 6), v = 1:9)
#      x y v
# [1,] a 1 1
# [2,] a 3 2
# [3,] a 6 3
# [4,] b 1 4
# [5,] b 3 5
# [6,] b 6 6
# [7,] c 1 7
# [8,] c 3 8
# [9,] c 6 9

Say I want to delete the first row from this data.table. I know I can do this:

DT <- DT[-1, ]

but often we may want to avoid that, because we are copying the object (and that requires about 3*N memory, where N is object.size(DT), as pointed out here). Now I found set(DT, i, j, value). I know how to set specific values (like here: set all values in rows 1 and 2 and columns 2 and 3 to zero)

set(DT, 1:2, 2:3, 0) 
DT
#      x y v
# [1,] a 0 0
# [2,] a 0 0
# [3,] a 6 3
# [4,] b 1 4
# [5,] b 3 5
# [6,] b 6 6
# [7,] c 1 7
# [8,] c 3 8
# [9,] c 6 9

But how can I erase the first two rows, say? Doing

set(DT, 1:2, 1:3, NULL)

sets the entire DT to NULL.

My SQL knowledge is very limited, so you guys tell me: given data.table uses SQL technology, is there an equivalent to the SQL command

DELETE FROM table_name
WHERE some_column=some_value

in data.table?

Answer

Good question. data.table can't delete rows by reference yet.
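Until row deletion by reference exists, the usual route is still the copy-based subset from the question. The sketch below contrasts the two cases under that assumption: removing a column with := leaves the object where it is, while removing rows produces a new object that the name DT is rebound to. It uses data.table's address() helper to compare memory addresses; the variable names (addr_start and so on) are only illustrative, and y is repeated explicitly rather than recycled.

```r
library(data.table)

# Same table as in the question; y is repeated explicitly rather than recycled.
DT <- data.table(x = rep(c("a", "b", "c"), each = 3),
                 y = rep(c(1, 3, 6), times = 3),
                 v = 1:9)

addr_start <- address(DT)

# Deleting a column with := works by reference: DT is the same object afterwards.
DT[, v := NULL]
addr_after_col <- address(DT)

# Copy-based counterpart of: DELETE FROM DT WHERE x = 'b'
# The subset is a new object, and the name DT is rebound to it.
DT <- DT[x != "b"]
addr_after_rows <- address(DT)

identical(addr_start, addr_after_col)   # column delete: same object?
identical(addr_start, addr_after_rows)  # row delete: same object?
```

Note that a filter such as x != "b" also drops rows where x is NA, which may or may not match what a SQL DELETE would do for your data.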

data.table can add and delete columns by reference since it over-allocates the vector of column pointers, as you know. The plan is to do something similar for rows and allow fast insert and delete. A row delete would use memmove in C to budge up the items (in each and every column) after the deleted rows. Deleting a row in the middle of the table would still be quite inefficient compared to a row store database such as SQL, which is more suited for fast insert and delete of rows wherever those rows are in the table. But still, it would be a lot faster than copying a new large object without the deleted rows.
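To make the "budge up" idea concrete, here is a purely conceptual sketch on an ordinary R vector. It is not data.table's implementation (the answer describes a plan that would use memmove in C, and R-level code cannot shift a column in place); budge_up is a hypothetical helper name. It only shows the shifting pattern: every kept element after a deleted position moves towards the front, and the vector is then shortened.

```r
# Conceptual illustration only: this works on a plain vector and R may copy it
# internally, so it shows the shifting pattern, not an in-place delete.
budge_up <- function(x, del) {
  del <- sort(unique(del))
  write <- 1L                      # next slot to fill
  for (read in seq_along(x)) {
    if (!(read %in% del)) {        # keep this element: move it forward
      x[write] <- x[read]
      write <- write + 1L
    }
  }
  length(x) <- write - 1L          # drop the now-unused tail
  x
}

shifted <- budge_up(1:9, c(2, 5))
shifted
```

Applied to a table, the same shift would have to be repeated for each and every column, which is why a delete in the middle of a large table would still cost more than in a row store.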

On the other hand, since column vectors would be over-allocated, rows could be inserted (and deleted) at the end, instantly; e.g., a growing time series.
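For the growing time series case, one workaround that relies only on the existing set() function is to pre-allocate spare rows yourself and fill them by reference. This is an assumed pattern for illustration, not the over-allocation the answer plans; capacity, series, n_used and append_row are all hypothetical names.

```r
library(data.table)

capacity <- 5L   # hypothetical number of pre-allocated rows
series <- data.table(t     = rep(NA_integer_, capacity),
                     value = rep(NA_real_, capacity))
n_used <- 0L     # how many rows currently hold real data

# Fill the next free row by reference; no copy of 'series' is made.
append_row <- function(t, value) {
  stopifnot(n_used < capacity)
  n_used <<- n_used + 1L
  set(series, i = n_used, j = "t", value = as.integer(t))
  set(series, i = n_used, j = "value", value = as.numeric(value))
  invisible(NULL)
}

append_row(1, 0.5)
append_row(2, 0.7)

# "Deleting" the last row is just decrementing the counter.
n_used <- n_used - 1L

series[seq_len(n_used)]   # the live part of the series
```

The trade-off is that the caller has to track n_used and grow the table (with a copy) once capacity is exhausted.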

This has been filed as an issue: Delete rows by reference.
