Make a shallow copy in data.table


Question

In an SO topic I read an answer from Matt Dowle about a shallow function for making shallow copies in data.table. However, I can't find the topic again.

data.table does not have any exported function called shallow. There is an internal one, but it is not documented. Can I use it safely? What is its behavior?

What I would like to do is make a memory-efficient copy of a big table. Let DT be a big table with n columns and f a function which adds a column in a memory-efficient way. Is something like this possible?

DT2 = f(DT)

with DT2 being a data.table whose n columns point to the original addresses (no deep copies), plus an extra column that exists only in DT2. If yes, what happens to DT if I do DT2[, col3 := NULL]?

Recommended answer

No, you can't use data.table:::shallow safely. It is deliberately not exported and is not meant for user use. There is no guarantee that it will keep working as it does now, or that its name or arguments won't change in future.

Having said this, you could decide to use it as long as either i) you can guarantee that := or set* won't be called on the result, by you or by your users (if you're creating a package), or ii) if := or set* is called on the result, you're OK with both objects being changed by reference. When shallow is used internally by data.table, that's what we promise ourselves.
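To make case ii) concrete, here is a minimal sketch that compares the supported copy() with the internal shallow. It assumes data.table:::shallow still exists and still accepts a single data.table argument, which the answer explicitly does not guarantee. The table DT and its columns are hypothetical.

```r
library(data.table)

# Hypothetical small table standing in for the big one
DT <- data.table(a = c(1L, 2L, 3L), b = c("x", "y", "z"))

# Supported route: copy() is a deep copy, so each column gets new memory
deep <- copy(DT)
address(deep$a) == address(DT$a)    # FALSE

# Internal, unsupported route (assumption: this function is still present).
# Only the vector of column pointers is copied, not the columns themselves.
shal <- data.table:::shallow(DT)
address(shal$a) == address(DT$a)    # the column vector is expected to be shared

# Case ii): sub-assigning by reference writes into the shared column,
# so DT is expected to change as well, while the deep copy is unaffected
shal[2L, a := 99L]
DT$a
deep$a
```

If neither i) nor ii) is acceptable, copy() remains the safe choice, at the cost of duplicating every column.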

There is more background in this answer from a few days ago: https://stackoverflow.com/a/45891502/403310

In that question I asked for the bigger picture: why is this needed? Having that clear would help to raise the priority of either investigating ALTREP or perhaps doing our own reference counting.

In your question you alluded to your bigger picture, and that is very useful. So you'd like to create a function which adds working columns to a big data.table inside the function but doesn't change the big data.table. Can you explain more about why you'd like to create a function like that? Why not load the big data.table, add the ephemeral working columns directly to it, and then proceed? Your R session is already an in-memory working copy of data which is stored persistently somewhere else.
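As a rough illustration of that suggestion, the sketch below adds a working column by reference, uses it, and then drops it again, so no copy of the table is made at any point. The table and the column name tmp_scaled are hypothetical; only exported data.table features are used.

```r
library(data.table)

# Hypothetical stand-in for the big table
DT <- data.table(id = 1:5, value = c(2.5, 3.0, 1.5, 4.0, 2.0))

# Add an ephemeral working column by reference (DT itself is not copied)
DT[, tmp_scaled := value / max(value)]

# Use the working column for whatever the analysis needs
result <- DT[tmp_scaled > 0.5, .(id, tmp_scaled)]

# Remove the working column again, also by reference
DT[, tmp_scaled := NULL]

names(DT)    # back to "id" and "value"
```

The trade-off is that DT is briefly modified in place, so any other code holding a reference to DT will see the working column until it is removed.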

Note that I am not saying no. I'm not saying that you don't have a valid reason. I'm asking to discover more about that valid reason so the priority can be raised.

If that isn't the answer you had seen, the search string "[data.table] shallow" currently returns 39 questions or answers. Worst case, you could trawl through those to find it again.
