Values of the wrong group are used when using plot() within a data.table() in RStudio

Question

I want to generate a figure divided into two panels: the upper panel should show the values of group a and the lower panel the values of group b. I am using data.table() to do this. Here is the code I used to generate an example and set up the graphics output:

library(data.table)
set.seed(23)
Example <- data.table('group' = rep(c('a', 'b'), each = 5), 'value' = runif(10))
layout(1:2)
par('mai' = rep(.5, 4))

When I run the following lines in the plain R console, the correct values are plotted. When I run the same code in RStudio, the values of the second group are used for both plots:

Example[, plot(value, ylim = c(0, 1)), by = group] # Example 1
Example[, .SD[plot(value, ylim = c(0, 1))], by = group] # Example 2

When I add a comma inside the .SD[] subset of Example 2, RStudio produces the correct output as well:

Example[, .SD[, plot(value, ylim = c(0, 1))], by = group] # Example 3

When using barplot() rather than plot(), RStudio uses the correct values as well:

Example[, barplot(value, ylim = c(0, 1)), by = group] # Example 4

Did I overlook something or is this a bug?

System: Windows 7, RStudio Desktop v0.98.1091, R 3.1.2, data.table 1.9.4

Recommended answer

Nice catch (+1'd already)! In my case, Example 3 doesn't produce the right plot either (OS X 10.10.1, R 3.1.2, RStudio 0.98.1091).

The only difference between the R console/GUI and RStudio here is the graphics device: RStudio appears to use its own native graphics device, RStudioGD, whereas the R console/GUI (on OS X) uses Quartz.

By debugging graphics:::plot.default, I was able to narrow the issue down to the function plot.xy(). This function draws on whichever graphics device is active (RStudioGD or Quartz, as noted above).

For example, opening a Quartz device first by calling quartz() and then running your code works fine!
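quartz() only exists on macOS. A more portable sketch, assuming a more recent R release in which grDevices::dev.new() accepts the noRStudioGD argument (it may not exist in older releases such as the R 3.1.2 used in the question; check ?dev.new), opens R's own default device instead of RStudioGD before running Example 1:

```r
library(data.table)

# Ask R for its platform default device, skipping RStudioGD.
# Assumption: this R version's dev.new() supports noRStudioGD.
dev.new(noRStudioGD = TRUE)
cat("Active device:", names(dev.cur()), "\n")

set.seed(23)
Example <- data.table(group = rep(c("a", "b"), each = 5), value = runif(10))
layout(1:2)
par(mai = rep(.5, 4))
Example[, plot(value, ylim = c(0, 1)), by = group]  # Example 1 from the question
```

Outside RStudio the default device is used anyway, so the same script should behave identically in the plain console.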

FWIW, this issue can be reproduced with dplyr as well:

library(dplyr)
df = as.data.frame(Example)
# plot one group's values, then return a length-1 result for summarise()
my_fun = function(x) {plot(x, ylim=c(0,1)); 1L }
df %>% group_by(group) %>% summarise(my_fun(value))

produces the same wrong plot.

This is most likely due to the way subgroups are handled in data.table (and I think dplyr handles them the same way), which you can see with:

Example[, print(sapply(.SD, address)), by=group]
#         value 
# "0x105bbf5b8" 
#         value 
# "0x105bbf5b8" 
# Empty data.table (0 rows) of 1 col: group

data.table allocates .SD once, sized for the largest group, and internally reuses that memory for every subgroup to avoid repeated memory allocation and deallocation (for efficiency). I'm not sure (shooting in the dark here), but it seems that RStudioGD doesn't let go of the pointer to the subgroup's data, so as the data in the subgroup is updated, the plot is updated too. You can verify this by doing:

# on RStudioGD
library(data.table)
# debug() pauses inside plot.default; press Enter to step through it
debug(graphics:::plot.default)
set.seed(23)
Example <- data.table('group' = rep(c('a', 'b'), each = 5), 'value' = runif(10))
layout(1:2)
par('mai' = rep(.5, 4))
Example[, plot(value, ylim = c(0, 1)), by = group] # Example 1
undebug(graphics:::plot.default)

Keep pressing Enter and you'll see that the first plot is drawn correctly, but when the second plot is added, the first plot changes as well. This may be a consequence of recent changes in R 3.1+, which shallow-copies function arguments rather than deep-copying them (again, shooting in the dark here).
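The buffer reuse described above can also be checked without any graphics device. The sketch below prints, via data.table's address(), where value lives inside j for each group, and stores an explicit copy() of each group's vector using <<-. The table DT and the list kept are illustrative names only; whether the two printed addresses coincide may depend on the data.table version, but the copied vectors hold their own group's values because copy() allocates new memory.

```r
library(data.table)

# Small deterministic table with the same columns as in the question.
DT <- data.table(group = rep(c("a", "b"), each = 5), value = as.numeric(1:10))

# Memory address of the 'value' vector that j sees for each group.
addrs <- DT[, .(addr = address(value)), by = group]
print(addrs)

# Store a copy of each group's vector, keyed by the group value (.BY$group).
# copy() allocates fresh memory, so the stored vector is not overwritten
# when data.table fills the shared .SD buffer for the next group.
kept <- list()
invisible(DT[, {
  kept[[.BY$group]] <<- copy(value)
  NULL
}, by = group])
str(kept)
```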

You can temporarily fix this by explicitly copying value:

Example[, plot(copy(value), ylim = c(0, 1)), by = group] # Example 1

This produces the correct plot.
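Another way to sidestep the shared .SD buffer, offered here as an alternative rather than as the answer's own fix, is to do the grouping outside data.table: base R's split() returns a separate vector for each group, and each one is plotted in a plain loop. The names parts and g are illustrative.

```r
library(data.table)

set.seed(23)
Example <- data.table(group = rep(c("a", "b"), each = 5), value = runif(10))

# split() returns a list with one independent numeric vector per group ("a", "b").
parts <- split(Example$value, Example$group)

layout(1:2)
par(mai = rep(.5, 4))
for (g in names(parts)) {
  # Each plot() call receives its own vector; no memory is shared between groups.
  plot(parts[[g]], ylim = c(0, 1), main = g)
}
```

This gives up data.table's grouping speed, which rarely matters when each group is drawn as a separate plot.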
