Strange error plotting by group


Problem description

Sorry for the massive data dump, but I can't reproduce this on the subsets of the data I've tried. I copy-pasted the dput of the data (165 obs., not crazy) to this Gist.

I'm trying to plot the data in all_sports by sport, as follows:

  1. Create empty plot with proper limits to accommodate all data
  2. Plot the column gini as a scatterplot, with colors varying by sport
  3. Plot the column five_yr_ma as a line, with color matching that in 2.

This should be simple and I've done things like it before. Here's what should work:

#empty plot with proper axes
all_sports[ , plot(
  NA, ylim = range(gini), xlim = range(season), 
  xlab = "Season", ylab = "Gini",
  main = "Comparison of Gini Coefficient Across Sports")]

#pick colors for each sport
cols <- c(NHL="black", NBA="red")

all_sports[ , {
  #add points to current plot
  points(season, gini, col = cols[.BY$sport])

  #add lines to current plot
  lines(season, five_yr_ma, col = cols[.BY$sport], lwd = 3)},
  by = sport]

But this gives me output/error:

# Empty data.table (0 rows) of 1 col: sport

Error: x and y lengths differ in plot.xy()

This is strange. If we skip the grouping and just do it manually, it works perfectly fine:

all_sports[sport == "NBA", {
  points(season, gini, col = "red")
  lines(season, five_yr_ma, col = "red", lwd = 3)}]

all_sports[sport == "NHL", {
  points(season, gini, col = "black")
  lines(season, five_yr_ma, col = "black", lwd = 3)}]
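
The two manual calls above can be folded into a loop over the sports, which avoids by entirely. This is only a minimal sketch under stated assumptions: the data are synthetic stand-ins for the Gist (the seasons and gini values are made up), n_plotted is a hypothetical helper used only to check group sizes, and the plot goes to an off-screen pdf(NULL) device so it runs outside RStudio too.

```r
library(data.table)

# Synthetic stand-in for the real data (assumption: values are made up;
# only the group sizes, 98 and 67, mirror the output shown in the question)
set.seed(1)
all_sports <- data.table(
  sport  = rep(c("NHL", "NBA"), times = c(98, 67)),
  season = c(1918:2015, 1949:2015),
  gini   = runif(165, 0.1, 0.3)
)
# trailing five-season moving average within each sport (first 4 values are NA)
all_sports[ , five_yr_ma := as.numeric(stats::filter(gini, rep(1/5, 5), sides = 1)),
            by = sport]

cols <- c(NHL = "black", NBA = "red")

pdf(NULL)  # off-screen device: nothing is written to disk
with(all_sports, plot(NA, xlim = range(season), ylim = range(gini),
                      xlab = "Season", ylab = "Gini"))

n_plotted <- integer(0)  # hypothetical bookkeeping, not needed for the plot
for (s in names(cols)) {
  # subset in i instead of grouping with by, as in the manual calls
  all_sports[sport == s, {
    points(season, gini, col = cols[[s]])
    lines(season, five_yr_ma, col = cols[[s]], lwd = 3)
  }]
  n_plotted[s] <- all_sports[sport == s, .N]
}
invisible(dev.off())
```

Since each iteration subsets with i rather than by, every call to points() and lines() sees an ordinary subset, just like the hand-written NBA and NHL versions.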

Moreover, even in the context of grouping, it's unclear why plot.xy has received arguments of different lengths -- if we make the following adjustment to force R to record the inputs just before they're sent, there doesn't appear to be any issue:

all_sports[ , {
  cat("\n\nPlotting for sport: ", .BY$sport)
  points(x1 <- season, y1 <- gini, col = cols[.BY$sport])
  lines(x2 <- season, y2 <- five_yr_ma, col = cols[.BY$sport], lwd = 3)
  cat("\npoints/season: ",length(x1),
      "\npoints/gini: ", length(y1),
      "\nlines/season: ", length(x2),
      "\nlines/five_yr_ma: ", length(y2))},
  by = sport]

Has output:

# Plotting for sport:  NHL
# points/season:  98 
# points/gini:  98 
# lines/season:  98 
# lines/five_yr_ma:  98

# Plotting for sport:  NBA
# points/season:  67 
# points/gini:  67 
# lines/season:  67 
# lines/five_yr_ma:  67

What could be going on??


Since this doesn't appear to reproduce on every machine, here's my sessionInfo():

R version 3.2.4 (2016-03-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.9.7

loaded via a namespace (and not attached):
[1] rsconnect_0.4.1.11 tools_3.2.4  

Solution

Indeed, as @Arun points out, it seems this is a resurfacing of the (as yet unsolved) issue which was causing the error in this question:

Values of the wrong group are used when using plot() within a data.table() in RStudio

As @Arun discovered there, RStudio's native graphics device seems to get tripped up by the changing pointers used for the different subgroups that are created when j is evaluated with by present. That suggests a simple workaround: copy the columns (or all of .SD) for each group, like:

points(copy(season), copy(gini),
       col = cols[.BY$sport])
lines(copy(season), copy(five_yr_ma), 
      col = cols[.BY$sport], lwd = 3)

Or

x <- copy(.SD)
with(x, {points(season, gini, col = cols[.BY$sport]);
         lines(copy(season), copy(five_yr_ma), 
           col = cols[.BY$sport], lwd = 3)})

Both of which worked for me (since the subgroups are so small, there's no computational efficiency concern at play here -- we can copy away without affecting performance noticeably).
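
For reference, here is roughly how the second variant sits inside the full grouped call. It is a sketch under assumptions rather than a reproduction of the bug: the data are synthetic (made-up seasons and gini values with the same group sizes), the returned n column is a hypothetical addition for checking, and the off-screen pdf(NULL) device means the RStudio-specific problem would not show up here anyway.

```r
library(data.table)

# Synthetic stand-in for the Gist data (assumption: values are made up)
set.seed(42)
all_sports <- data.table(
  sport  = rep(c("NHL", "NBA"), times = c(98, 67)),
  season = c(1918:2015, 1949:2015),
  gini   = runif(165, 0.1, 0.3)
)
all_sports[ , five_yr_ma := as.numeric(stats::filter(gini, rep(1/5, 5), sides = 1)),
            by = sport]

cols <- c(NHL = "black", NBA = "red")

pdf(NULL)  # off-screen device so the sketch runs anywhere
with(all_sports, plot(NA, xlim = range(season), ylim = range(gini),
                      xlab = "Season", ylab = "Gini"))

group_sizes <- all_sports[ , {
  x <- copy(.SD)  # fresh copy of this group's rows, detached from data.table's buffers
  with(x, {
    points(season, gini, col = cols[.BY$sport])
    lines(season, five_yr_ma, col = cols[.BY$sport], lwd = 3)
  })
  .(n = nrow(x))  # hypothetical return value, only to confirm the group sizes
}, by = sport]
invisible(dev.off())
```

Once .SD has been copied, the extra copy() calls around individual columns should not be needed, since x is already a separate object.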

This is issue #1524 on the data.table GitHub page, and I've filed this bug report with RStudio Support; I'll update this answer if a fix is pushed.
