Receiving an unexpected error when plotting by group

Question

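Sorry for the large data dump, but I can't reproduce this problem on the subsets of the data I've tried. Copy-paste the dput of the data (165 observations, nothing crazy) to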

I'm trying to plot the data in DT by sport, according to:

  1. Create empty plot with proper limits to accommodate all data
  2. Plot the column gini as a scatterplot, with colors varying by sport
  3. Plot the column five_yr_ma as a line, with color matching that in 2.

This should be simple and I've done things like it before. Here's what should work:

#empty plot with proper axes
DT[ , plot(
  NA, ylim = range(gini), xlim = range(season), 
  xlab = "Season", ylab = "Gini",
  main = "Comparison of Gini Coefficient Across Sports"
)]

#pick colors for each sport
cols <- c(NHL="black", NBA="red")

DT[ , by = sport, {
  #add points to current plot
  points(season, gini, col = cols[.BY$sport])

  #add lines to current plot
  lines(season, five_yr_ma, col = cols[.BY$sport], lwd = 3)
}]

But this gives me output/error:

# Empty data.table (0 rows) of 1 col: sport

Error: x and y lengths differ in plot.xy()

This is strange. If we skip the grouping and just do it manually, it works perfectly fine:

DT[sport == "NBA", {
  points(season, gini, col = "red")
  lines(season, five_yr_ma, col = "red", lwd = 3)
}]

DT[sport == "NHL", {
  points(season, gini, col = "black")
  lines(season, five_yr_ma, col = "black", lwd = 3)
}]
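
The two manual calls above generalize to a loop over the sports, which avoids by entirely. A minimal sketch, assuming a small made-up table in place of the real data (the values are synthetic, and the running mean is only a stand-in for the real five-year moving average) and drawing to a temporary PDF file so that it runs without an interactive device:

```r
library(data.table)

# Synthetic stand-in for the real table; values are illustrative only
set.seed(1)
DT <- data.table(
  sport  = rep(c("NHL", "NBA"), times = c(6, 4)),
  season = c(2001:2006, 2003:2006),
  gini   = runif(10, 0.1, 0.3)
)
# Stand-in for the moving-average column (a running mean within each sport)
DT[ , five_yr_ma := cumsum(gini) / seq_len(.N), by = sport]

cols <- c(NHL = "black", NBA = "red")

# Draw to a file-based device (assumption: no interactive device is needed)
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)

# empty plot with proper axes
DT[ , plot(
  NA, ylim = range(gini), xlim = range(season),
  xlab = "Season", ylab = "Gini"
)]

# one explicit subset per sport instead of by = sport
for (s in names(cols)) {
  DT[sport == s, {
    points(season, gini, col = cols[s])
    lines(season, five_yr_ma, col = cols[s], lwd = 3)
  }]
}

invisible(dev.off())
```

This trades the by idiom for one subset per group, which is fine for two sports but would not scale as gracefully to many groups.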

Moreover, even in the context of grouping, it's unclear why plot.xy has received arguments of different length -- if we make the following adjustment to force R to record the inputs just before they're sent, there doesn't appear to be any issue:

DT[ , {
  cat("\n\nPlotting for sport: ", .BY$sport)
  points(x1 <- season, y1 <- gini, col = cols[.BY$sport])
  lines(x2 <- season, y2 <- five_yr_ma, col = cols[.BY$sport], lwd = 3)
  cat("\npoints/season: ",length(x1),
      "\npoints/gini: ", length(y1),
      "\nlines/season: ", length(x2),
      "\nlines/five_yr_ma: ", length(y2))},
  by = sport]

Has output:

# Plotting for sport:  NHL
# points/season:  98 
# points/gini:  98 
# lines/season:  98 
# lines/five_yr_ma:  98

# Plotting for sport:  NBA
# points/season:  67 
# points/gini:  67 
# lines/season:  67 
# lines/five_yr_ma:  67

What could be going on?


Since this does not appear to reproduce on every machine, here's my sessionInfo():

R version 3.2.4 (2016-03-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.9.7

loaded via a namespace (and not attached):
[1] rsconnect_0.4.1.11 tools_3.2.4  

Solution

Indeed, as @Arun points out, it seems this is a resurfacing of the (as yet unsolved) issue which was causing the error in this question:

Values of the wrong group are used when using plot() within a data.table() in RStudio

As @Arun discovered there, RStudio's native graphics device seems to get tripped up by the changing pointers used for the different subgroups that are created when j is evaluated with by present. That suggests the workaround of simply copying the columns of .SD each time, like:

points(copy(season), copy(gini),
       col = cols[.BY$sport])
lines(copy(season), copy(five_yr_ma), 
      col = cols[.BY$sport], lwd = 3)

Or

x <- copy(.SD)
with(x, {
  points(season, gini, col = cols[.BY$sport])
  lines(copy(season), copy(five_yr_ma),
        col = cols[.BY$sport], lwd = 3)
})

Both worked for me (the subgroups are so small that computational efficiency is not a concern here; we can copy away without noticeably affecting performance).
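
For context, here is the first workaround inside a complete grouped call. This is a minimal sketch, assuming a small synthetic table rather than the real data, a running mean as a stand-in for the real moving average, and a temporary PDF file as the output device. The extra .(n_plotted = .N) at the end is a hypothetical addition that returns the number of rows drawn per sport, so the group sizes can be checked afterwards:

```r
library(data.table)

# Synthetic stand-in for the real table; values are illustrative only
set.seed(1)
DT <- data.table(
  sport  = rep(c("NHL", "NBA"), times = c(6, 4)),
  season = c(2001:2006, 2003:2006),
  gini   = runif(10, 0.1, 0.3)
)
# Stand-in for the moving-average column (a running mean within each sport)
DT[ , five_yr_ma := cumsum(gini) / seq_len(.N), by = sport]

cols <- c(NHL = "black", NBA = "red")

# Draw to a file-based device (assumption: no interactive device is needed)
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)

DT[ , plot(NA, ylim = range(gini), xlim = range(season),
           xlab = "Season", ylab = "Gini")]

# copy() hands each plotting call its own vectors, detached from the
# per-group memory that is reused while iterating over by
n_by_sport <- DT[ , by = sport, {
  points(copy(season), copy(gini), col = cols[.BY$sport])
  lines(copy(season), copy(five_yr_ma), col = cols[.BY$sport], lwd = 3)
  .(n_plotted = .N)
}]

invisible(dev.off())
n_by_sport
```

Because the sketch draws to a PDF device, it only shows the shape of the workaround; it does not by itself reproduce the RStudio-specific error.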

This is #1524 at the data.table GitHub page and I've filed this bug report at RStudio Support; will update this if a fix is pushed.
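
Since the problem seems tied to RStudio's own device, it can help to check which graphics device a session is drawing to before deciding whether the copy() workaround is needed. A small sketch, assuming that RStudio's device reports itself under the name "RStudioGD" in dev.cur(); the helper name on_rstudio_device is made up for illustration:

```r
# TRUE when the current graphics device is RStudio's built-in one
# (assumption: that device is named "RStudioGD")
on_rstudio_device <- function() {
  identical(names(dev.cur()), "RStudioGD")
}

if (on_rstudio_device()) {
  message("Drawing to RStudio's device; consider copy() inside grouped plotting calls.")
}
```

With no device open, dev.cur() reports the null device, so the helper returns FALSE in that case as well.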
