dplyr masks GGally and breaks ggparcoord


Problem description


In a fresh session, executing a small ggparcoord(.) example taken from the function's documentation

library(GGally)

data(diamonds, package="ggplot2")
diamonds.samp <- diamonds[sample(1:dim(diamonds)[1], 100), ]
ggparcoord(data = diamonds.samp, columns = c(1, 5:10))

results in the expected parallel coordinate plot.

Starting again in a fresh session and executing the same script with dplyr loaded

library(GGally)
library(dplyr)

data(diamonds, package="ggplot2")
diamonds.samp <- diamonds[sample(1:dim(diamonds)[1], 100), ]
ggparcoord(data = diamonds.samp, columns = c(1, 5:10))

results in:

Error: (list) object cannot be coerced to type 'double'

Note that the order of the library(.) statements does not matter.

Questions

  1. Is there something wrong with the code samples?
  2. Is there a way to overcome the problem (via some namespace functions)?
  3. Or is this a bug?

I need both dplyr and ggparcoord(.) in a bigger analysis, but this minimal example reflects the problem I am facing.

Versions

  • R @ 3.2.3
  • dplyr @ 0.4.3
  • GGally @ 1.0.1
  • ggplot2 @ 2.0.0

UPDATE

To sum up the excellent answer given by Joran:

Answers

  1. The code samples are in fact wrong, as ggparcoord(.) expects a data.frame, not a tbl_df as given by the diamonds data set (if dplyr is loaded).
  2. The problem is solved by coercing the tbl_df to a data.frame.
  3. No, it is not a bug.

Working code sample:

library(GGally)
library(dplyr)

data(diamonds, package="ggplot2")
diamonds.samp <- diamonds[sample(1:dim(diamonds)[1], 100), ]
ggparcoord(data = as.data.frame(diamonds.samp), columns = c(1, 5:10))

Solution

Converting my comments to an answer...

The GGally package here is making the reasonable assumption that using [ on a data frame should behave the way it always does and always has. However, this all being in the Hadley-verse, the diamonds data set is a tbl_df as well as a data.frame.

When dplyr is loaded, the behavior of [ is overridden such that drop = FALSE is always the default for a tbl_df. So there's a place in GGally where data[,"cut"] is expected to return a vector, but instead it returns another data frame.
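
The difference between the two indexing behaviors can be sketched in base R alone. This is only an illustration: it does not load dplyr, and it imitates the tbl_df default described above by passing drop = FALSE explicitly. The small data frame df is made up for the example.

```r
# Base R only; no dplyr or GGally required.
# `df` is a made-up stand-in for the diamonds sample.
df <- data.frame(
  cut   = factor(c("Fair", "Good", "Ideal")),
  price = c(326, 327, 334)
)

# Default for a plain data.frame: selecting one column drops to a vector.
dropped <- df[, "cut"]
is.factor(dropped)      # a factor vector
is.data.frame(dropped)  # no longer a data frame

# With drop = FALSE (the default the answer attributes to tbl_df),
# the result stays a one-column data frame.
kept <- df[, "cut", drop = FALSE]
is.data.frame(kept)
```

Code written against the first behavior, as GGally's appears to be, receives a data frame instead of a vector under the second.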

...specifically, the error is thrown in your example while attempting to execute:

data[, fact.var] <- as.numeric(data[, fact.var])

Since data[,fact.var] remains a data frame, and hence a list, as.numeric won't work.
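
A small base R sketch of why the coercion fails, again using a made-up data frame and an explicit drop = FALSE in place of a real tbl_df. The exact wording of the error message varies between R versions, so the sketch only captures it rather than matching it.

```r
# Base R only; `df` is a made-up example.
df <- data.frame(cut = factor(c("Fair", "Good", "Ideal")))

# A factor vector converts to its integer codes without complaint.
codes <- as.numeric(df[, "cut"])

# A one-column data frame is a list, so the same call signals an error.
# tryCatch() stores the error message instead of stopping the script.
result <- tryCatch(
  as.numeric(df[, "cut", drop = FALSE]),
  error = function(e) conditionMessage(e)
)

is.list(df[, "cut", drop = FALSE])  # data frames are lists of columns
print(result)                       # the captured coercion error message
```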

As for your conclusion that this isn't a bug, I'd say... maybe. Probably. At least there probably isn't anything the GGally package author ought to do to address it. You just have to be aware that using tbl_df's with packages not written by Hadley may break things.

As you noted, removing the extra class attributes fixes the problem, as it returns R to using the normal [ method.
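
To see what "removing the extra class attributes" amounts to, here is a hedged base R sketch. It attaches the class vector by hand to a made-up data frame rather than loading dplyr, so it shows only the class stripping done by as.data.frame(), not the tbl_df indexing method itself.

```r
# Base R only. The class vector is set manually to imitate a tbl_df;
# this is an assumption for illustration, not how dplyr builds one.
x <- data.frame(
  cut   = factor(c("Fair", "Good", "Ideal")),
  price = c(326, 327, 334)
)
class(x) <- c("tbl_df", "tbl", "data.frame")

# as.data.frame() drops the classes that precede "data.frame" ...
plain <- as.data.frame(x)
class(plain)

# ... so `[` dispatches to the ordinary data.frame method again
# and a single column comes back as a vector.
is.factor(plain[, "cut"])
```

This is the same coercion the working code sample applies to diamonds.samp before calling ggparcoord(.).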
