Dash in column name yields "object not found" error
Question
I have a function that generates scatter plots from data, with an argument that selects which column is used to color the points. Here is a simplified version:
library(ggplot2)
plot_gene <- function(df, gene) {
  ggplot(df, aes(x, y)) +
    geom_point(aes_string(col = gene)) +
    scale_color_gradient()
}
where df is a data.frame with columns x, y, and then a bunch of gene names. This works fine for most gene names; however, some contain dashes, and these fail:
print(plot_gene(df, "Gapdh")) # great!
print(plot_gene(df, "H2-Aa")) # Error: object "H2" not found
It appears the gene variable is getting parsed ("H2-Aa" becomes H2 - Aa). How can I get around this? Is there a way to indicate that a string should not go through eval in aes_string?
If you need some input to play with, this fails the same way my data does:
df <- data.frame(c(1,2), c(2,1), c(1,2), c(2,1))
colnames(df) <- c("x", "y", "Gapdh", "H2-Aa")
For my real data, I am using read.table(..., header=TRUE) and get column names with dashes because the raw data files have them.
Recommended answer
Normally R tries very hard to make sure the column names in your data.frame are valid variable names. Non-standard column names (those that are not valid variable names) cause problems with functions that use non-standard evaluation syntax. When you are forced to use such names, you usually have to wrap them in backticks. In the normal case
ggplot(df, aes(x, y)) +
geom_point(aes(col = H2-Aa)) +
scale_color_gradient()
# Error in FUN(X[[i]], ...) : object 'H2' not found
would return an error, but
ggplot(df, aes(x, y)) +
geom_point(aes(col = `H2-Aa`)) +
scale_color_gradient()
would work.
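To make the difference concrete, here is a small base R sketch (not part of the original answer) showing how the bare string is parsed as a subtraction, while the backtick-quoted form is parsed as a single symbol. The variable names parsed and quoted are only illustrative.

```r
gene <- "H2-Aa"

# Parsing the bare string yields a call to `-` with two operands.
parsed <- parse(text = gene)[[1]]
class(parsed)     # "call"
all.vars(parsed)  # "H2" "Aa"

# Wrapping the string in backticks yields a single symbol instead.
quoted <- parse(text = paste0("`", gene, "`"))[[1]]
class(quoted)     # "name"

# For comparison, this is how R would mangle the name into a valid one.
make.names(gene)  # "H2.Aa"
```

This is why the error message mentions H2: the expression H2 - Aa is evaluated, and H2 is the first name that cannot be found.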
You can paste in backticks if you really want:
geom_point(aes_string(col = paste0("`", gene, "`")))
or you could treat it as a symbol from the get-go and use aes_q instead:
geom_point(aes_q(col = as.name(gene)))
The latest release of ggplot2 supports unquoting via !! inside aes(), rather than using aes_string or aes_q, so you could do:
geom_point(aes(col = !!rlang::sym(gene)))
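Putting this together, a minimal sketch of the full function might look like the following. It assumes a ggplot2 version with tidy evaluation support (3.0.0 or later) and the rlang package. The second function, plot_gene_data, is a hypothetical name for a variant that uses the .data pronoun; it is an alternative I am adding for illustration, not something the answer above proposes.

```r
library(ggplot2)

# Sample data; check.names = FALSE keeps the dash in "H2-Aa".
df <- data.frame(x = c(1, 2), y = c(2, 1),
                 Gapdh = c(1, 2), "H2-Aa" = c(2, 1),
                 check.names = FALSE)

# Convert the string to a symbol and unquote it inside aes().
plot_gene <- function(df, gene) {
  ggplot(df, aes(x, y)) +
    geom_point(aes(col = !!rlang::sym(gene))) +
    scale_color_gradient()
}

# Alternative (assumption, not from the answer): look the column up
# by name through the .data pronoun.
plot_gene_data <- function(df, gene) {
  ggplot(df, aes(x, y)) +
    geom_point(aes(col = .data[[gene]])) +
    scale_color_gradient()
}

p1 <- plot_gene(df, "H2-Aa")       # print(p1) to draw the plot
p2 <- plot_gene_data(df, "H2-Aa")  # print(p2) to draw the plot
```

Neither version parses the string as R code, so column names containing dashes, spaces, or other non-syntactic characters should be handled the same way as ordinary names.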