How to deal with spaces in column names?
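Problem description
I know it is preferable for variable names not to contain spaces. However, I need publication-quality charts, so the axes and legends must have properly formatted labels, i.e. with spaces. For example, during development I might have variables called Pct.On.OAC and Age.Group, but in my final plot I need "% on OAC" and "Age Group" to appear:
'data.frame': 22 obs. of 3 variables:
 $ % on OAC           : Factor w/ 11 levels "0","0.1-9.9",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ Age Group          : Factor w/ 2 levels "Aged 80 and over",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Number of Practices: int 47 5 33 98 287 543 516 222 67 14 ...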
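For anyone who wants to reproduce a data frame shaped like this one, here is a minimal sketch. The values are invented for illustration and are not the asker's data (in particular, the second `Age Group` level is a made-up placeholder). It assumes only base R: `check.names = FALSE` stops `data.frame()` from rewriting the names, and backticks or strings are then used to refer to the columns.

```r
# Hypothetical stand-in for dt.m; all values are invented for illustration.
dt.m <- data.frame(
  `% on OAC` = factor(c("0", "0.1-9.9", "0", "0.1-9.9")),
  `Age Group` = factor(c("Aged 80 and over", "Aged 80 and over",
                         "Aged under 80", "Aged under 80")),
  `Number of Practices` = c(47L, 5L, 33L, 98L),
  check.names = FALSE  # keep the names exactly as written
)

names(dt.m)

# Non-syntactic names must be quoted with backticks or passed as strings.
dt.m$`Age Group`
dt.m[["Number of Practices"]]
```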
But when I try to plot these:
ggplot(dt.m, aes(x = `% on OAC`, y = `Number of Practices`, fill = `Age Group`)) +
  geom_bar(stat = "identity")
That works without a problem. But when I add a facet:
ggplot(dt.m, aes(x = `% on OAC`, y = `Number of Practices`, fill = `Age Group`)) +
  geom_bar(stat = "identity") +
  facet_grid(`Age Group` ~ .)
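I get:
Error in `[.data.frame`(base, names(rows)) : undefined columns selected
If I change `Age Group` to Age.Group then it works fine, but as I said, I don't want the dot to appear in the legend title.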
So my questions are:
- Is there a workaround for the problem with the facet?
- Is there a better general approach to dealing with spaces (and other characters) in variable names when I want the final plot to include them? I suppose I can manually override them, but that seems like a lot of faffing around.
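Answer
This is a "bug" in the ggplot2 package. It arises because the function as.data.frame(), called in the internal ggplot2 function quoted_df, converts the names to syntactically valid names. These syntactically valid names cannot be found in the original data frame, hence the error.
As a reminder:
Syntactically valid names consist of letters, numbers and the dot or underline characters, and start with a letter or the dot (but the dot cannot be followed by a number).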
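The name conversion described above can be seen in base R alone. The sketch below does not reproduce the ggplot2 internals (quoted_df is not called). It only shows, under that simplification, how `make.names()` and a checked `data.frame()` rewrite names with spaces, and why looking up the rewritten name in a data frame that kept the original name fails.

```r
# make.names() applies the "syntactically valid name" rule to arbitrary strings.
original <- c("% on OAC", "Age Group", "Number of Practices")
mangled <- make.names(original)
mangled

# data.frame() applies the same conversion unless check.names = FALSE.
df_checked <- data.frame(`Age Group` = 1:2)
df_verbatim <- data.frame(`Age Group` = 1:2, check.names = FALSE)
names(df_checked)
names(df_verbatim)

# Looking up the converted name in the data frame that kept the original
# name raises an error of the same kind as the one in the question.
result <- tryCatch(
  df_verbatim[, "Age.Group"],
  error = function(e) conditionMessage(e)
)
result
```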
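There's a reason for that. There's also a reason why ggplot allows you to set labels using labs(), e.g. using the following dummy dataset with valid names:
X <- data.frame(
  PonOAC = rep(c('a', 'b', 'c', 'd'), 2),
  AgeGroup = rep(c("over 80", 'under 80'), each = 4),
  NumberofPractices = rpois(8, 70)
)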
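You can use labs() at the end to make this code work:
ggplot(X, aes(x = PonOAC, y = NumberofPractices, fill = AgeGroup)) +
  geom_bar(stat = "identity") +
  facet_grid(AgeGroup ~ .) +
  labs(x = "% on OAC", y = "Number of Practices", fill = "Age Group")
This produces a faceted bar chart with the properly formatted axis and legend labels.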
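For the second question (a more general approach), one possible sketch is to keep the display labels in a single lookup vector and build the labs() call from it, so labels are not retyped for every plot. This is not part of the original answer: `pretty_labels` and `labels_for()` are hypothetical names introduced here for illustration, and the plotting step only runs if ggplot2 is installed.

```r
# Hypothetical lookup: syntactic column name -> label for the final plot.
pretty_labels <- c(
  PonOAC = "% on OAC",
  AgeGroup = "Age Group",
  NumberofPractices = "Number of Practices"
)

# Hypothetical helper: given the column mapped to each aesthetic, return a
# named list of display labels that can be passed to labs().
labels_for <- function(mapping, lookup = pretty_labels) {
  as.list(setNames(unname(lookup[mapping]), names(mapping)))
}

plot_labels <- labels_for(
  c(x = "PonOAC", y = "NumberofPractices", fill = "AgeGroup")
)
str(plot_labels)

# The plotting part only runs when ggplot2 is available.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  X <- data.frame(
    PonOAC = rep(c("a", "b", "c", "d"), 2),
    AgeGroup = rep(c("over 80", "under 80"), each = 4),
    NumberofPractices = rpois(8, 70)
  )
  p <- ggplot(X, aes(x = PonOAC, y = NumberofPractices, fill = AgeGroup)) +
    geom_bar(stat = "identity") +
    facet_grid(AgeGroup ~ .) +
    do.call(labs, plot_labels)  # same effect as writing the labs() call by hand
}
```

The column names stay syntactically valid throughout, so faceting keeps working, and a label only needs to be changed in one place.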