Why is ggplot graphing null percentage data points?
I've created a test data set to reproduce this problem:
Date Percent
2012-01 3.00%
2012-02 43.00%
2012-03 54.00%
2012-04 43.00%
2012-05 43.00%
2012-06 23.00%
2012-07 12.00%
2012-08
2012-09
2012-10
2012-11
2012-12
These percentages were created by entering decimal values in a CSV file and then changing the format of the Percent column to Percentage in Microsoft Excel.
When I try to graph this dataset with ggplot:
library(ggplot2)
data <- read.csv("GCdataViz/test2.csv")
p <- ggplot(data, aes(x = Date, y = Percent, group = 1)) +
  geom_point(size = 3)
p
I get this graph
As you can see, the null values are plotted, and the Y axis is also odd: the 3% data point is plotted above the 23% one. It seems ggplot doesn't handle a percentage axis well. Is there a way to set the correct range for the Y axis, assuming I do not know the percentage values in advance (that is, assuming I know nothing about the actual dataset other than that it has a Percent column)?
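The column Percent is a factor, not a number. By default, factor levels are ordered alphabetically. Hence, 3.00% comes after 12.00% and 23.00%. The empty strings form a level of their own, which is why the blank rows are plotted as well. It will work if you convert the values of Percent to numeric values.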
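The ordering described above can be checked in base R without any plotting. This is a minimal sketch: the vector `percent` simply retypes the Percent column from the question, and the names `percent` and `percent_levels` are made up for illustration.

```r
# The Percent values from the question, including the five blank entries
percent <- c("3.00%", "43.00%", "54.00%", "43.00%", "43.00%",
             "23.00%", "12.00%", "", "", "", "", "")

# factor() sorts the unique strings as text, not as numbers
percent_levels <- levels(factor(percent))
print(percent_levels)

# Position of each label on a discrete axis (1 is the lowest position)
print(match(c("", "12.00%", "23.00%", "3.00%"), percent_levels))
```

The blank string shows up as a level of its own, and "3.00%" sorts after "23.00%" because the comparison is made character by character. A character column behaves the same way on a ggplot axis, since it is also treated as discrete.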
The data:
data <- read.table(text = "Date Percent
2012-01 3.00%
2012-02 43.00%
2012-03 54.00%
2012-04 43.00%
2012-05 43.00%
2012-06 23.00%
2012-07 12.00%
2012-08
2012-09
2012-10
2012-11
2012-12 ", header = TRUE, fill = TRUE)
Create a new variable, Percent2, with numeric values:
# Strip the % sign, convert to numeric, and set the blank entries to 0
data <- transform(data,
                  Percent2 = replace(as.numeric(gsub("%", "", Percent)),
                                     Percent == "", 0))
# Date Percent Percent2
# 1 2012-01 3.00% 3
# 2 2012-02 43.00% 43
# 3 2012-03 54.00% 54
# 4 2012-04 43.00% 43
# 5 2012-05 43.00% 43
# 6 2012-06 23.00% 23
# 7 2012-07 12.00% 12
# 8 2012-08 0
# 9 2012-09 0
# 10 2012-10 0
# 11 2012-11 0
# 12 2012-12 0
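If the blank months should stay missing rather than become 0, one variation is to skip the `replace()` step and keep them as `NA`. The sketch below is an assumption about what the asker may want, not part of the original answer. It rebuilds the data frame with `data.frame()` so that it runs on its own, and `Percent3` is a hypothetical column name.

```r
# Rebuild the example data so the block is self-contained
data <- data.frame(
  Date = sprintf("2012-%02d", 1:12),
  Percent = c("3.00%", "43.00%", "54.00%", "43.00%", "43.00%",
              "23.00%", "12.00%", rep("", 5)),
  stringsAsFactors = FALSE
)

# Remove the percent sign; as.numeric("") yields NA, so blanks stay missing
data$Percent3 <- as.numeric(sub("%", "", data$Percent, fixed = TRUE))

print(data)
```

With `NA` in place of 0, the empty months no longer look like genuine zero readings, and they can be filtered out before plotting.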
Plot:
library(ggplot2)
ggplot(data, aes(x = Date, y = Percent2)) +
  geom_point(size = 3)
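Once the column is numeric, ggplot picks the Y range from the data, so the values do not need to be known in advance. As a hedged variation on the plot above, the sketch below drops the blank months instead of drawing them at 0 and puts a percent sign back on the axis labels. It assumes ggplot2 is installed. The names `plot_data` and `Percent3` are hypothetical, and the labelling function is plain `paste0()` rather than anything from the original answer.

```r
library(ggplot2)

# Example data rebuilt here so the block runs on its own
plot_data <- data.frame(
  Date = sprintf("2012-%02d", 1:12),
  Percent = c("3.00%", "43.00%", "54.00%", "43.00%", "43.00%",
              "23.00%", "12.00%", rep("", 5)),
  stringsAsFactors = FALSE
)

# Numeric version of Percent; blank entries become NA
plot_data$Percent3 <- as.numeric(sub("%", "", plot_data$Percent, fixed = TRUE))

# Keep only the months that actually have a value
plot_data <- plot_data[!is.na(plot_data$Percent3), ]

# Continuous Y axis with its range taken from the data, labelled as percentages
p <- ggplot(plot_data, aes(x = Date, y = Percent3)) +
  geom_point(size = 3) +
  scale_y_continuous(labels = function(x) paste0(x, "%"))
print(p)
```

Filtering the rows removes the empty months from the X axis as well. If they should remain on the axis as gaps, the `NA` rows can be kept, in which case ggplot omits those points and warns about removed rows.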