Why is ggplot graphing null percentage data points?


Problem description

I've created a test data set to reproduce this problem:

Date    Percent
2012-01 3.00%
2012-02 43.00%
2012-03 54.00%
2012-04 43.00%
2012-05 43.00%
2012-06 23.00%
2012-07 12.00%
2012-08 
2012-09 
2012-10 
2012-11 
2012-12 

These percentages were created by entering decimal values in a CSV file and changing the format of the Percent column to Percentage in Microsoft Excel.

When I try to graph this dataset with ggplot:

data <- read.csv("GCdataViz/test2.csv")
p <- ggplot(data, aes(x=Date, y=Percent, group=1)) + 
  geom_point(size = 3) 
p

I get this graph

As you can see, the null values are plotted, and the Y axis is also odd: the 3% data point is plotted above the 23% one. It seems ggplot doesn't handle percentage axes well. Is there a way to set the correct range for the Y axis, assuming I do not know the percentage values in advance (that is, I know nothing about the actual dataset other than that it has a Percent column)?

Solution

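The column Percent is a factor. By default, factor levels are ordered alphabetically, so 3.00% comes after 12.00% and 23.00%. The blank entries are not treated as missing either: they are read as empty strings, which form a level of their own and therefore get plotted. (Since R 4.0.0, read.csv returns character columns rather than factors by default, but ggplot2 still treats such a column as discrete and sorts it the same way.) It will work if you convert the values of Percent to numeric values.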
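To see the ordering problem in isolation, here is a minimal base R sketch. The vector `percent` is a made-up stand-in for the Percent column, not the actual file contents.

```r
# Hypothetical values mimicking the Percent column as read from the CSV
percent <- c("3.00%", "43.00%", "54.00%", "23.00%", "12.00%", "")

# Factor levels are sorted as strings: the empty string is a level too,
# and "3.00%" is placed after "23.00%" because "3" sorts after "2"
levels(factor(percent))

# After stripping "%" and converting, the values sort numerically;
# the empty string becomes NA, which sort() drops by default
sort(as.numeric(sub("%", "", percent, fixed = TRUE)))
```

The first call shows why the axis looks scrambled, the second why a numeric column fixes it.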

The data:

data <- read.table(text = "Date    Percent
2012-01 3.00%
2012-02 43.00%
2012-03 54.00%
2012-04 43.00%
2012-05 43.00%
2012-06 23.00%
2012-07 12.00%
2012-08 
2012-09 
2012-10 
2012-11 
2012-12 ", header = TRUE, fill = TRUE)

Create a new variable, Percent2, with numeric values:

data <- transform(data,
                  Percent2 = replace(as.numeric(gsub("%", "", Percent)),
                                     Percent == "", 0))

#       Date Percent Percent2
# 1  2012-01   3.00%        3
# 2  2012-02  43.00%       43
# 3  2012-03  54.00%       54
# 4  2012-04  43.00%       43
# 5  2012-05  43.00%       43
# 6  2012-06  23.00%       23
# 7  2012-07  12.00%       12
# 8  2012-08                0
# 9  2012-09                0
# 10 2012-10                0
# 11 2012-11                0
# 12 2012-12                0
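If the blank months should not show up as 0% points, one alternative is to leave them as NA instead of replacing them with 0. A minimal sketch, where `parse_percent` is a hypothetical helper name (it is not part of base R or ggplot2):

```r
# Hypothetical helper: turn strings such as "43.00%" into numbers,
# leaving blank entries as NA instead of 0
parse_percent <- function(x) {
  x <- trimws(as.character(x))   # as.character() also handles factor input
  x[x == ""] <- NA               # mark blanks as missing explicitly
  as.numeric(sub("%", "", x, fixed = TRUE))
}

# Made-up example input with one blank entry
parse_percent(c("3.00%", "43.00%", "", "12.00%"))
```

Whether 0 or NA is the right choice depends on what a blank cell means in the real data: "no value recorded" is usually closer to NA than to 0%.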

Plot:

library(ggplot2)
ggplot(data, aes(x = Date, y = Percent2)) + 
  geom_point(size = 3) 
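As for the axis range: once y is numeric, ggplot picks a continuous range from the data, so the values do not need to be known in advance. The sketch below assumes blank months are stored as NA rather than 0, and uses a small made-up data frame in place of the real file. The label function only appends a percent sign to the axis labels.

```r
library(ggplot2)

# Made-up stand-in for the question's data; the blank month is stored as NA
data <- data.frame(
  Date = c("2012-01", "2012-02", "2012-03", "2012-08"),
  Percent2 = c(3, 43, 54, NA)
)

p <- ggplot(data, aes(x = Date, y = Percent2)) +
  # na.rm = TRUE drops the missing point without the usual warning
  geom_point(size = 3, na.rm = TRUE) +
  # label the axis as "43%" rather than 43
  scale_y_continuous(labels = function(v) paste0(v, "%"))
p
```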
