ggplot2 Color Scale Over-Affected by Outliers

Problem description



I'm having difficulty with a few outliers making the color scale useless.

My data has a Length variable whose values usually fall within a fixed range, but it will usually also contain a few much larger values. The example data below has 95 values between 500 and 1500 and 5 values of 50,000 or more. The resulting color legend uses breaks at 10k, 20k, ..., 70k, when what I want is to see color changes between 500 and 1500. Really, anything above roughly 1300 should be the same solid color (probably defined as median +/- MAD), but I don't know where to define that.
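The median +/- MAD cut-off mentioned above can be computed directly. A minimal base-R sketch, assuming the band is exactly `median(Length) +/- mad(Length)` (R's default MAD, scaled by 1.4826) and adding `set.seed()` only to make the example reproducible. Clamping every value into the band is what makes all the outliers share one colour:

```r
# Assumption: seed added only so the sketch is reproducible
set.seed(1)
Length <- c(sample(x = 500:1500, size = 95, replace = TRUE),
            60000, 55000, 70000, 50000, 65000)

centre <- median(Length)
spread <- mad(Length)                     # 1.4826 * median(|Length - centre|)
band   <- c(centre - spread, centre + spread)

# Clamp into the band: every outlier becomes band[2], so it would get one solid colour
clamped <- pmin(pmax(Length, band[1]), band[2])

print(band)
print(range(clamped))
```

Inside ggplot2 the same clamping can be left to the scale itself by passing `limits = band` and `oob = scales::squish` to a continuous colour scale.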

I'm open to any ggplot solution, but ideally low values would be red, middle values white, and high values blue (low is bad). In my own dataset, date is an actual date, converted with as.POSIXct() inside the ggplot aes(), but that doesn't seem to affect the example.
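For reference, a sketch of a single diverging scale that follows these wishes (red for low, white at the median, blue for high) and clamps everything outside median +/- MAD to the end colours. This is an alternative, not the approach used in the answer below; it assumes a reasonably recent ggplot2 together with the scales package it depends on, and `set.seed()` is added only for reproducibility:

```r
library(ggplot2)

# Example data from the question; set.seed() is an added assumption
set.seed(1)
x <- data.frame(
  date      = sample(x = 1:10, size = 100, replace = TRUE),
  stateabbr = sample(x = 1:50, size = 100, replace = TRUE),
  Length    = c(sample(x = 500:1500, size = 95, replace = TRUE),
                60000, 55000, 70000, 50000, 65000)
)

# Colour limits: median +/- MAD, as suggested in the question
centre <- median(x$Length)
band   <- centre + c(-1, 1) * mad(x$Length)

p <- ggplot(x, aes(x = date, y = factor(stateabbr), colour = Length)) +
  geom_point(alpha = 3/4, size = 4) +
  scale_colour_gradient2(
    "Length",
    low = "red", mid = "white", high = "blue",  # low values are bad, so they are red
    midpoint = centre,
    limits = band,
    oob = scales::squish  # clamp out-of-range values to the end colours instead of NA
  ) +
  labs(title = "Date and State", x = "Date", y = "State")

p
```

Without `oob = scales::squish`, values outside `limits` would be dropped to `NA` and drawn in the scale's `na.value` colour (grey by default) instead of solid blue.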

library(ggplot2)

#example data
date <- sample(x=1:10,size=100,replace=TRUE)
stateabbr <- sample(x=1:50,size=100,replace=TRUE)
Length <- c(sample(x=500:1500,size=95,replace=TRUE),60000,55000,70000,50000,65000)
x <- data.frame(date=date,stateabbr=stateabbr,Length=Length)

#main plot (labs() sets the title and axis labels)
(g <- ggplot(data=x,aes(x=date,y=factor(stateabbr))) +
  geom_point(aes(color=as.numeric(as.character(Length))),alpha=3/4,size=4) + 
  #scale_x_datetime(labels=date_format("%m/%d")) + 
  labs(title="Date and State", x="Date", y="State"))

#problem: the five outliers stretch the gradient up to 70000
g + scale_color_gradient2("Length",midpoint=median(x$Length))

Adding trans="log" or "sqrt" doesn't quite do the trick either.
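The reason the transforms fall short can be quantified: the colour bar still has to stretch from about 500 up to 70000, so the 500 to 1500 values get only a small slice of it. A base-R sketch, assuming those nominal bounds rather than the sampled minimum:

```r
# Share of the colour bar used by the typical values (lo to hi) when the
# scale must also reach the largest outlier (top); nominal bounds assumed.
share_of_scale <- function(f, lo = 500, hi = 1500, top = 70000) {
  (f(hi) - f(lo)) / (f(top) - f(lo))
}

shares <- c(
  identity = share_of_scale(identity),
  sqrt     = share_of_scale(sqrt),
  log      = share_of_scale(log)
)

round(shares, 3)  # identity 0.014, sqrt 0.068, log 0.222
```

Even on a log scale the typical values occupy only about a fifth of the gradient, which is why clamping or separating the outliers works better than transforming them.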

Thank you for your help!

Solution

Here's one slightly tricky option:

#Create a new variable indicating the unusual values
x$Length1 <- "> 1500"
x$Length1[x$Length <= 1500] <- NA

#main plot
# Using fill - tricky!
g <- ggplot() +
  geom_point(data = subset(x,Length <= 1500),
             aes(x=date,y=factor(stateabbr),color=Length),size=4) + 
  # the default point shape ignores fill, so these points keep the default colour;
  # mapping fill only serves to create a second legend for the outliers
  geom_point(data = subset(x,Length > 1500),
             aes(x=date,y=factor(stateabbr),fill=Length1),size=4)+
  labs(title="Date and State", x="Date", y="State")

#apply the diverging scale; it now only has to cover the values up to 1500
g + scale_color_gradient2("Length",midpoint=median(x$Length))

So the tricky part here is mapping fill on the points to convince ggplot to make a second legend. You can customize this with different labels and colors for the fill scale.
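One such customization, as a sketch: `shape = 21` is a point shape that actually draws its fill, so the outliers get an explicit colour from `scale_fill_manual()`. The legend title "Outliers" and the colour "darkblue" are arbitrary choices for illustration, and the example data is regenerated with an assumed seed so the block runs on its own:

```r
library(ggplot2)

# Example data from the question; set.seed() is an added assumption
set.seed(1)
x <- data.frame(
  date      = sample(x = 1:10, size = 100, replace = TRUE),
  stateabbr = sample(x = 1:50, size = 100, replace = TRUE),
  Length    = c(sample(x = 500:1500, size = 95, replace = TRUE),
                60000, 55000, 70000, 50000, 65000)
)
x$Length1 <- ifelse(x$Length > 1500, "> 1500", NA)

p <- ggplot() +
  geom_point(data = subset(x, Length <= 1500),
             aes(x = date, y = factor(stateabbr), colour = Length), size = 4) +
  # shape 21 is filled, so the outliers are drawn in the fill colour chosen below
  geom_point(data = subset(x, Length > 1500),
             aes(x = date, y = factor(stateabbr), fill = Length1),
             shape = 21, size = 4) +
  scale_colour_gradient2("Length", midpoint = median(x$Length)) +
  scale_fill_manual("Outliers", values = c("> 1500" = "darkblue")) +
  labs(title = "Date and State", x = "Date", y = "State")

p
```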

One more thing, after reading Brandon's answer: you could in principle combine both approaches by taking the outlying values, using cut to create a separate categorical variable for them, and then using my trick with the fill scale. That way you could indicate multiple groups of outlying points.
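The combined approach might look like the sketch below. The break points (55000, 65000) and labels passed to `cut()` are hypothetical, chosen only to split the five example outliers into three groups; the fill colours are arbitrary and the seed is an added assumption:

```r
library(ggplot2)

set.seed(1)
x <- data.frame(
  date      = sample(x = 1:10, size = 100, replace = TRUE),
  stateabbr = sample(x = 1:50, size = 100, replace = TRUE),
  Length    = c(sample(x = 500:1500, size = 95, replace = TRUE),
                60000, 55000, 70000, 50000, 65000)
)

normal <- subset(x, Length <= 1500)
out    <- subset(x, Length > 1500)

# cut() bins the outlying lengths into right-closed intervals:
# (1500, 55000], (55000, 65000], (65000, Inf]
out$group <- cut(out$Length,
                 breaks = c(1500, 55000, 65000, Inf),
                 labels = c("1.5k-55k", "55k-65k", "> 65k"))

p <- ggplot() +
  geom_point(data = normal,
             aes(x = date, y = factor(stateabbr), colour = Length), size = 4) +
  # shape 21 uses fill, so each outlier group gets its own colour and legend entry
  geom_point(data = out,
             aes(x = date, y = factor(stateabbr), fill = group),
             shape = 21, size = 4) +
  scale_colour_gradient2("Length", midpoint = median(normal$Length)) +
  scale_fill_manual("Outliers", values = c("lightblue", "steelblue", "navy")) +
  labs(title = "Date and State", x = "Date", y = "State")

p
```

With these breaks, 50000 and 55000 fall in the first group, 60000 and 65000 in the second, and 70000 in the third.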
