Conditional use of jitter in ggplot2 with geom_point


Problem description

I have a graph with 12 variables divided into two groups. I can't use facets, but using colour and shape, I have been able to make the visualization easy to understand. However, there are some points that overlap (partially or wholly). I am using jitter to deal with these, but as you can see from the attached graph, this leads to all points being moved around, not just those with overlap.

Is there a way to use jitter or dodge conditionally? Even better, is there a way to put the partially overlapping points side-by-side? As you can see, my x-axis is discrete categories, and a slight shift to left/right won't matter. I tried using dotplot with binaxis='y', but that completely spoils the x-axis.

Edit: This graph has managed to do exactly what I am searching for.

Further edit: Adding the code behind this visualization.

library(ggplot2)
library(reshape2)  # provides melt()

disciplines <- c("Comp. Sc.\n(17.2%)", "Physics\n(19.6%)", "Maths\n(29.4%)", "Pol.Sc.\n(40.4%)", "Psychology\n(69.8%)")

# To stop ggplot from imposing alphabetical ordering on x-axis
disciplines <- factor(disciplines, levels=disciplines, ordered=TRUE)

# involved aspects
intensive   <- c( 0.660,  0.438,  0.515,  0.028,  0.443)
comparative <- c( 0.361,  0.928,  0.270,  0.285,  0.311)
wh_adverbs  <- c( 0.431,  0.454,  0.069,  0.330,  0.577)
past_tense    <- c(0.334, 0.229, 0.668, 0.566, 0.838)
present_tense <- c(0.680, 0.408, 0.432, 0.009, 0.996)
conjunctions <- c( 0.928,  0.207,  0.162, -0.299, -0.045)
personal      <- c(0.498, 0.521, 0.332, 0.01, 0.01)
interrogative <- c(0.266, 0.202, 0.236, 0.02, 0.02)
sbj_objective <- c(0.913, 0.755, 0.863, 0.803, 0.913)
possessive    <- c(0.896, 0.802, 0.960, 0.611, 0.994)
thrd_person <- c(-0.244, -0.265, -0.310, -0.008, -0.384)
nouns       <- c(-0.602, -0.519, -0.388, -0.244, -0.196)

df1 <- data.frame(disciplines,
                 "Intensive Adverbs"=intensive,
                 "Comparative Adverbs"=comparative,
                 "Wh-adverbs (WRB)"=wh_adverbs,
                 "Verb: Past Tense"=past_tense,
                 "Verb: Present Tense"=present_tense,
                 "Conjunctions"=conjunctions,
                 "Personal Pronouns"=personal,
                 "Interrogative Pronouns"=interrogative,
                 "Subjective/Objective Pronouns"=sbj_objective,
                 "Possessive Pronouns"=possessive,
                 "3rd-person verbs"=thrd_person,
                 "Nouns"=nouns,
                 check.names=F)

df1.m <- melt(df1, id.vars="disciplines")  # long format: disciplines, variable, value
grp <- ifelse(df1.m$variable %in% c('3rd-person verbs','Nouns'), 'Informational Features', 'Involved Features')
g <- ggplot(df1.m, aes(group=grp, disciplines, value, shape=grp, colour=variable))
g <- g + geom_hline(yintercept=0, size=9, color="white")
g <- g + geom_smooth(method=loess, span=0.75, level=0.95, alpha=I(0.16), linetype="dashed")
g <- g + geom_point(size=4,  alpha=I(0.7), position=position_jitter(width=0.1, height=0))
g <- g + scale_shape_manual(values=c(17,19))

Solution

I am curious what others might suggest, but to get the side-by-side effect, you could code the major x-axis categories as numbers (10, 20, ..., 50) and add or subtract a small amount, like (0..10)/2, based on the categories you are using for color. The x positions would then be 9.6, 9.8, 10.0, 10.2 ... and then 20.0, 20.2, 20.4. This creates an organized plot instead of assigning those fractional adjustments randomly.

Here is a quick implementation of that idea for your data-set. It offsets the main x variable disciplines by one sixth of the sub-category variable and uses that without jitter for the x value...

M = df1.m
ScaleFactor = 6
xadj = as.numeric(M$variable)/ScaleFactor
xadj = xadj - mean(xadj)   # shift it to center around zero
x10  = as.numeric(M$disciplines) * 10
M$x = x10 + xadj
g = ggplot(M, aes(group=grp, x, value, shape=grp, colour=variable)) 
g + geom_point(size=4,alpha=I(0.7)) + scale_x_discrete(breaks=x10,labels=disciplines)

Note that the values within each category are evenly spaced across it and appear in the same order. (This code doesn't include the curve fitting, etc. that is shown in the figure.)
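
To make the spacing concrete, here is a minimal base-R sketch of the offset arithmetic on its own, assuming 12 sub-categories and 5 main categories as in the question. The names n_var, n_cat, scale_factor, offsets, centres and x_pos are illustrative and not part of the original answer.

```r
# Illustrative values only: 12 sub-categories, 5 main categories
n_var <- 12
n_cat <- 5
scale_factor <- 6

# One offset per sub-category, centred on zero
offsets <- seq_len(n_var) / scale_factor
offsets <- offsets - mean(offsets)

# Main categories sit at 10, 20, ..., 50
centres <- seq_len(n_cat) * 10

# Every combination of offset (rows) and centre (columns) gives one x position
x_pos <- outer(offsets, centres, "+")

range(offsets)  # roughly -0.917 to 0.917
diff(offsets)   # constant step of 1/6
```

Since the largest offset stays below 1 while the category centres are 10 apart, neighbouring categories cannot run into each other with these settings.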

Variation: You can see the effect even more clearly if you "quantize" your y values, so more of them plot side by side.

M$valmod = M$value - M$value %% 0.2 + .1

Then use valmod in place of value in the aes() statement to see the effect.
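
As a small worked example of that quantization, the sketch below wraps the same formula in a helper and applies it to a few values taken from the question's data. The function name quantize and its step argument are hypothetical, introduced only for illustration.

```r
# Hypothetical helper wrapping the formula from the answer
quantize <- function(y, step = 0.2) {
  # In R, %% returns a remainder with the sign of the divisor, so
  # y - y %% step is the lower edge of the bin, also for negative y.
  # Adding step / 2 moves the value to the centre of that bin.
  y - y %% step + step / 2
}

y <- c(0.431, 0.454, 0.443, -0.244, -0.265)
quantize(y)  # three values land on 0.5, two on -0.3 (up to floating-point error)
```

Values that fall in the same 0.2-wide bin end up at the same height, which is what lets the x offsets place them side by side.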

To get the category labels back, set the breaks and labels manually. Because x is numeric here, this version uses scale_x_continuous. It also uses a different ScaleFactor for broader spacing and the quantized y axis:

M = df1.m
ScaleFactor = 3
# Note this could just be xadj instead of adding to data frame
M$xadj = as.numeric(M$variable)/ScaleFactor
M$xadj = M$xadj - mean(M$xadj)   # shift it to center around zero
M$x10  = as.numeric(M$disciplines) * 10
M$x = M$x10 + M$xadj

Qfact = 0.2  # resolution to quantize y values
M$valmod = M$value - M$value %% Qfact + Qfact/2  # clump y to given resolution

# one break per category centre (10, 20, ..., 50), labelled with the discipline names
g = ggplot(M, aes(group=grp, x, valmod, shape=grp, colour=variable)) +
    scale_x_continuous(breaks=unique(M$x10), labels=levels(M$disciplines))
g + geom_point(size=3,alpha=I(0.7))
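
The approach above shifts every point by a fixed per-variable offset. If only the overlapping points should move, as the question asks, one possible variation is to shift just the points that share a category and a y bin, and leave isolated points on the category centre. The sketch below is an assumption-laden illustration, not part of the original answer: dodge_overlaps, its bin and step arguments, and the toy data are all hypothetical.

```r
# Hypothetical helper: spread out only points that share a category and a y bin
dodge_overlaps <- function(category, y, bin = 0.2, step = 0.15) {
  cat_num <- as.numeric(category)          # categories at 1, 2, 3, ...
  y_bin   <- floor(y / bin)                # which y bin each point falls in
  key     <- paste(cat_num, y_bin)         # one key per (category, bin) cell
  # position of each point inside its cell, and the size of that cell
  idx <- ave(seq_along(y), key, FUN = seq_along)
  n   <- ave(seq_along(y), key, FUN = length)
  # centre each cell on its category; a point alone in its cell gets offset 0
  cat_num + (idx - (n + 1) / 2) * step
}

# Toy data: the first two points in "A" are close in y, the rest are isolated
category <- factor(c("A", "A", "A", "B", "B"), levels = c("A", "B"))
y <- c(0.43, 0.45, 0.90, 0.10, 0.70)
x <- dodge_overlaps(category, y)
x  # only the first two points are moved off their category centre

# Plotting is optional and only runs if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  d <- data.frame(category = category, x = x, y = y)
  p <- ggplot(d, aes(x, y)) +
    geom_point(size = 4, alpha = 0.7) +
    scale_x_continuous(breaks = seq_along(levels(category)),
                       labels = levels(category))
  # print(p) draws the plot
}
```

The step value needs to be small enough that the largest cell still fits between neighbouring categories. Points that sit just on either side of a bin edge are not treated as overlapping in this sketch, so the bin width may need tuning for a given point size.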
