ggplot with mean on y and no intermediate data


Problem description

I need help with a ggplot that plots averages on the y axis and returns a line plot with points, plus a text label for each point (using ggplot functionality), all color coded according to the respective "color" parameter. As far as possible, I don't want to create any intermediate data frame from the original data to summarise the y means. I tried using fun.y as shown in the code snippet. An Excel chart is also attached.

Sample data

set.seed(1)
age_range = sample(c("ar2-15", "ar16-29", "ar30-44"), 20, replace = TRUE)
gender = sample(c("M", "F"), 20, replace = TRUE)
region = sample(c("A", "B", "C"), 20, replace = TRUE)
physi = sample(c("Poor", "Average", "Good"), 20, replace = TRUE)
height = sample(c(4,5,6), 20, replace = TRUE)
survey = data.frame(age_range, gender, region,physi,height)

ggplot code I tried

ggplot(survey, aes(x=age_range, y=height, color=gender)) + stat_summary(fun.y=mean, geom = "point")+geom_line()

Output I am getting

Output I am looking for

Solution

Following up on @Sandy's comment, you can also add the labels in a similar fashion, though here I am using the package ggrepel to make sure they don't overlap (without having to manually code the location). For the location, you can read the result from the call to mean which is returned as y by calling ..y.. in the aesthetics.

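library(ggplot2)
library(ggrepel)  # provides the "label_repel" geom

# group = gender is needed so the line connects means across the discrete x axis
ggplot(survey, aes(x=age_range, y=height, color=gender, group = gender)) +
  stat_summary(fun.y=mean, geom = "point") +
  stat_summary(fun.y=mean, geom = "line") +
  stat_summary(aes(label = round(..y.., 2)), fun.y=mean, geom = "label_repel", segment.size = 0)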

Gives

(Note that segment.size = 0 is to ensure that there is not an additional line drawn from the point to the label.)
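
In recent ggplot2 releases (3.3.0 and later), fun.y is deprecated in favor of fun, and the ..y.. notation has been superseded by after_stat(y). A minimal sketch of the same plot in the newer syntax, assuming such a ggplot2 version and an attached ggrepel, might look like this (the object name p is only for illustration):

```r
library(ggplot2)
library(ggrepel)

# Same sample data as in the question
set.seed(1)
age_range <- sample(c("ar2-15", "ar16-29", "ar30-44"), 20, replace = TRUE)
gender    <- sample(c("M", "F"), 20, replace = TRUE)
region    <- sample(c("A", "B", "C"), 20, replace = TRUE)
physi     <- sample(c("Poor", "Average", "Good"), 20, replace = TRUE)
height    <- sample(c(4, 5, 6), 20, replace = TRUE)
survey    <- data.frame(age_range, gender, region, physi, height)

# Assumes ggplot2 >= 3.3.0: `fun` replaces `fun.y`,
# and after_stat(y) replaces ..y..
p <- ggplot(survey, aes(x = age_range, y = height,
                        color = gender, group = gender)) +
  stat_summary(fun = mean, geom = "point") +
  stat_summary(fun = mean, geom = "line") +
  stat_summary(aes(label = round(after_stat(y), 2)),
               fun = mean, geom = "label_repel", segment.size = 0)

print(p)
```

Calling layer_data(p, 1) returns the means that stat_summary computed for the point layer, which is a quick way to confirm that the summary happens inside the plot without a separate data frame.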

As of this writing, it does not appear that ggrepel offers text displacement in only one axis (see here), so you may have to position labels manually if you want more precision.

If you want to set the label locations manually, here is an approach that uses dplyr and the %>% pipe to avoid having to save any intermediate data.frames.

The basic idea is described here. To see the result after any step, highlight the code up to just before the %>% at the end of a line and run it. First, group_by the x location and the grouping variable that you want to plot, then get the average of each combination using summarise. The data are still grouped by age_range (summarise only peels off one level of grouping at a time), so you can determine which gender has the higher mean at each age_range by subtracting the mean within that group. I used sign just to extract whether that difference was positive or negative, then multiplied/divided by a factor to get the spacing I wanted (in this case, divided by ten to get a spacing of 0.1). Add that adjustment to the mean to set where you want the label to land. Then pass all of that into ggplot and proceed as you would with any other data.frame.

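library(dplyr)
library(ggplot2)

survey %>%
  # one row per age_range/gender combination, holding the mean height
  group_by(age_range, gender) %>%
  summarise(height = mean(height)) %>%
  # still grouped by age_range: shift each label 0.1 away from the group mean
  mutate(myAdj = sign(height - mean(height)) / 10
         , labelLoc = height + myAdj) %>%
  ungroup() %>%
  ggplot(aes(x = age_range
             , y = height
             , label = round(height, 2)
             , color = gender
             , group = gender
  )) +
  geom_point() +
  geom_line() +
  # draw the labels at the adjusted y position
  geom_label(aes(y = labelLoc)
             , show.legend = FALSE)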

Gives:

Which seems to accomplish your base goals, though you may want to play around with spacing etc. for your actual use case.
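
To check the label offsets before plotting, the summarising steps of the pipeline can be run on their own. The following is only a sketch: it assumes dplyr 1.0.0 or later for the .groups argument, and label_offsets is a hypothetical name used purely for inspection (the plotting pipeline itself does not need it).

```r
library(dplyr)

# Same sample data as in the question
set.seed(1)
age_range <- sample(c("ar2-15", "ar16-29", "ar30-44"), 20, replace = TRUE)
gender    <- sample(c("M", "F"), 20, replace = TRUE)
region    <- sample(c("A", "B", "C"), 20, replace = TRUE)
physi     <- sample(c("Poor", "Average", "Good"), 20, replace = TRUE)
height    <- sample(c(4, 5, 6), 20, replace = TRUE)
survey    <- data.frame(age_range, gender, region, physi, height)

label_offsets <- survey %>%
  group_by(age_range, gender) %>%
  # .groups = "drop_last" (assumes dplyr >= 1.0.0) makes explicit that the
  # result stays grouped by age_range only
  summarise(height = mean(height), .groups = "drop_last") %>%
  # within each age_range: +0.1 above the group mean, -0.1 below, 0 if equal
  mutate(myAdj = sign(height - mean(height)) / 10,
         labelLoc = height + myAdj) %>%
  ungroup()

print(label_offsets)
```

Each row holds the mean height for one age_range/gender combination, the offset myAdj, and the resulting label position labelLoc. An offset of 0 appears when an age_range contains only one gender or when both means are equal, in which case the label sits on the point.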
