Line graph with 2 categorical variables and 1 continuous variable in R


Question

I'm quite new to R and to statistics in general. I am trying to plot two categorical variables (part of speech "pos", condition "trcond") and one numeric variable (score "totacc") as a line graph in ggplot2.

> df1<-df[, c("trcond", "subtitle", "pos", "totacc")]
> head(df1)
   trcond     subtitle pos totacc
7       L New Scene_16 lex  0.250
29      N New Scene_16 lex  0.500
8       L New Scene_25 lex  0.875
30      N New Scene_25 lex  0.666
9       L New Scene_29 lex  1.000
31      N New Scene_29 lex  0.833

I have used this ggplot2 command:

ggplot(data=summdfo, aes(x=pos, y=totacc, group=trcond, colour=trcond)) +
  geom_line() + geom_point()

But it is not working: the graph has coloured (blue and red) dots all over the place and more than just two lines linking them. I would like to post the graph I get, as I lack the words to describe it, but this is my first post and I don't seem to be able to upload pictures.

I would like to get a standard, simple 2-line graph such as the blue and red ones on this page (where y = total bill, x = time (lunch, dinner), grouped by gender): http://www.cookbook-r.com/Graphs/Bar_and_line_graphs_%28ggplot2%29/

Is this possible with my data set at all? If so, what am I doing wrong in the code?

Recommended answer

Here I tried to create a data frame based on the limited sample of your data.

df1 <- data.frame(trcond=rep(c('L', 'N'), 3), 
                  subtitle=rep('New Scene_29', 6),  # Not in use, just a dummy
                  pos=c('lex', 'lex', 'lex', 'noLex', 'noLex', 'noLex'), 
                  totacc=c(0.250, 0.5, 0.875, 0.666, 1.000, 0.833))

Because several rows share the same combination of trcond and pos in this data frame (and the combinations are not balanced), geom_line() connects every individual observation, so the plot is going to be jumbled up like this:

ggplot(data=df1, aes(x=pos, y=totacc, group=trcond, color=trcond))+ 
  geom_line() + 
  geom_point()
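To see why the lines zigzag, it may help to count the rows in each trcond/pos cell. This is only a sketch on the toy data frame from above (the unused subtitle column is left out, and cell_counts is just a name picked for illustration); your real data will have different counts.

```r
# Toy data from above (a made-up subset, not the full data set)
df1 <- data.frame(trcond = rep(c("L", "N"), 3),
                  pos    = c("lex", "lex", "lex", "noLex", "noLex", "noLex"),
                  totacc = c(0.250, 0.5, 0.875, 0.666, 1.000, 0.833))

# Number of observations for every trcond x pos combination
cell_counts <- table(trcond = df1$trcond, pos = df1$pos)
print(cell_counts)
```

Any cell with a count above 1 gives geom_line() several points at the same x position within one group, and that is where the extra segments come from.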


However, if you apply a summary function that computes the mean for each combination of trcond and pos, the plot comes out correctly:

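# fun= replaces fun.y=, which is deprecated since ggplot2 3.3.0
# (keep fun.y= if you are on an older version)
ggplot(data=df1, aes(x=pos, y=totacc, group=trcond, color=trcond))+ 
  geom_line(stat='summary', fun=mean) + 
  geom_point(stat='summary', fun=mean)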
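An alternative, if you would rather look at the numbers before plotting them, is to compute the cell means yourself and plot the summarised data frame. The sketch below assumes the same toy df1 as above; cell_means is a name I made up, and the plotting step is skipped if ggplot2 is not installed.

```r
# Toy data from above (a made-up subset, not the full data set)
df1 <- data.frame(trcond = rep(c("L", "N"), 3),
                  pos    = c("lex", "lex", "lex", "noLex", "noLex", "noLex"),
                  totacc = c(0.250, 0.5, 0.875, 0.666, 1.000, 0.833))

# Mean score for every trcond x pos combination (base R)
cell_means <- aggregate(totacc ~ trcond + pos, data = df1, FUN = mean)
print(cell_means)

# Plot the pre-computed means: one point per cell, one line per trcond
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(cell_means,
              aes(x = pos, y = totacc, group = trcond, colour = trcond)) +
    geom_line() +
    geom_point()
  print(p)
}
```

With exactly one row per trcond/pos cell, plain geom_line() and geom_point() are enough and no stat='summary' is needed.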


Again, this is only my guess at what is in your data. It would be best if you posted a sample of it using dput(head(df1, 50)), so that you can get a better answer.
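In case dput() is new to you, here is a rough sketch of what it does, again using the toy df1 from above rather than your real data. The names txt and restored are only for illustration.

```r
# Toy data from above (a made-up subset, not the full data set)
df1 <- data.frame(trcond = rep(c("L", "N"), 3),
                  pos    = c("lex", "lex", "lex", "noLex", "noLex", "noLex"),
                  totacc = c(0.250, 0.5, 0.875, 0.666, 1.000, 0.833))

# dput() writes R code that recreates the object; capture it as text
txt <- capture.output(dput(head(df1, 50)))
cat(txt, sep = "\n")

# Anyone can rebuild the data frame by evaluating that text
restored <- eval(parse(text = txt))
print(restored)
```

Pasting the printed structure(...) text into your question lets others recreate your data exactly, column types included.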
