Edits in a ggplot2, geom = "line"
Problem description
I have a line plot of some events at a hospital that I have been struggling with.
The challenges that I haven't solved yet are: 1) sorting the lines on the plot so that the patient lines are ordered by Assessment date, 2) coloring the lines by the variable 'openCase', and finally, 3) removing the Discharge point (the blue square) for the cases that fall in the year 2014 (or past some other arbitrary cutoff date).
Any help would be appreciated.
Here is my sample data,
library(ggplot2)
library(plyr)

df <- data.frame(
  date = seq(Sys.Date(), len = 156, by = "5 day")[sample(156, 78)],
  openCase = rep(0:1, 39),
  patients = factor(rep(1:26, 3), labels = LETTERS)
)

df <- ddply(df, "patients", mutate, visit = order(date))
df$visit <- as.factor(df$visit)
levels(df$visit) <- c("Assessment (1)", "Treatment (2)", "Discharge (3)")

qplot(date, patients, data = df, geom = "line") +
  geom_point(aes(colour = visit), size = 2, shape = 0)
I'm aware that my example data is not perfect, as some of the assessment data falls after the treatments and some of the discharge data falls before the assessments, but that is part of the challenge: my base data is messed up.
What it looks like at the moment: [plot image not included]
Update 2012-04-30 16:30:13 PDT
My data is delivered from a database and looks something like this,
df <- structure(list(date = structure(c(15965L, 15680L, 16135L, 15730L, 15920L, 15705L, 16110L, 15530L, 15575L, 15905L, 16140L, 15795L, 15955L, 15945L, 16205L, 15675L, 15525L, 15830L, 15625L, 15725L, 15855L, 15840L, 15615L, 15500L, 15780L, 15765L, 15610L, 15690L, 16080L, 15570L, 15685L, 16175L, 15740L, 15600L, 15985L, 15485L, 15605L, 16115L, 15535L, 15755L, 16145L, 16040L, 15970L, 16000L, 16075L, 15995L, 16010L, 15990L, 15665L, 15895L, 15865L, 16120L, 15880L, 15930L, 16055L, 15820L, 15650L, 16155L, 15700L, 15640L, 15505L, 15750L, 15800L, 15775L, 15825L, 15635L, 16150L, 15860L, 16100L, 15475L, 16050L, 15785L, 15495L, 15810L, 15805L, 15490L, 15460L, 16085L), class = "Date"), openCase = c(0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L), patients = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L, 8L, 8L, 8L, 9L, 9L, 9L, 10L, 10L, 10L, 11L, 11L, 11L, 12L, 12L, 12L, 13L, 13L, 13L, 14L, 14L, 14L, 15L, 15L, 15L, 16L, 16L, 16L, 17L, 17L, 17L, 18L, 18L, 18L, 19L, 19L, 19L, 20L, 20L, 20L, 21L, 21L, 21L, 22L, 22L, 22L, 23L, 23L, 23L, 24L, 24L, 24L, 25L, 25L, 25L, 26L, 26L, 26L), .Label = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z"), class = "factor"), visit = structure(c(2L, 1L, 3L, 3L, 1L, 2L, 2L, 3L, 1L, 3L, 1L, 2L, 2L, 1L, 3L, 2L, 1L, 3L, 1L, 2L, 3L, 3L, 2L, 1L, 3L, 2L, 1L, 3L, 1L, 2L, 1L, 3L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 1L, 3L, 2L, 1L, 2L, 3L, 3L, 1L, 2L, 1L, 3L, 2L, 2L, 3L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 1L, 2L, 3L, 3L, 1L, 2L, 2L, 3L, 1L, 1L, 3L, 2L, 1L, 3L, 2L, 2L, 1L, 3L), .Label = c("zym", "xov", "poi" ), class = "factor")), .Names = c("date", "openCase", "patients", "visit"), 
row.names = c(NA, -78L), class = "data.frame")
The number of levels in visit, and the specific labeling, will most likely change, so I would like some kind of code where I rank or sort based on my existing data (visit) instead of generating new variables.
Solution
I'm still not sure I understand what is wrong with @Ben's answer, but I'll try adding one of my own, starting with the df given in the edit.
Create a new variable Visit (note the capital V), which is Assessment/Treatment/Discharge based on the ordering of the dates given. This is @Ben's code, just rewritten.
df <- ddply(df, "patients", mutate,
            Visit = factor(rank(date),
                           levels = 1:3,
                           labels = c("Assessment (1)", "Treatment (2)", "Discharge (3)")))
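As a side note on why rank() is used here rather than order(): the two only coincide in special cases. The following is a minimal sketch on made-up dates for one hypothetical patient (the dates and the variable names are my own, not from the question), showing that rank() gives each row its chronological position, which is what the Visit label needs.

```r
# Toy dates for one hypothetical patient, deliberately not in chronological order
dates <- as.Date(c("2014-03-06", "2012-12-06", "2013-09-17"))

# order() answers "which element comes 1st, 2nd, 3rd?" (indices into the vector)
ord <- order(dates)

# rank() answers "what position does each element hold?" (one value per element)
rnk <- rank(as.numeric(dates))

# Labelling by rank attaches the visit type to the right row
visit <- factor(rnk,
                levels = 1:3,
                labels = c("Assessment (1)", "Treatment (2)", "Discharge (3)"))

print(data.frame(dates, ord, rnk, visit))
```

Here the latest date is correctly labelled as the discharge; labelling by ord instead would have called it a treatment.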
I don't understand how this relates to the visit column in the data originally; in fact, the original visit column is not used hereafter:
> table(df$Visit, df$visit)

                 zym xov poi
  Assessment (1)  16   7   3
  Treatment (2)    3  16   7
  Discharge (3)    7   3  16
Reorder the patients (again copying Ben):
df$patients <- reorder(df$patients,df$date,function(x) min(as.numeric(x)))
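To see what reorder() does to the factor, here is a small sketch on a made-up data frame (toy and its values are my own illustration, not the question's data). It passes the dates as numbers and uses FUN = min, which I believe is equivalent to the call above; the factor levels end up sorted by each patient's earliest date, and ggplot2 places discrete axis values in level order.

```r
# Hypothetical three-patient data; patient B has the earliest visit, then C, then A
toy <- data.frame(
  patients = factor(c("A", "A", "B", "B", "C", "C")),
  date = as.Date(c("2013-05-01", "2013-06-01",
                   "2013-01-10", "2013-02-10",
                   "2013-03-15", "2013-04-15"))
)

levels_before <- levels(toy$patients)  # alphabetical by default

# Re-level the factor by the minimum (earliest) date within each patient
toy$patients <- reorder(toy$patients, as.numeric(toy$date), FUN = min)

levels_after <- levels(toy$patients)
print(levels_before)
print(levels_after)
```

Only the level order changes; the values stored in the column stay the same.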
Determine the subset of points that should be shown (same idea as Ben, but different code)
df2 <- df[!((df$Visit == "Discharge (3)") & (df$date > as.Date("2014-01-01"))),]
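If the cutoff date is likely to change, the same filter can be written with the date pulled out into a variable. This is only a sketch on invented rows (toy, cutoff and drop are names I made up for illustration), assuming the Visit labels are the ones created above.

```r
# Hypothetical rows: patient B is discharged after the cutoff, patient A before it
toy <- data.frame(
  patients = c("A", "A", "B", "B"),
  Visit = c("Treatment (2)", "Discharge (3)", "Treatment (2)", "Discharge (3)"),
  date = as.Date(c("2013-11-01", "2013-12-15", "2013-12-20", "2014-02-01")),
  stringsAsFactors = FALSE
)

cutoff <- as.Date("2014-01-01")  # assumed cutoff; change as needed

# Flag the discharge points that fall after the cutoff, then keep everything else
drop <- toy$Visit == "Discharge (3)" & toy$date > cutoff
toy2 <- toy[!drop, ]

print(toy2)
```

Only the points are filtered this way; the full data frame can still be used for the lines so that they extend to the hidden discharge date.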
To add something new, here is a way to make the lines different colors without impacting the legend
ggplot(df, aes(date, patients)) +
  geom_blank() +
  geom_line(data = df[df$openCase == 0, ], colour = "black") +
  geom_line(data = df[df$openCase == 1, ], colour = "red") +
  geom_point(data = df2, aes(colour = Visit), size = 2, shape = 0)
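One caveat with splitting the data by openCase for the two line layers: each layer only connects the rows left in its subset, so a patient whose openCase value changes between visits would not get one continuous line. In the data from the edit the value looks constant within each patient, but a quick check along these lines might be worth running first. This is a rough sketch on made-up data (toy, n_status and mixed are my own names).

```r
# Hypothetical data: patient A has a constant status, patient B does not
toy <- data.frame(
  patients = factor(c("A", "A", "A", "B", "B", "B")),
  openCase = c(0L, 0L, 0L, 1L, 0L, 1L)
)

# Count the distinct openCase values seen for each patient
n_status <- tapply(toy$openCase, toy$patients, function(x) length(unique(x)))

# Patients with more than one distinct value would have their line split up
mixed <- names(n_status)[n_status > 1]
print(mixed)
```

If mixed comes back empty for the real data, the two geom_line layers should between them draw exactly one line per patient.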