ggplot2 stat_function with calculated argument for different data subset inside a facet_grid
Problem description
I have a follow up question to how to pass fitdistr calculated args to stat_function (see here for context).

My data frame is like that (see below for full data set):

> str(small_data)
'data.frame':	1032 obs. of  3 variables:
 $ Exp: Factor w/ 6 levels "1L","2L","3L",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ t  : num  0 0 0 0 0 0 0 0 0 0 ...
 $ int: num  75.7 86.1 76.3 82.3 98.3 ...

I would like to plot a facet_grid grouped by Exp and t showing the density histogram of int, and to plot the fitted log-normal distribution on top of it (log-normal line colored by t). I have tried the following:

library(MASS)
meanlog <- function(x) { fitdistr(x, "lognormal")$estimate[[1]] }
sdlog <- function(x) { fitdistr(x, "lognormal")$estimate[[2]] }
p_chip <- ggplot(small_data, aes(x = int)) +
  facet_grid(Exp ~ t) +
  stat_function(fun = dlnorm,
                args = with(small_data,
                            c(meanlog = meanlog(int),
                              sdlog = sdlog(int))),
                aes(colour = t)) +
  scale_colour_gradient2(low = 'red', mid = 'blue', high = 'green', midpoint = 5) +
  geom_histogram(aes(x = int, y = ..density..), binwidth = 150)

but with, meanlog and sdlog use the whole dataset to compute meanlog and sdlog, as shown below (the curve is the same on all facets). How can I have it do the fitting only on the right Exp, t subset?

Edit:
Because the large data set somehow caused copy/paste errors in some environments, here is a smaller set which should be easier to copy and paste. However, it does not directly correspond to the image above.

small_data <- data.frame(
  Exp = c('1L','1L','1L','1L','1L','1L','1L','1L','1L','1L','1L','1L','1L','1L',
          '1L','1L','1L','1L','1L','1L','1L','1L','1L','1L','1L','1L','1L','1L',
          '1L','1L','1L','1L','1L','1L','1L','1L','1L','1L','1L','1L','1L','1L',
          '1L','1L','1L','1L','1L','1L','1L','1L','1L','1L','1L','1L','1L','1L',
          '2L','2L','2L','2L','2L','2L','2L','2L','2L','2L','2L','2L','2L','2L',
          '2L','2L','2L','2L','2L','2L','2L','2L','2L','2L','2L','2L','2L','2L',
          '2L','2L','2L','2L','2L','2L','2L','2L','2L','2L','2L','2L','2L','2L'),
  t = c(0,0,0,0.33,0.33,0.33,0.67,0.67,0.67,0.67,0.67,0.67,0.67,0.67,0.67,1,1,1,1,
        1.33,1.33,1.33,1.33,1.33,1.33,1.33,1.33,1.33,1.33,1.33,1.67,1.67,1.67,1.67,1.67,
        2,2,2,2,4,4,4,4,6,6,6,6,8,8,10,10,10,10,10,10,10,
        0,0,0,0,0.33,0.33,0.67,0.67,0.67,0.67,0.67,0.67,1,1,1,1,
        1.33,1.33,1.33,1.33,1.67,1.67,1.67,1.67,1.67,2,2,4,4,4,4,4,6,6,6,8,
        10,10,10,10,10,10),
  int = c(123.059145129225,122.520943007553,119.229495472186,163.349124924562,
          157.235229958189,101.456442831216,111.474216664325,99.982866933181,
          274.938909090909,147.40293040293,310.134596211366,116.476923076923,
          182.25272382757,332.75885911841,186.54737080689,479.628657282935,
          477.898496240602,283.311517925248,567.147534189805,494.208102667338,
          388.615060940221,624.508012820513,795.2320925868,549.957142857143,
          923.04146100691,621.26579261025,717.577954847278,511.907210538479,
          443.562731447193,391.730061349693,495.384824667473,430.430866037423,
          157.39336711193,621.531297709924,415.420508401551,440.780570409982,
          414.551266085513,446.503836734694,255.059685999741,355.922701246211,
          308.996825396825,200.726012503398,297.958043579045,166.873177083333,
          184.450355103746,558.391405073555,182.63632183908,320.197666318356,
          151.874083846379,314.008287813147,125.941419000172,151.284729448491,
          605.400970873786,143.730810479547,240.779288537549,139.011736015851,
          498.179183673469,498.899700037495,923.604765506808,1302.60915123996,
          471.794167269222,239.522509225092,534.769484464503,566.458609271523,
          337.121275121275,343.216533124878,250.47206095791,585.740563784042,
          873.775097783572,758.63260265514,561.869607843137,817.746869756034,
          461.11271165024,406.232050773503,897.39966367713,756.734451942367,
          605.242334066503,637.310763256886,721.862398822664,898.142725315288,
          670.916794425087,922.623940368313,1088.8436714166,969.805583375062,
          986.695448585877,645.589644637402,981.861218195836,541.388875932836,
          1309.12344123945,925.446478133674,629.419699499165,1589.24284959626,
          814.736442884637,904.710338680927,947.911413969336,1481.51339495535,
          1007.30852694893,563.355241171884))
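A note on the symptom described in the question: the args expression is an ordinary R value, computed once from the full data frame before any faceting happens, so every panel receives the same pair of parameters. The sketch below reproduces that without ggplot2. The toy data frame and the names pooled and per_group are assumptions made for illustration, not part of the original post.

```r
library(MASS)

# Hypothetical two-group data (not the poster's data set).
set.seed(42)
toy <- data.frame(
  Exp = rep(c("1L", "2L"), each = 50),
  int = c(rlnorm(50, meanlog = 5,   sdlog = 0.3),
          rlnorm(50, meanlog = 6.5, sdlog = 0.3))
)

# What args = with(...) amounts to: a single fit over all rows.
pooled <- with(toy, fitdistr(int, "lognormal")$estimate)

# What the facets would need instead: one fit per group.
per_group <- sapply(split(toy$int, toy$Exp),
                    function(z) fitdistr(z, "lognormal")$estimate)

print(pooled)
print(per_group)
```

The pooled meanlog falls between the two per-group values, which is why a single curve drawn from it fits neither panel well.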
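Recommended answer
This is not possible using stat_function(...) - see this link, especially Hadley Wickham's comments.

You have to do it the hard way, which is to say, calculating the function values external to ggplot. Fortunately, this is not all that difficult.

library(MASS)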
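library(ggplot2)

# Fit a log-normal to int within every Exp/t combination; aggregate()
# returns the two estimates as a matrix column.
df <- aggregate(int ~ Exp + t, small_data,
                function(z) with(fitdistr(z, "lognormal"), c(estimate[1], estimate[2])))
# Flatten the matrix column into ordinary meanlog and sdlog columns.
df <- data.frame(df[, 1:2], df[, 3])

# 100 x-values spanning the observed range of int.
x <- with(small_data, seq(min(int), max(int), len = 100))
# Repeat each x-value once per fitted group; df is recycled to match.
gg <- data.frame(x = rep(x, each = nrow(df)), df)
gg$y <- with(gg, dlnorm(x, meanlog, sdlog))

# Note: newer ggplot2 releases spell ..density.. as after_stat(density).
ggplot(small_data, aes(x = int)) +
  geom_histogram(aes(x = int, y = ..density..), binwidth = 150,
                 color = "grey50", fill = "lightgreen") +
  geom_line(data = gg, aes(x, y, color = t)) +
  facet_grid(Exp ~ t) +
  scale_colour_gradient2(low = 'red', mid = 'blue', high = 'green', midpoint = 5)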
So this code creates a data frame df containing meanlog and sdlog for every combination of Exp and t. Then we create an "auxiliary data frame", gg, which has a set of x-values covering your range in int with 100 steps, replicated for every combination of Exp and t, and we add a column of y-values using dlnorm(x,meanlog,sdlog). Then we add a geom_line layer to the plot using gg as the dataset.

Note that fitdistr(...) does not always converge, so you should check for NAs in df.
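The NA check mentioned above could look roughly like the following. This is a minimal sketch under stated assumptions: toy is a hypothetical stand-in for small_data, fits, bad and fits_ok are illustrative names, and merge() is swapped in for the rep()-based replication from the answer simply to make the "every x for every group" step explicit. In this sketch the NA comes from a group with a single observation, for which no spread can be estimated.

```r
library(MASS)

# Hypothetical stand-in for small_data; the 2L / t = 8 group has a
# single observation on purpose.
set.seed(1)
toy <- data.frame(
  Exp = c(rep("1L", 6), rep("2L", 6), "2L"),
  t   = c(rep(0, 12), 8),
  int = c(rlnorm(6, meanlog = 5, sdlog = 0.4),
          rlnorm(6, meanlog = 6, sdlog = 0.4),
          300)
)

# Same per-group fit as in the answer.
fits <- aggregate(int ~ Exp + t, toy,
                  function(z) fitdistr(z, "lognormal")$estimate)
fits <- data.frame(fits[, 1:2], fits[, 3])

# Flag groups whose parameters could not be estimated, then drop them.
bad <- !complete.cases(fits[, c("meanlog", "sdlog")])
fits_ok <- fits[!bad, ]

# merge() with no shared column names returns the Cartesian product,
# i.e. every x-value paired with every remaining group.
x  <- seq(min(toy$int), max(toy$int), length.out = 100)
gg <- merge(data.frame(x = x), fits_ok)
gg$y <- dlnorm(gg$x, gg$meanlog, gg$sdlog)
```

Passing this gg to geom_line(data = gg, ...) would then leave the affected facets with a histogram but no curve, rather than plotting a line built from NA parameters.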