loess regression on each group with dplyr::group_by()
Alright, I'm waving my white flag.
I'm trying to compute a loess regression on my dataset.
I want loess to compute a different set of points that plots as a smooth line for each group.
The problem is that the loess calculation is escaping the dplyr::group_by function, so the loess regression is calculated on the whole dataset.
Internet searching leads me to believe this is because dplyr::group_by wasn't meant to work this way.
I just can't figure out how to make this work on a per-group basis.
Here are some examples of my failed attempts.
test2 <- test %>%
  group_by(CpG) %>%
  dplyr::arrange(AVGMOrder) %>%
  do(broom::tidy(predict(loess(Meth ~ AVGMOrder, span = .85, data = .))))
> test2
# A tibble: 136 x 2
# Groups: CpG [4]
CpG x
<chr> <dbl>
1 cg01003813 0.781
2 cg01003813 0.793
3 cg01003813 0.805
4 cg01003813 0.816
5 cg01003813 0.829
6 cg01003813 0.841
7 cg01003813 0.854
8 cg01003813 0.866
9 cg01003813 0.878
10 cg01003813 0.893
This one works, but I can't figure out how to apply the result to a column in my original dataframe. The result I want is column x. If I apply x as a column in a separate line, I run into issues because I called dplyr::arrange earlier.
test2 <- test %>%
  group_by(CpG) %>%
  dplyr::arrange(AVGMOrder) %>%
  dplyr::do({
    predict(loess(Meth ~ AVGMOrder, span = .85, data = .))
  })
This one simply fails with the following error.
"Error: Results 1, 2, 3, 4 must be data frames, not numeric"
Also it still isn't applied as a new column with dplyr::mutate
fems <- fems %>%
  group_by(CpG) %>%
  dplyr::arrange(AVGMOrder) %>%
  dplyr::mutate(Loess = predict(loess(Meth ~ AVGMOrder, span = .5, data = .)))
This was my first attempt and mostly resembles what I want to do. The problem is that this one performs the loess prediction on the entire dataframe and not on each CpG group.
I am really stuck here. I read online that the purrr package might help, but I'm having trouble figuring it out.
The data looks like this:
> head(test)
X geneID CpG CellLine Meth AVGMOrder neworder Group SmoothMeth
1 40 XG cg25296477 iPS__HDF51IPS14_passage27_Female____165.592.1.2 0.81107210 1 1 5 0.7808767
2 94 XG cg01003813 iPS__HDF51IPS14_passage27_Female____165.592.1.2 0.97052120 1 1 5 0.7927130
3 148 XG cg13176022 iPS__HDF51IPS14_passage27_Female____165.592.1.2 0.06900448 1 1 5 0.8045080
4 202 XG cg26484667 iPS__HDF51IPS14_passage27_Female____165.592.1.2 0.84077890 1 1 5 0.8163997
5 27 XG cg25296477 iPS__HDF51IPS6_passage33_Female____157.647.1.2 0.81623880 2 2 3 0.8285259
6 81 XG cg01003813 iPS__HDF51IPS6_passage33_Female____157.647.1.2 0.95569240 2 2 3 0.8409501
> unique(test$CpG)
[1] "cg25296477" "cg01003813" "cg13176022" "cg26484667"
So, to be clear, I want to do a loess regression on each unique CpG in my dataframe and apply the resulting "regressed y axis values" to a column matching the original y axis values (Meth).
My actual dataset has a few thousand of those CpGs, not just the four.
https://docs.google.com/spreadsheets/d/1-Wluc9NDFSnOeTwgBw4n0pdPuSlMSTfUVM0GJTiEn_Y/edit?usp=sharing
You may have already figured this out -- but if not, here's some help.
Basically, you need to feed the predict function a data.frame (a vector may work too but I didn't try it) of the values you want to predict at.
So for your case:
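fems <- fems %>%
  group_by(CpG) %>%
  arrange(CpG, AVGMOrder) %>%
  # No `data = .` here: in a %>% pipe, `.` is the whole data frame, not the current group.
  # Without it, loess() picks up each group's Meth and AVGMOrder inside mutate().
  # seq() must yield one value per row of the group (consecutive AVGMOrder values).
  mutate(Loess = predict(loess(Meth ~ AVGMOrder, span = .5),
                         data.frame(AVGMOrder = seq(min(AVGMOrder), max(AVGMOrder), 1))))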
Note, loess requires a minimum number of observations to run (~4? I can't remember precisely). Also, this will take a while to run so test with a slice of your data to make sure it's working properly.
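One thing worth checking in the attempts in the question: in a magrittr pipe, `.` stands for the entire left-hand side, so `data = .` hands loess() the whole data frame even after group_by(). The sketch below is only an illustration on made-up data (the `toy` data frame, the group names `cgA`/`cgB`, and the noise levels are all assumptions, not the asker's data). It fits loess once per group by leaving `data` out, so the formula variables come from the current group inside mutate(), and it computes a pooled fit for contrast.

```r
library(dplyr)

# Hypothetical toy data standing in for `fems`: two CpG groups with very
# different methylation levels, 20 ordered observations each.
set.seed(1)
toy <- data.frame(
  CpG       = rep(c("cgA", "cgB"), each = 20),
  AVGMOrder = rep(1:20, times = 2),
  Meth      = c(0.8 + rnorm(20, sd = 0.02), 0.1 + rnorm(20, sd = 0.02))
)

# Per-group fit: with no `data` argument, loess() looks up Meth and AVGMOrder
# in the environment of the formula, which inside mutate() is the current group.
per_group <- toy %>%
  group_by(CpG) %>%
  arrange(CpG, AVGMOrder) %>%
  mutate(Loess = predict(loess(Meth ~ AVGMOrder, span = 0.5))) %>%
  ungroup()

# Pooled fit for contrast: one loess over all rows, ignoring CpG.
# This is what passing the whole data frame as `data` amounts to.
pooled <- predict(loess(Meth ~ AVGMOrder, span = 0.5, data = toy))

# Largest disagreement between the pooled and per-group smooths.
max(abs(pooled - per_group$Loess))
```

Because predict() is called without new data here, it returns one fitted value per row of the group, so the new column lines up with Meth row for row. With thousands of CpGs, comparing a few groups against a direct fit on the corresponding subset is a cheap sanity check before running the full set.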