loess regression on each group with dplyr::group_by()


Problem description

Alright, I'm waving my white flag.

I'm trying to compute a loess regression on my dataset.

I want loess to compute a different set of points that plots as a smooth line for each group.

The problem is that the loess calculation is escaping the dplyr::group_by function, so the loess regression is calculated on the whole dataset.

Internet searching leads me to believe this is because dplyr::group_by wasn't meant to work this way.

I just can't figure out how to make this work on a per-group basis.

Here are some examples of my failed attempts.

test2 <- test %>% 
  group_by(CpG) %>% 
  dplyr::arrange(AVGMOrder) %>% 
  do(broom::tidy(predict(loess(Meth ~ AVGMOrder, span = .85, data=.))))

> test2
# A tibble: 136 x 2
# Groups:   CpG [4]
   CpG            x
   <chr>      <dbl>
 1 cg01003813 0.781
 2 cg01003813 0.793
 3 cg01003813 0.805
 4 cg01003813 0.816
 5 cg01003813 0.829
 6 cg01003813 0.841
 7 cg01003813 0.854
 8 cg01003813 0.866
 9 cg01003813 0.878
10 cg01003813 0.893

This one works, but I can't figure out how to apply the result to a column in my original dataframe. The result I want is column x. If I apply x as a column in a separate line, I run into issues because I called dplyr::arrange earlier.

test2 <- test %>% 
  group_by(CpG) %>% 
  dplyr::arrange(AVGMOrder) %>% 
  dplyr::do({
    predict(loess(Meth ~ AVGMOrder, span = .85, data=.))
  })

This one simply fails with the following error.

"Error: Results 1, 2, 3, 4 must be data frames, not numeric"

Also it still isn't applied as a new column with dplyr::mutate

fems <- fems %>% 
  group_by(CpG) %>% 
  dplyr::arrange(AVGMOrder) %>% 
  dplyr::mutate(Loess = predict(loess(Meth ~ AVGMOrder, span = .5, data=.)))

This was my first attempt and mostly resembles what I want to do. The problem is that it performs the loess prediction on the entire dataframe and not on each CpG group.

I am really stuck here. I read online that the purrr package might help, but I'm having trouble figuring it out.

The data looks like this:

> head(test)
    X geneID        CpG                                        CellLine       Meth AVGMOrder neworder Group SmoothMeth
1  40     XG cg25296477 iPS__HDF51IPS14_passage27_Female____165.592.1.2 0.81107210         1        1     5  0.7808767
2  94     XG cg01003813 iPS__HDF51IPS14_passage27_Female____165.592.1.2 0.97052120         1        1     5  0.7927130
3 148     XG cg13176022 iPS__HDF51IPS14_passage27_Female____165.592.1.2 0.06900448         1        1     5  0.8045080
4 202     XG cg26484667 iPS__HDF51IPS14_passage27_Female____165.592.1.2 0.84077890         1        1     5  0.8163997
5  27     XG cg25296477  iPS__HDF51IPS6_passage33_Female____157.647.1.2 0.81623880         2        2     3  0.8285259
6  81     XG cg01003813  iPS__HDF51IPS6_passage33_Female____157.647.1.2 0.95569240         2        2     3  0.8409501

> unique(test$CpG)
[1] "cg25296477" "cg01003813" "cg13176022" "cg26484667"

So, to be clear, I want to do a loess regression on each unique CpG in my dataframe and apply the resulting "regressed y axis values" to a new column that lines up row by row with the original y axis values (Meth).

My actual dataset has a few thousand of those CpGs, not just the four.

https://docs.google.com/spreadsheets/d/1-Wluc9NDFSnOeTwgBw4n0pdPuSlMSTfUVM0GJTiEn_Y/edit?usp=sharing

Solution

You may have already figured this out -- but if not, here's some help.

Basically, you need to feed the predict function a data.frame (a vector may work too but I didn't try it) of the values you want to predict at.
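
To make the predict step concrete, here is a minimal sketch in base R. The data frame `toy` and its columns `x` and `y` are made up for illustration and are not the poster's data. It shows that predict() on a loess fit returns one fitted value per original row when no new data is given, and one value per row of `newdata` otherwise. The column name in `newdata` has to match the predictor in the formula.

```r
# Made-up toy data: a noisy curve (not the poster's dataset)
set.seed(1)
toy <- data.frame(x = 1:30)
toy$y <- sin(toy$x / 5) + rnorm(30, sd = 0.1)

fit <- loess(y ~ x, data = toy, span = 0.75)

# Without newdata: fitted values at the original x, one per row of toy
fitted_at_obs <- predict(fit)

# With newdata: predictions at the x values supplied in the data frame
grid <- data.frame(x = seq(min(toy$x), max(toy$x), by = 0.5))
fitted_on_grid <- predict(fit, newdata = grid)

length(fitted_at_obs)   # same as nrow(toy)
length(fitted_on_grid)  # same as nrow(grid)
```

If the goal is a new column aligned with the original rows, the first form is usually enough, since its length already matches the data.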

So for your case:

fems <- fems %>% 
  group_by(CpG) %>% 
  arrange(CpG, AVGMOrder) %>% 
  mutate(Loess = predict(loess(Meth ~ AVGMOrder, span = .5, data=.),
    data.frame(AVGMOrder = seq(min(AVGMOrder), max(AVGMOrder), 1))))
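
One caveat, as far as I can tell: inside mutate(), `.` is the whole data frame coming down the pipe, not the current group, so a call that passes `data = .` to loess still fits a single model on all rows. A possible variant is to drop `data = .` and let the formula pick up the current group's columns. The sketch below uses a made-up stand-in (`toy`, with group labels cgA to cgD) rather than the real data, so treat it as an illustration under those assumptions rather than a tested fix for the original dataset.

```r
library(dplyr)

# Made-up stand-in for the poster's data: 4 CpGs x 34 cell lines
set.seed(42)
toy <- expand.grid(AVGMOrder = 1:34,
                   CpG = c("cgA", "cgB", "cgC", "cgD"),
                   stringsAsFactors = FALSE)
toy$Meth <- runif(nrow(toy))

smoothed <- toy %>%
  group_by(CpG) %>%
  arrange(AVGMOrder, .by_group = TRUE) %>%
  # No `data = .` here: Meth and AVGMOrder are looked up in the
  # current group's columns, so one loess is fitted per CpG
  mutate(Loess = predict(loess(Meth ~ AVGMOrder, span = 0.5))) %>%
  ungroup()

# Reference fit on a single group, done by hand, for comparison
ref_cgA <- predict(loess(Meth ~ AVGMOrder, span = 0.5,
                         data = toy[toy$CpG == "cgA", ]))
all.equal(unname(smoothed$Loess[smoothed$CpG == "cgA"]), unname(ref_cgA))
```

Because predict() is called without new data, the result has one value per row of the group, which is what mutate() needs for a column that lines up with Meth.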

Note, loess requires a minimum number of observations to run (~4? I can't remember precisely). Also, this will take a while to run so test with a slice of your data to make sure it's working properly.
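
Both points in that note can be sketched roughly as follows. `safe_loess` is a hypothetical helper, not part of dplyr or stats, and the `min_n = 10` cutoff is an arbitrary assumption: the real minimum depends on the span and the degree of the local fit. The toy data is made up, with one deliberately tiny group.

```r
library(dplyr)

# Hypothetical helper: smooth y against x, or return NA for groups
# that are too small. The cutoff min_n is an assumed, conservative value.
safe_loess <- function(x, y, span = 0.5, min_n = 10) {
  if (length(x) < min_n) {
    return(rep(NA_real_, length(x)))
  }
  as.numeric(predict(loess(y ~ x, span = span)))
}

# Made-up data: two full groups and one group with only 3 rows
set.seed(7)
toy <- data.frame(
  CpG = rep(c("cgA", "cgB", "cgC"), times = c(34, 34, 3)),
  AVGMOrder = c(1:34, 1:34, 1:3),
  stringsAsFactors = FALSE
)
toy$Meth <- runif(nrow(toy))

# Try the pipeline on a couple of groups before running all of them
keep <- head(unique(toy$CpG), 2)
trial <- toy %>%
  filter(CpG %in% keep) %>%
  group_by(CpG) %>%
  mutate(Loess = safe_loess(AVGMOrder, Meth)) %>%
  ungroup()

# Full run: the 3-row group gets NA instead of stopping the pipeline
full <- toy %>%
  group_by(CpG) %>%
  mutate(Loess = safe_loess(AVGMOrder, Meth)) %>%
  ungroup()
```

Returning NA for small groups keeps the column the same length as the data, so it is easy to see afterwards which CpGs were skipped.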
