How to compute the mean in different categories using left_join and nest in R?

Views: 169 Posted: 2017/7/13 21:40:48 Tags: r left-join dplyr tidyr

Problem description

I'm trying to compute the mean values for binned data using left_join and nest.
bin.size = 100 
First dataframe:
df = data.frame(x =c(300,400), 
                y = c("sca1","sca2"))
    x    y
1 300 sca1
2 400 sca2
Second dataframe:
df2 = data.frame(snp = c(1,2,10,100,1,2,14,16,399), 
                 r2  = c(0.70,0.80,0.70,0.10,0.90,0.98,0.80,0.80,0.01),
                 sca = c("sca1","sca1","sca1","sca1","sca2","sca2","sca2","sca2","sca2"))

  snp   r2  sca
1   1 0.70 sca1
2   2 0.80 sca1
3  10 0.70 sca1
4 100 0.10 sca1
5   1 0.90 sca2
6   2 0.98 sca2
7  14 0.80 sca2
8  16 0.80 sca2
9 399 0.01 sca2
Code from @r2evans:
output_bin_LD = df %>%
  left_join(nest(df2, snp, .key = "snp"), by = c("y" = "sca")) %>%
  mutate(
    cuts = map(x, ~ seq(0, ., by = bin.size)),
    tbls = pmap(
      .l = list(snp, cuts),
      .f = function(xx, breaks) {
        z <- table(cut(xx$snp, breaks))
        data_frame(cut = names(z), count = z)
      }
    )
  ) %>%
  select(y, tbls) %>%
  unnest()
This code produces the following:
     y       cut count
1 sca1   (0,100]     4
2 sca1 (100,200]     0
3 sca1 (200,300]     0
4 sca2   (0,100]     4
5 sca2 (100,200]     0
6 sca2 (200,300]     0
7 sca2 (300,400]     1
The end goal would be to have 
     y       cut count  mean
1 sca1   (0,100]     4 0.575
2 sca1 (100,200]     0     0
3 sca1 (200,300]     0     0
4 sca2   (0,100]     4  0.87
5 sca2 (100,200]     0     0
6 sca2 (200,300]     0     0
7 sca2 (300,400]     1  0.01
So far I've tried this: 
df %>%
  left_join(nest(df2, snp, r2, .key = "snp"), 
            by = c("y" = "sca")) %>%
  mutate(
    cuts = map(x, ~ seq(0, ., by = 100)),
    tbls = pmap(
      .l = list(snp, cuts),
      .f = function(xx, breaks) {
        z <- table(cut(xx$snp, breaks))
        a <- mean(cut(xx$r2, breaks))
        data_frame(cut = names(z), count = z, mean = a)
      } # .f 
    ) # closing pmap
  ) %>% # mutate
  select(y, tbls) %>%
  unnest()
But it gives me NAs and a warning message:
     y       cut count mean
1 sca1   (0,100]     4   NA
2 sca1 (100,200]     0   NA
3 sca1 (200,300]     0   NA
4 sca2   (0,100]     4   NA
5 sca2 (100,200]     0   NA
6 sca2 (200,300]     0   NA
7 sca2 (300,400]     1   NA
Warning messages:
1: In mean.default(cut(xx$r2, breaks)) :
  argument is not numeric or logical: returning NA
2: In mean.default(cut(xx$r2, breaks)) :
  argument is not numeric or logical: returning NA
How should I fix this? Do I need to double nest the table? 
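The warning can be reproduced outside the pipeline. The following is a minimal sketch with illustrative vectors (the `r2` and `breaks` values below are assumed for the example, not taken from the pipeline). `cut()` returns a factor of bin labels, and `mean()` is not defined for a factor, so it returns `NA` with this warning. Note also that the breaks are positions on the `snp` scale, so cutting `r2` by them would not pick out the right rows even if the result were numeric.

```r
# Illustrative values only (assumed for this sketch)
r2     <- c(0.70, 0.80, 0.70, 0.10)
breaks <- seq(0, 300, by = 100)

binned <- cut(r2, breaks)   # a factor of interval labels, not numbers
class(binned)               # "factor"

# mean() on a factor warns and returns NA; the warning is suppressed here
res <- suppressWarnings(mean(binned))
is.na(res)                  # TRUE
```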
Solution
Not sure about your approach, but here's a more straightforward one using the data.table package, if you're interested. You will need a recent version (v1.9.8 or later; the current version is 1.10.0), since non-equi joins are a new feature.
require(data.table) ## v1.9.8+
ans <- b[a, on=.(sca=y, snp>start, snp<=end),       ## 1
         .(count=.N, mean=mean(r2, na.rm=TRUE)),    ## 2
         by=.EACHI]                                 ## 3
1. For each row in a, find the matching row indices in b, using the conditions supplied to the on argument.
2. .N is the number of matching row indices, which gives count, and mean() gives the mean of r2 over those matching rows.
3. by=.EACHI runs the expression in (2) once for each row in a.
where a and b are:
require(data.table) ## v1.9.8+
a <- setDT(df)[, .(start=seq(0, x-1, by=bin.size), 
                   end=seq(bin.size, x, by=bin.size)), 
                 by=y]

b <- fread("snp   r2  sca
      1 0.70 sca1
      2 0.80 sca1
     10 0.70 sca1
    100 0.10 sca1
      1 0.90 sca2
      2 0.98 sca2
     14 0.80 sca2
     16 0.80 sca2
    399 0.01 sca2")
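If data.table is not an option, the same counts and means can also be obtained in base R. The following is a minimal sketch and not part of the original answer: `bin_stats` is a hypothetical helper name, and the `r2` values are taken from the printed table in the question. It bins `snp` with `cut()` and then averages `r2` within each bin using `tapply()`, rather than cutting `r2` itself.

```r
bin.size <- 100

df  <- data.frame(x = c(300, 400), y = c("sca1", "sca2"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(
  snp = c(1, 2, 10, 100, 1, 2, 14, 16, 399),
  r2  = c(0.70, 0.80, 0.70, 0.10, 0.90, 0.98, 0.80, 0.80, 0.01),
  sca = c(rep("sca1", 4), rep("sca2", 5)),
  stringsAsFactors = FALSE
)

# bin_stats is a hypothetical helper: one row per bin for scaffold y,
# with breaks running from 0 to x in steps of bin.size
bin_stats <- function(y, x) {
  d    <- df2[df2$sca == y, ]
  bins <- cut(d$snp, seq(0, x, by = bin.size))    # bin on snp, not r2
  data.frame(
    y     = y,
    cut   = levels(bins),
    count = as.vector(table(bins)),
    mean  = as.vector(tapply(d$r2, bins, mean)),  # NA for empty bins
    stringsAsFactors = FALSE
  )
}

out <- do.call(rbind, Map(bin_stats, df$y, df$x))
rownames(out) <- NULL
out
```

In this sketch empty bins get a mean of `NA` rather than 0; if 0 is preferred, `out$mean[is.na(out$mean)] <- 0` replaces them afterwards.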


                        