How to cut a number into different bins and expand the data frame with the new bins?

Views: 134 | Posted: 2017/3/24 0:05:07 | Tags: r, data-binding, cut


Problem description


I would like to compute something really simple, but I can't find a solution. I want to cut certain numbers into bins, but I want to keep every bin up to each number, not just the bin the number falls into.

bin.size = 100 
df = data.frame(x =c(300,400), 
                y = c("sca1","sca2"))
cut(df$x, seq(0, 400, bin.size), 
    include.lowest = TRUE)

Gives me

[1] (200,300] (300,400]
Levels: [0,100] (100,200] (200,300] (300,400]

But what I want is something like this:

        bin    y
1   (0,100] sca1
2 (100,200] sca1
3 (200,300] sca1
4   (0,100] sca2
5 (100,200] sca2
6 (200,300] sca2
7 (300,400] sca2
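The expansion in the table above can be sketched in base R. This is a minimal sketch, assuming the bins always start at 0, have width bin.size, and that each x is a positive multiple of bin.size; expand_bins is a hypothetical helper name, not something from the original post.

```r
bin.size <- 100
df <- data.frame(x = c(300, 400), y = c("sca1", "sca2"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: one row per bin from 0 up to x, labelled the way cut() labels them
expand_bins <- function(x, y, bin.size) {
  breaks <- seq(0, x, by = bin.size)
  # cut() on an empty vector still returns the interval labels as its levels
  labels <- levels(cut(numeric(0), breaks))
  data.frame(bin = labels, y = y, stringsAsFactors = FALSE)
}

# Apply the helper row by row and stack the pieces
bins <- do.call(rbind, Map(expand_bins, df$x, df$y, bin.size))
print(bins)
```

Because the breaks are built per row, sca1 only gets the three bins up to 300, while sca2 gets four.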

I want to do this because I want to count how many values fall into each bin of width 100. For example:

df2 = data.frame(snp = c(1,2,10,100,1,2,14,16,399), 
                 sca = c("sca1","sca1","sca1","sca1","sca2","sca2","sca2","sca2","sca2"))
df2
  snp  sca
1   1 sca1
2   2 sca1
3  10 sca1
4 100 sca1
5   1 sca2
6   2 sca2
7  14 sca2
8  16 sca2
9 399 sca2

snp could be the position in a vector sca1.

The end goal is to obtain something like this:

        bin    y num
1   (0,100] sca1   4
2 (100,200] sca1   0
3 (200,300] sca1   0
4   (0,100] sca2   4
5 (100,200] sca2   0
6 (200,300] sca2   0
7 (300,400] sca2   1
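One base-R route to this end goal, sketched under the assumptions that each sca's length is its x in df and that bins are right-closed intervals of width bin.size starting at 0: build each sca's breaks only up to its own x, cut that sca's positions with them, and tabulate. count_bins is a hypothetical helper. Note that a position outside (0, x] would become NA in cut() and be silently dropped by table().

```r
bin.size <- 100
df  <- data.frame(x = c(300, 400), y = c("sca1", "sca2"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(snp = c(1, 2, 10, 100, 1, 2, 14, 16, 399),
                  sca = c("sca1", "sca1", "sca1", "sca1",
                          "sca2", "sca2", "sca2", "sca2", "sca2"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: count one sca's positions using breaks that stop at its own length
count_bins <- function(x, y) {
  breaks <- seq(0, x, by = bin.size)
  snps   <- df2$snp[df2$sca == y]
  z      <- table(cut(snps, breaks))   # empty bins are kept with a count of 0
  data.frame(bin = names(z), y = y, num = as.integer(z),
             stringsAsFactors = FALSE)
}

res <- do.call(rbind, Map(count_bins, df$x, df$y))
print(res)
```

Since sca1's breaks end at 300, no (300,400] row is ever created for it, which is the "not appearing" option.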

The best I can do is this:

df2$cat = cut(df2$snp, seq(0, 400, bin.size), 
              include.lowest = TRUE)
df2
  snp  sca       cat
1   1 sca1   [0,100]
2   2 sca1   [0,100]
3  10 sca1   [0,100]
4 100 sca1   [0,100]
5   1 sca2   [0,100]
6   2 sca2   [0,100]
7  14 sca2   [0,100]
8  16 sca2   [0,100]
9 399 sca2 (300,400]

Or this:

table(df2$cat,df2$sca)
            sca1 sca2
  [0,100]      4    4
  (100,200]    0    0
  (200,300]    0    0
  (300,400]    0    1

The problem with this last attempt is that the category (300,400] makes no sense for sca1: sca1 only goes up to 300 (its x in df), so that bin does not exist for it. It should be NA or not appear at all. How can I solve this?
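If the shared table is kept, one option, sketched here, is to mask the impossible cells afterwards: any bin whose lower edge is at or beyond a sca's length is set to NA. This assumes df$x gives each sca's length and that the bins are the ones produced by seq(0, 400, bin.size); lower and len are illustrative names.

```r
bin.size <- 100
df  <- data.frame(x = c(300, 400), y = c("sca1", "sca2"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(snp = c(1, 2, 10, 100, 1, 2, 14, 16, 399),
                  sca = c(rep("sca1", 4), rep("sca2", 5)),
                  stringsAsFactors = FALSE)

breaks  <- seq(0, 400, bin.size)
df2$cat <- cut(df2$snp, breaks, include.lowest = TRUE)
tab     <- table(df2$cat, df2$sca)

# Lower edge of each row's bin, and the length of each sca column
lower <- head(breaks, -1)                      # 0, 100, 200, 300
len   <- setNames(df$x, df$y)[colnames(tab)]   # sca1 = 300, sca2 = 400

# A bin starting at or beyond a sca's length cannot occur there: mark it NA
tab[outer(lower, len, ">=")] <- NA
print(tab)
```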

Solution

Here's one way using a few packages from the tidyverse:

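library(dplyr)
library(tidyr)
library(purrr)

# Written for tidyr >= 1.0: nest() takes name = columns, unnest() takes the column to expand
df %>%
  # attach each sca's positions as a list-column of data frames named "snp"
  left_join(nest(df2, snp = c(snp)), by = c("y" = "sca")) %>%
  mutate(
    # breaks 0, 100, ..., x, so the bins stop at each sca's own length
    cuts = map(x, ~ seq(0, ., by = 100)),
    tbls = pmap(
      .l = list(snp, cuts),
      .f = function(xx, breaks) {
        # table() on the factor from cut() keeps empty bins with a count of 0
        z <- table(cut(xx$snp, breaks))
        tibble(cut = names(z), count = as.integer(z))
      }
    )
  ) %>%
  select(y, tbls) %>%
  unnest(tbls)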
#      y       cut count
# 1 sca1   (0,100]     4
# 2 sca1 (100,200]     0
# 3 sca1 (200,300]     0
# 4 sca2   (0,100]     4
# 5 sca2 (100,200]     0
# 6 sca2 (200,300]     0
# 7 sca2 (300,400]     1
