Subsetting a dataset in R

Views: 48 Published: 2021/7/14 20:09:50 Tags: r split subset sapply


Problem description

I have a question about filtering a dataset based on a sum of counts.

My file looks like this:

The first column holds gene names. From the third column, I want to calculate the sum associated with each gene: for g1 it is 6, for g2 it is 16, and so on. The condition is then that if the sum for a gene is > 10, the rows for that gene are kept, so that the input dataset above is filtered and my output looks like

This is what I have tried so far:

tab <- read.table("input.txt", header = FALSE)
genelist <- split(tab, tab[, 1])

How can I sum the counts per gene and keep only the genes whose sum is > 10? I think I have to use sapply to loop over the list, but I am stuck here. Any help is appreciated.

Recommended answer

Is this what you are looking for?

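library(dplyr)  # provides %>%, arrange(), group_by(), summarise(), filter()

# Simulated example data: 40 rows with a random gene label, a second label, and a numeric result
n_vars <- 40
gene <- sample(x = c("g1", "g2", "g3", "g4"), size = n_vars, replace = TRUE)
v1 <- sample(x = c("a", "b", "c", "d", "e", "f", "g"), size = n_vars, replace = TRUE)
result <- rnorm(n = n_vars, mean = 0, sd = 10)

# Sum result within each gene/v1 combination and keep the groups whose total exceeds 10
df <- data.frame(gene, v1, result) %>%
  arrange(gene, v1) %>%
  group_by(gene, v1) %>%
  summarise(total = sum(result)) %>%
  filter(total > 10)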
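
The answer above returns one summary row per gene/v1 pair, whereas the question asks for the original rows of every gene whose total exceeds 10. A minimal base R sketch of that reading follows. The input file is not shown in the question, so the rows below are invented purely to match the quoted sums (g1 = 6, g2 = 16), and the column names V1 to V3 are assumed to be the defaults that read.table assigns with header = FALSE.

```r
# Hypothetical input: the real file contents are not shown, so these rows are
# made up to reproduce the sums quoted in the question (g1 = 6, g2 = 16).
tab <- data.frame(
  V1 = c("g1", "g1", "g2", "g2", "g2"),
  V2 = c("a", "b", "a", "b", "c"),
  V3 = c(2, 4, 5, 5, 6),
  stringsAsFactors = FALSE
)

# Approach 1: continue the split() idea from the question.
# Split by gene, sum the third column of each piece with sapply(),
# then keep the rows whose gene has a total above 10.
genelist <- split(tab, tab[, 1])
totals <- sapply(genelist, function(d) sum(d[, 3]))
keep <- names(totals)[totals > 10]
filtered_split <- tab[tab[, 1] %in% keep, ]

# Approach 2: ave() computes the per-gene sum and repeats it on every row
# of that gene, so the comparison can index the data frame directly.
filtered_ave <- tab[ave(tab[, 3], tab[, 1], FUN = sum) > 10, ]

print(filtered_split)
```

Both approaches keep the original rows rather than a summary table. With dplyr loaded, `tab %>% group_by(V1) %>% filter(sum(V3) > 10)` should express the same idea, assuming the same column names.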
