Counting number of instances of a condition per row in R


Question

I have a large file in which the first column holds IDs and the remaining 1304 columns hold genotypes, like below.

rsID    sample1    sample2    sample3...sample1304
abcd    aa         bb         nc        nc
efgh    nc         nc         nc        nc 
ijkl    aa         ab         aa        nc 

I would like to count the number of "nc" values per row and write the result to an additional column, so that I get the following:

rsID    sample1    sample2    sample3...sample1304    no_calls
abcd    aa         bb         nc        nc            2
efgh    nc         nc         nc        nc            4
ijkl    aa         ab         aa        nc            1

The table function counts frequencies per column, not per row. If I transpose the data to use it with table, I would need the file to look like this:

abcd         aa[sample1]
abcd         bb[sample2]
abcd         nc[sample3] ...
abcd         nc[sample1304]
efgh         nc[sample1]
efgh         nc[sample2]
efgh         nc[sample3] ...
efgh         nc[sample1304]

With this format, I would get the following, which is what I want:

ID    nc   aa   ab   bb
abcd  2    1    0    1
efgh  4    0    0    0

Does anybody have an idea for a simple way to get frequencies by row? I am trying this right now, but it is taking quite some time to run:

rsids$Number_of_no_calls <- apply(rsids, 1, function(x) sum(x == "nc"))

Recommended answer

You can use rowSums.

df$no_calls <- rowSums(df == "nc")
df
#  rsID sample1 sample2 sample3 sample1304 no_calls
#1 abcd      aa      bb      nc         nc        2
#2 efgh      nc      nc      nc         nc        4
#3 ijkl      aa      ab      aa         nc        1

Or, as pointed out by MrFlick, to exclude the first column from the row sums, you can slightly modify the approach to

df$no_calls <- rowSums(df[-1] == "nc")
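
To see why this works, it can help to look at the intermediate result. The following is a minimal sketch on toy data that mirrors the example in the question (only four sample columns; the names is_nc and no_calls are introduced here for illustration). Comparing a data frame with a single value returns a logical matrix, and rowSums then treats TRUE as 1 and FALSE as 0.

```r
# Toy data mirroring the question's example (four sample columns only).
df <- data.frame(
  rsID       = c("abcd", "efgh", "ijkl"),
  sample1    = c("aa", "nc", "aa"),
  sample2    = c("bb", "nc", "ab"),
  sample3    = c("nc", "nc", "aa"),
  sample1304 = c("nc", "nc", "nc"),
  stringsAsFactors = FALSE
)

# Drop the ID column, then compare every cell with "nc".
# The result is a logical matrix with one row per rsID.
is_nc <- df[-1] == "nc"

# Summing a logical vector counts its TRUE values, so the row sums
# are the per-row numbers of "nc" entries.
no_calls <- rowSums(is_nc)
no_calls
```

Because the comparison and the summation each run over the whole table in one vectorized call, this avoids calling an R function once per row, which is what the apply version in the question does.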


Regarding the row names: they are not counted in rowSums, and you can run a simple test to demonstrate it:

rownames(df)[1] <- "nc"  # name first row "nc"
rowSums(df == "nc")      # compute the row sums
#nc  2  3
# 2  4  1   # still the same in first row
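
The question also asks for a full per-row frequency table (nc, aa, ab, bb). The same rowSums idea can be repeated once per genotype. This is only a sketch on the toy data from the question: the genotypes vector is an assumption about which values occur, and the name counts is introduced here for illustration.

```r
# Toy data mirroring the question's example (four sample columns only).
df <- data.frame(
  rsID       = c("abcd", "efgh", "ijkl"),
  sample1    = c("aa", "nc", "aa"),
  sample2    = c("bb", "nc", "ab"),
  sample3    = c("nc", "nc", "aa"),
  sample1304 = c("nc", "nc", "nc"),
  stringsAsFactors = FALSE
)

# Assumed set of genotype codes; adjust to the values present in the real file.
genotypes <- c("nc", "aa", "ab", "bb")

# One rowSums call per genotype; sapply binds the results into a matrix
# with one column per genotype and one row per rsID.
counts <- sapply(genotypes, function(g) rowSums(df[-1] == g))
rownames(counts) <- df$rsID
counts
```

For the rows abcd and efgh this reproduces the counts listed in the question's desired table.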
