Selecting columns using specific patterns then finding sum and ratio


Question

I want to calculate a sum and a ratio from the data below. (The actual data contains more than 200,000 columns and 45,000 rows.)

For clarity, only a small sample in the same format is shown.

#Frame  BMR_42@O22  BMR_49@O13  BMR_59@O13  BMR_23@O26  BMR_10@O13  BMR_61@O26  BMR_23@O25 
 1      1           1           0           1           1           1           1
 2      0           1           0           0           1           1           0
 3      1           1           1           0           0           1           1
 4      1           1           0           0           1           0           1
 5      0           0           0           0           0           0           0
 6      1           0           1           1           0           1           0
 7      1           1           1           1           0           0           0
 8      1           1           1           0           0           0           0
 9      1           1           1           1           1           1           1
10      0           0           0           0           0           0           0

The columns need to be selected by a specific criterion.

Only the columns whose header contains "@O13" should be considered. The columns selected from the example above are:

BMR_49@O13  BMR_59@O13  BMR_10@O13  
1           0           1       
1           0           1       
1           1           0       
1           0           1       
0           0           0       
0           1           0       
1           1           0       
1           1           0       
1           1           1       
0           0           0   

From the selected columns, I want to calculate:

1) the sum of all the "1"s. In this example the value is 16.

2) the number of rows that contain at least one "1". In the example above there are 8 such rows.

Finally,

3) the ratio of the total number of "1"s to the number of rows that contain a "1".

That is: (total of all "1"s) / (total rows with an occurrence of "1"). In this example, 16/8 = 2.

As a start, I tried this command to select only the columns with "@O13":

awk '{for (i=1;i<=NF;i++) if (i~/@O13/); print ""}' $file2

It runs, but it does not print the values.

Answer

This should do it:

awk 'NR==1{for (i=1;i<=NF;i++) if ($i~/@O13/) a[i];next} {f=0;for (i in a) if ($i) {s++;f++};if (f) r++} END {print "number of 1="s"\nrows with 1="r"\nratio="s/r}' file
number of 1=16
rows with 1=8
ratio=2
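
As for why the attempt in the question printed only blank lines: `i` is the loop counter, so `i~/@O13/` tests the numbers 1, 2, 3, ... rather than the field `$i`, and the `;` right after the `if (...)` is an empty statement, which leaves `print ""` as the only thing that runs, once per input line. A minimal sketch of the column-selection step on its own might look like the following. The file name `sample_cols.txt`, the array `keep`, and the variable `selected` are illustrative names, not part of the original post, and the file holds only the first three data rows of the example.

```shell
# Hypothetical sample file: header plus the first three rows from the question.
cat > sample_cols.txt <<'EOF'
#Frame  BMR_42@O22  BMR_49@O13  BMR_59@O13  BMR_23@O26  BMR_10@O13  BMR_61@O26  BMR_23@O25
 1      1           1           0           1           1           1           1
 2      0           1           0           0           1           1           0
 3      1           1           1           0           0           1           1
EOF

selected=$(awk '
NR == 1 {
    # Remember, in header order, the positions of fields containing @O13.
    for (i = 1; i <= NF; i++)
        if ($i ~ /@O13/)
            keep[++n] = i
}
{
    # Print only the remembered fields; the header line is printed too.
    line = ""
    for (k = 1; k <= n; k++)
        line = line (k > 1 ? " " : "") $(keep[k])
    print line
}' sample_cols.txt)

printf '%s\n' "$selected"
```

On this sample it prints the three `@O13` headers followed by the rows `1 0 1`, `1 0 1` and `1 1 0`. Storing the positions in a numerically indexed array, rather than looping with `for (i in a)`, keeps the columns in their original left-to-right order, which matters for printing but not for summing.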

The same summing command, laid out more readably:

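awk '
NR==1 {                       # header line: remember which columns match
    for (i=1; i<=NF; i++)
        if ($i ~ /@O13/)
            a[i]
    next                      # runs once, after the loop has finished
}
{
    f = 0                     # number of 1s found in this row
    for (i in a)
        if ($i == "1") {
            s++               # running total of 1s
            f++
        }
    if (f) r++                # rows with at least one 1
}
END {
    # guard against division by zero when no row contains a 1
    print "number of 1=" s \
          "\nrows with 1=" r \
          "\nratio=" (r ? s/r : "undefined")
}
' file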
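
To try this end to end, the sketch below writes the sample table from the question to a file and runs the same logic with the pattern passed in through `-v`, so another suffix can be selected without editing the program. The file name `sample.txt` and the names `pat`, `cols`, `hits`, `total`, `rows` and `result` are assumptions made for illustration. Unlike the regex match above, `index()` does a literal substring match, which sidesteps regex metacharacters in the pattern.

```shell
# Hypothetical file name; the contents are the sample table from the question.
cat > sample.txt <<'EOF'
#Frame  BMR_42@O22  BMR_49@O13  BMR_59@O13  BMR_23@O26  BMR_10@O13  BMR_61@O26  BMR_23@O25
 1      1           1           0           1           1           1           1
 2      0           1           0           0           1           1           0
 3      1           1           1           0           0           1           1
 4      1           1           0           0           1           0           1
 5      0           0           0           0           0           0           0
 6      1           0           1           1           0           1           0
 7      1           1           1           1           0           0           0
 8      1           1           1           0           0           0           0
 9      1           1           1           1           1           1           1
10      0           0           0           0           0           0           0
EOF

result=$(awk -v pat='@O13' '
NR == 1 {
    # Header: mark every column whose name contains the literal text in pat.
    for (i = 1; i <= NF; i++)
        if (index($i, pat))
            cols[i]
    next
}
{
    hits = 0                      # 1s in the selected columns of this row
    for (i in cols)
        if ($i == "1")
            hits++
    total += hits
    if (hits) rows++              # row counts once, however many 1s it has
}
END {
    printf "number of 1=%d\nrows with 1=%d\n", total, rows
    if (rows) printf "ratio=%g\n", total / rows
    else      print  "ratio=undefined (no selected column contains a 1)"
}' sample.txt)

printf '%s\n' "$result"
```

On the sample data this reproduces the figures from the question: 16 ones, 8 rows, ratio 2. Because only the marked columns are visited on each row, the per-row work grows with the number of matching columns rather than with all 200,000, although awk still has to split every line into fields.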
