awk separate rows based on $2 and $17 and do average on $17

Views: 155 Published: 2016/7/29 11:16:39 awk


Problem description

We have an input here:

cpdID,cpd_number,Cell_assay_id,Cell_alt_assay_id,Cell_type_desc,Cell_Operator,Cell_result_value,Cell_unit_value,assay_id,alt_assay_id,type_desc,operator,result_value,unit_value,Ratio_operator,Ratio,log_ratio,Cell_experiment_date,experiment_date,Cell_discipline,discipline
49,cpd-7788990,1212,2323, IC50 ,,100,uM,1334,1331,Ki,,10,uM,,10,-1,12/6/2006 0:00,2/16/2007 0:00,Cell,Enzyme
49,cpd-7788990,5555,6666, IC50 ,>,150,uM,1334,1331,Ki,,10,uM,>,15,-2,12/6/2006 0:00,2/16/2007 0:00,Cell,Enzyme
49,cpd-7788990,8888,9999, IC50 ,,200,uM,1334,1331,Ki,,10,uM,,20,-3,12/6/2006 0:00,2/16/2007 0:00,Cell,Enzyme
49,cpd-6666666,8888,9999, IC50 ,,400,uM,1334,1331,Ki,,10,uM,,40,-1,12/6/2006 0:00,2/16/2007 0:00,Cell,Enzyme
49,cpd-1111,8888,9999, IC50 ,,400,uM,1334,1331,Ki,,10,uM,,40,-1,12/6/2006 0:00,2/16/2007 0:00,Cell,Enzyme
49,cpd-1111,8888,9999, IC50 ,,400,uM,1334,1331,Ki,,10,uM,,40,-1.1,12/6/2006 0:00,2/16/2007 0:00,Cell,Enzyme
49,cpd-1111,8888,9999, IC50 ,,400,uM,1334,1331,Ki,,10,uM,,40,-1.2,12/6/2006 0:00,2/16/2007 0:00,Cell,Enzyme
49,cpd-1111,8888,9999, IC50 ,,400,uM,1334,1331,Ki,,10,uM,,40,-1.3,12/6/2006 0:00,2/16/2007 0:00,Cell,Enzyme

We would like to split this input.csv into two files:

If $2 is the same and the max minus min of $17 is <= 1, average $17 and put the row into file "a".

If $2 is the same and the max minus min of $17 is > 1, average $17 and put the row into file "b".

Note: if a $2 value is unique, we would like to keep its row (cpd-6666666 is an example).

Note: for cpd-1111, ($17 max-min) = -1-(-1.3) = 0.3 < 1

a: where ($17 max-min) <= 1. The new $17 for cpd-1111 ($2) is the average of (-1, -1.1, -1.2, -1.3) = -1.15

cpdID,cpd_number,Cell_assay_id,Cell_alt_assay_id,Cell_type_desc,Cell_Operator,Cell_result_value,Cell_unit_value,assay_id,alt_assay_id,type_desc,operator,result_value,unit_value,Ratio_operator,Ratio,log_ratio,Cell_experiment_date,experiment_date,Cell_discipline,discipline
49,cpd-6666666,8888,9999, IC50 ,,400,uM,1334,1331,Ki,,10,uM,,40,-1,12/6/2006 0:00,2/16/2007 0:00,Cell,Enzyme
49,cpd-1111,8888,9999, IC50 ,,400,uM,1334,1331,Ki,,10,uM,,40,-1.15,12/6/2006 0:00,2/16/2007 0:00,Cell,Enzyme

b: where ($17 max-min) > 1. The new $17 for cpd-7788990 ($2) is the average of (-1, -2, -3) = -2

cpdID,cpd_number,Cell_assay_id,Cell_alt_assay_id,Cell_type_desc,Cell_Operator,Cell_result_value,Cell_unit_value,assay_id,alt_assay_id,type_desc,operator,result_value,unit_value,Ratio_operator,Ratio,log_ratio,Cell_experiment_date,experiment_date,Cell_discipline,discipline
49,cpd-7788990,1212,2323, IC50 ,,100,uM,1334,1331,Ki,,10,uM,,10,-2,12/6/2006 0:00,2/16/2007 0:00,Cell,Enzyme
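
The arithmetic in the notes above can be checked with a short sketch that prints the spread (max minus min) and the mean of $17 for each run of consecutive rows sharing $2. The function name summarize_groups is hypothetical and not part of the original post; it assumes rows with the same $2 are adjacent, as they are in the sample input.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post).
# Prints "id spread mean" for each run of consecutive rows with the same $2.
# usage: summarize_groups input.csv
summarize_groups() {
  awk -F, '
    # Print the summary of the group collected so far, if any
    function report() {
      if (n > 0) printf "%s %g %g\n", id, max - min, sum / n
    }
    NR == 1 { next }                      # skip the header row
    $2 != id { report(); id = $2; n = 0; sum = 0 }
    {
      v = $17 + 0                         # force a numeric value
      if (n == 0 || v > max) max = v      # track the true maximum
      if (n == 0 || v < min) min = v      # track the true minimum
      sum += v; n++
    }
    END { report() }
  ' "$1"
}
```

Running summarize_groups input.csv on the sample should report a spread of 2 for cpd-7788990 (file b) and spreads of 0 and 0.3 for cpd-6666666 and cpd-1111 (file a).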

Here is my attempt, which separates the input into a and b but does not compute the average yet.

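#!/usr/bin/awk -f

# Comma-separated input; "a" and "b" are the two output files
BEGIN { FS=","; f1="a"; f2="b" }

# Copy the header line to both output files
FNR==1 { print $0 > f1; print $0 > f2; next }

# $2 changed: the previous group is complete
$2!=last_id && FNR > 2 { handleBlock() }

# Buffer the current line and its $17 value
{ a[++cnt]=$0; m[cnt]=$17; last_id=$2 }

# Flush the last group
END { handleBlock() }

function handleBlock() {
  # m[1]-m[cnt] is max minus min only if $17 is in descending order within the group
  if( m[1]-m[cnt]<=1 ) fname = f1
  else fname = f2
  for( i=1;i<=cnt;i++ ) { print a[i] > fname }
  cnt=0
}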

Is there any way to compute the average in a and b? Thanks.

Solution

You can get the averages in the output files by altering handleBlock() as follows:

function handleBlock() {
  # as in the original script, this assumes $17 is in descending order
  # within a group, so m[1] is the max and m[cnt] is the min
  if( m[1]-m[cnt]<=1 ) fname = f1
  else fname = f2
  # compute the sum of the $17 fields for the group
  for( i=1;i<=cnt;i++ ) { sum+=m[i] }
  # compute the average
  avg = cnt > 0 ? sum/cnt : sum
  # use the max line for the output, split into an output array: oarr
  fcnt = split( a[1], oarr )
  # modify the 17th field of the output array
  oarr[17]=avg
  # write the updated array to the desired file one field at a time
  for( i=1;i<=fcnt;i++ ) {
    printf( "%s%s", oarr[i], i==fcnt ? "\n" : FS ) > fname
  }
  cnt=0; sum=0
}

Check here for comments on the original script.
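
The handleBlock() above uses m[1]-m[cnt] as the spread, which equals max minus min only when $17 is in descending order within each group, as it is in the sample. If that ordering cannot be relied on, one possible variant tracks the true max and min while reading. This is a sketch under assumptions, not the answer's method: the wrapper name split_and_average and its three arguments are hypothetical, and rows with the same $2 are still assumed to be adjacent.

```shell
#!/bin/sh
# Hypothetical wrapper (not from the original answer).
# usage: split_and_average input.csv file_a file_b
split_and_average() {
  awk -F, -v f1="$2" -v f2="$3" '
    # Write one averaged row for the group collected so far
    function handleBlock(   i, n, line) {
      if (cnt == 0) return
      fname = (max - min <= 1) ? f1 : f2   # spread decides the output file
      n = split(first, out, FS)            # reuse the first row of the group
      out[17] = sum / cnt                  # replace $17 with the group mean
      line = out[1]
      for (i = 2; i <= n; i++) line = line FS out[i]
      print line > fname
      cnt = 0; sum = 0
    }
    FNR == 1 { print > f1; print > f2; next }   # header goes to both files
    $2 != last_id { handleBlock() }             # $2 changed: flush the group
    {
      v = $17 + 0                               # force a numeric value
      if (cnt == 0 || v > max) max = v
      if (cnt == 0 || v < min) min = v
      if (cnt == 0) first = $0
      sum += v; cnt++; last_id = $2
    }
    END { handleBlock() }
  ' "$1"
}
```

For example, split_and_average input.csv a b would write the header plus one row per group, with a unique $2 such as cpd-6666666 landing in the first file because its spread is 0.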
