用匹配的键合并行 [英] Combine lines with matching keys

查看：57 发布时间：2020/9/15 7:39:10 python linux perl awk

本文介绍了用匹配的键合并行的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我有一个具有以下结构的文本文件

I have a text file with the following structure

ID,operator,a,b,c,d,true
WCBP12236,J1,75.7,80.6,65.9,83.2,82.1
WCBP12236,J2,76.3,79.6,61.7,81.9,82.1
WCBP12236,S1,77.2,81.5,69.4,84.1,82.1
WCBP12236,S2,68.0,68.0,53.2,68.5,82.1
WCBP12234,J1,63.7,67.7,72.2,71.6,75.3
WCBP12234,J2,68.6,68.4,41.4,68.9,75.3
WCBP12234,S1,81.8,82.7,67.0,87.5,75.3
WCBP12234,S2,66.6,67.9,53.0,70.7,75.3
WCBP12238,J1,78.6,79.0,56.2,82.1,84.1
WCBP12239,J2,66.6,72.9,79.5,76.6,82.1
WCBP12239,S1,86.6,87.8,23.0,23.0,82.1
WCBP12239,S2,86.0,86.9,62.3,89.7,82.1
WCBP12239,J1,70.9,71.3,66.0,73.7,82.1
WCBP12238,J2,75.1,75.2,54.3,76.4,84.1
WCBP12238,S1,65.9,66.0,40.2,66.5,84.1
WCBP12238,S2,72.7,73.2,52.6,73.9,84.1

每个ID对应于操作员多次分析的数据集.即J1和J2是运算符J的第一次和第二次尝试.度量a，b，c和d使用4种略有不同的算法来度量一个值，其真实值位于列true

Each ID corresponds to a dataset which is analysed by an operator several times. i.e J1 and J2 are the first and second attempt by operator J. The measures a, b, c and d use 4 slightly different algorithms to measure a value whose true value lies in the column true

我想做的是创建3个新的文本文件，比较J1 vs J2，S1 vs S2和J1 vs S1的结果. J1与J2的示例输出:

What I would like to do is to create 3 new text files comparing the results for J1 vs J2, S1 vs S2 and J1 vs S1. Example output for J1 vs J2:

ID,operator,a1,a2,b1,b2,c1,c2,d1,d2,true
WCBP12236,75.7,76.3,80.6,79.6,65.9,61.7,83.2,81.9,82.1
WCBP12234,63.7,68.6,67.7,68.4,72.2,41.4,71.6,68.9,75.3

其中a1是J1等的测量a.

另一个示例是S1与S2的对比:

Another example is for S1 vs S2:

ID,operator,a1,a2,b1,b2,c1,c2,d1,d2,true
WCBP12236,77.2,68.0,81.5,68.0,69.4,53.2,84.1,68.5,82.1
WCBP12234,81.8,66.6,82.7,67.9,67.0,53,87.5,70.7,75.3

ID不会按字母数字顺序排列，也不会为相同的ID聚集运算符.我不确定如何最好地完成此任务-使用linux工具或诸如perl/python之类的脚本语言.

The IDs will not be in alphanumerical order nor will the operators be clustered for the same ID. I'm not certain how best to approach this task - using linux tools or a scripting language like perl/python.

我最初使用linux的尝试很快遇到了障碍

My initial attempt using linux quickly hit a brick wall

首先找到所有唯一ID(已排序)

First find all unique IDs (sorted)

awk -F, '/^WCBP/ {print $1}' file | uniq | sort -k 1.5n > unique_ids

浏览这些ID，然后对J1，J2进行排序:

Loop through these IDs and sort J1, J2:

foreach i (`more unique_ids`)
    grep $i test.txt | egrep 'J[1-2]' | sort -t',' -k2
end

这给了我排序后的数据

WCBP12234,J1,63.7,67.7,72.2,71.6,75.3
WCBP12234,J2,68.6,68.4,41.4,68.9,80.4
WCBP12236,J1,75.7,80.6,65.9,83.2,82.1
WCBP12236,J2,76.3,79.6,61.7,81.9,82.1
WCBP12238,J1,78.6,79.0,56.2,82.1,82.1
WCBP12238,J2,75.1,75.2,54.3,76.4,82.1
WCBP12239,J1,70.9,71.3,66.0,73.7,75.3
WCBP12239,J2,66.6,72.9,79.5,76.6,75.3

我不确定如何重新排列此数据以获得所需的结构.我尝试在foreach循环awk 'BEGIN {RS="\n\n"} {print $1, $3,$10,$4,$11,$5,$12,$6,$13,$7}'

I'm not sure how to rearrange this data to get the desired structure. I tried adding an additional pipe to awk in the foreach loop awk 'BEGIN {RS="\n\n"} {print $1, $3,$10,$4,$11,$5,$12,$6,$13,$7}'

有什么想法吗?我敢肯定，使用awk可以用不太麻烦的方式完成此操作，尽管使用适当的脚本语言可能会更好.

Any ideas? I'm sure this can be done in a less cumbersome manner using awk, although it may be better using a proper scripting language.

用匹配的键合并行 [英] Combine lines with matching keys

问题描述

推荐答案

相关文章

服务器开发最新文章

热门教程

热门工具

登录关闭

用匹配的键合并行 [英] Combine lines with matching keys

问题描述

推荐答案

相关文章

服务器开发最新文章

热门教程

热门工具

登录 关闭

登录关闭