Grouping the data into categories based on a column

Question

I have a tab-delimited file with 2 columns:

new.txt
    1.01   yes
    2.00   no
    0.93   no
    1.2223 yes
    1.7211 no

I want to restructure its contents so that each of the two categories becomes its own column:

new_categorized.txt
    yes    no
    1.01   2.00
    1.2223 0.93
           1.7211

I have found a similar question with an answer in R (here), but I need to do it with bash or awk. I would appreciate your help.

Recommended answer

$ cat tst.awk
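BEGIN { FS=OFS="\t" }    # read and write tab-separated fields

# The first time a label appears in field 2, assign it the next output column.
!($2 in label2colNr) {
    label2colNr[$2] = ++numCols
    colNr2label[numCols] = $2
}

# Append the value from field 1 to the bottom of its label's column.
{
    colNr = label2colNr[$2]
    val[++numRows[colNr],colNr] = $1
    maxRows = (numRows[colNr] > maxRows ? numRows[colNr] : maxRows)
}

END {
    # Header row: the labels, in order of first appearance.
    for (colNr=1; colNr <= numCols; colNr++) {
        printf "%s%s", colNr2label[colNr], (colNr<numCols ? OFS : ORS)
    }

    # Data rows: columns with fewer values are padded with empty cells.
    for (rowNr=1; rowNr <= maxRows; rowNr++) {
        for (colNr=1; colNr <= numCols; colNr++) {
            printf "%s%s", val[rowNr,colNr], (colNr<numCols ? OFS : ORS)
        }
    }
}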

$ awk -f tst.awk file
yes     no
1.01    2.00
1.2223  0.93
        1.7211

The above will work with any awk in any shell on any UNIX system, no matter how many categories appear in the 2nd field and no matter what their values are. Columns are output in the order in which each category first appears in the input.
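
As a rough illustration of that claim, the sketch below runs the same logic on a made-up three-label input. The `maybe` label and the values `0.5` and `1.7` are assumptions invented for this example and do not come from the question. The awk program is inlined rather than read from tst.awk only so that the snippet is self-contained.

```shell
# Hypothetical sample input with three labels (not from the original question).
input=$(printf '1.01\tyes\n2.00\tno\n0.5\tmaybe\n1.7\tmaybe\n')

# Same logic as tst.awk above, inlined for a self-contained run.
output=$(printf '%s\n' "$input" | awk '
BEGIN { FS=OFS="\t" }
!($2 in label2colNr) {
    label2colNr[$2] = ++numCols
    colNr2label[numCols] = $2
}
{
    colNr = label2colNr[$2]
    val[++numRows[colNr],colNr] = $1
    maxRows = (numRows[colNr] > maxRows ? numRows[colNr] : maxRows)
}
END {
    for (colNr=1; colNr <= numCols; colNr++) {
        printf "%s%s", colNr2label[colNr], (colNr<numCols ? OFS : ORS)
    }
    for (rowNr=1; rowNr <= maxRows; rowNr++) {
        for (colNr=1; colNr <= numCols; colNr++) {
            printf "%s%s", val[rowNr,colNr], (colNr<numCols ? OFS : ORS)
        }
    }
}')

printf '%s\n' "$output"
```

With this input the header row should list `yes`, `no` and `maybe` in that order, and the second data row should hold only `1.7` in the third column, preceded by two empty tab-separated cells.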
