在 Bash 中转置文件的有效方法 [英] An efficient way to transpose a file in Bash

查看:32
本文介绍了在 Bash 中转置文件的有效方法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个巨大的制表符分隔文件,格式如下

I have a huge tab-separated file formatted like this

X column1 column2 column3
row1 0 1 2
row2 3 4 5
row3 6 7 8
row4 9 10 11

我想以一种有效的方式转置它只使用 bash 命令(我可以编写十行左右的 Perl 脚本来做到这一点,但它的执行速度应该比本地 bash 慢职能).所以输出应该看起来像

I would like to transpose it in an efficient way using only bash commands (I could write a ten or so lines Perl script to do that, but it should be slower to execute than the native bash functions). So the output should look like

X row1 row2 row3 row4
column1 0 3 6 9
column2 1 4 7 10
column3 2 5 8 11

我想到了这样的解决方案

I thought of a solution like this

cols=`head -n 1 input | wc -w`
for (( i=1; i <= $cols; i++))
do cut -f $i input | tr $'
' $'	' | sed -e "s/	$/
/g" >> output
done

但它很慢,而且似乎不是最有效的解决方案.我在 这篇文章 中看到了 vi 的解决方案,但它是还是太慢了.任何想法/建议/绝妙的想法?:-)

But it's slow and doesn't seem the most efficient solution. I've seen a solution for vi in this post, but it's still over-slow. Any thoughts/suggestions/brilliant ideas? :-)

推荐答案

awk '
{ 
    for (i=1; i<=NF; i++)  {
        a[NR,i] = $i
    }
}
NF>p { p = NF }
END {    
    for(j=1; j<=p; j++) {
        str=a[1,j]
        for(i=2; i<=NR; i++){
            str=str" "a[i,j];
        }
        print str
    }
}' file

输出

$ more file
0 1 2
3 4 5
6 7 8
9 10 11

$ ./shell.sh
0 3 6 9
1 4 7 10
2 5 8 11

Jonathan 在 10000 行文件上的 Perl 解决方案性能

Performance against Perl solution by Jonathan on a 10000 lines file

$ head -5 file
1 0 1 2
2 3 4 5
3 6 7 8
4 9 10 11
1 0 1 2

$  wc -l < file
10000

$ time perl test.pl file >/dev/null

real    0m0.480s
user    0m0.442s
sys     0m0.026s

$ time awk -f test.awk file >/dev/null

real    0m0.382s
user    0m0.367s
sys     0m0.011s

$ time perl test.pl file >/dev/null

real    0m0.481s
user    0m0.431s
sys     0m0.022s

$ time awk -f test.awk file >/dev/null

real    0m0.390s
user    0m0.370s
sys     0m0.010s

由 Ed Morton 编辑(@ghostdog74 如果您不同意,请随时删除).

EDIT by Ed Morton (@ghostdog74 feel free to delete if you disapprove).

也许这个带有一些更明确的变量名的版本将有助于回答下面的一些问题,并大致阐明脚本在做什么.它还使用制表符作为 OP 最初要求的分隔符,因此它可以处理空字段,并且巧合的是,对于这种特殊情况,它会稍微修饰输出.

Maybe this version with some more explicit variable names will help answer some of the questions below and generally clarify what the script is doing. It also uses tabs as the separator which the OP had originally asked for so it'd handle empty fields and it coincidentally pretties-up the output a bit for this particular case.

$ cat tst.awk
BEGIN { FS=OFS="	" }
{
    for (rowNr=1;rowNr<=NF;rowNr++) {
        cell[rowNr,NR] = $rowNr
    }
    maxRows = (NF > maxRows ? NF : maxRows)
    maxCols = NR
}
END {
    for (rowNr=1;rowNr<=maxRows;rowNr++) {
        for (colNr=1;colNr<=maxCols;colNr++) {
            printf "%s%s", cell[rowNr,colNr], (colNr < maxCols ? OFS : ORS)
        }
    }
}

$ awk -f tst.awk file
X       row1    row2    row3    row4
column1 0       3       6       9
column2 1       4       7       10
column3 2       5       8       11

上述解决方案适用于任何 awk(当然,旧的、损坏的 awk 除外 - 有 YMMV).

The above solutions will work in any awk (except old, broken awk of course - there YMMV).

上述解决方案确实将整个文件读入内存 - 如果输入文件太大,那么您可以这样做:

The above solutions do read the whole file into memory though - if the input files are too large for that then you can do this:

$ cat tst.awk
BEGIN { FS=OFS="	" }
{ printf "%s%s", (FNR>1 ? OFS : ""), $ARGIND }
ENDFILE {
    print ""
    if (ARGIND < NF) {
        ARGV[ARGC] = FILENAME
        ARGC++
    }
}
$ awk -f tst.awk file
X       row1    row2    row3    row4
column1 0       3       6       9
column2 1       4       7       10
column3 2       5       8       11

几乎不使用内存,但每行读取一次输入文件,因此它比将整个文件读入内存的版本慢得多.它还假设每行的字段数相同,并且它使用 GNU awk 来处理 ENDFILEARGIND 但任何 awk 都可以对 FNR 进行测试==1END.

which uses almost no memory but reads the input file once per number of fields on a line so it will be much slower than the version that reads the whole file into memory. It also assumes the number of fields is the same on each line and it uses GNU awk for ENDFILE and ARGIND but any awk can do the same with tests on FNR==1 and END.

这篇关于在 Bash 中转置文件的有效方法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆