Command line pivot

Views: 142 Published: 2016/7/28 16:49:27 Tags: perl bash awk pivot-table gawk


Problem description

I've been hunting around for the past few days for a set of command line tools, or a perl or awk script, that would let me very quickly transpose the following data:

Row|Col|Val
1|A|foo
1|B|bar
1|C|I have a real
2|A|bad
2|C|hangover

into this:

A|B|C
foo|bar|I have a real
bad||hangover

Note that there is only one value in the dataset for each "cell" (i.e., as with a spreadsheet, there aren't any duplicates of Row "1" Col "A").

I've tried various awk and shell implementations for transposing data, but can't seem to get them working. One idea I had was to cut each "Col" value into a separate file, then use the "join" command to put them back together by "Row" - but there MUST be an easier way. I'm sure this is just incredibly simple to do, but I'm struggling a bit.

My input files have Cols A through G (mostly containing variable-length strings) and 10,000 Rows. If I can avoid loading everything into memory, that would be a huge plus.

Beer-by-mail for anyone who's got the answer!

As always, many thanks in advance for your help.

Cheers,

Josh

P.S. - I'm a bit surprised that there isn't an out-of-the-box command line util for doing this very basic type of pivot/transposition operation. I looked at http://code.google.com/p/openpivot/ and at http://code.google.com/p/crush-tools/, both of which seem to require aggregate calcs.

Recommended answer

I can do this in gawk, but not nawk.

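#!/usr/local/bin/gawk -f

BEGIN {
  FS="|";   # input fields are pipe-delimited: Row|Col|Val
}

{
  # Remember every row and column label seen, and store each cell.
  # values[$1][$2] is a gawk array of arrays (gawk 4.0+), hence "not nawk".
  rows[$1]=1; cols[$2]=1; values[$1][$2]=$3;
}

END {
  # Header line. Note: "for (x in array)" visits keys in an unspecified order.
  for (col in cols) {
    output=output sprintf("|%s", col);
  }
  print substr(output, 2);   # drop the leading "|"
  # One output line per row; cells that never appeared come out empty.
  for (row in rows) {
    output="";
    for (col in cols) {
      output=output sprintf("|%s", values[row][col]);
    }
    print substr(output, 2);
  }
}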

And it even works:

ghoti@pc $ cat data
1|A|foo
1|B|bar
1|C|I have a real
2|A|bad
2|C|hangover
ghoti@pc $ ./doit.gawk data
A|B|C
foo|bar|I have a real
bad||hangover
ghoti@pc $

I'm not sure how well this will work with 10,000 rows, but I suspect that if you've got the memory for it, you'll be fine. I can't see how you can avoid loading things into memory except by storing things in separate files which you'd later join, which is pretty much a manual implementation of virtual memory.
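For what it's worth, there is one case where the whole table does not have to sit in memory: when the input is already grouped by Row (as the sample data is) and the column names are known up front (the question mentions Cols A through G). Neither condition is guaranteed by the question, so treat the following as a sketch under those assumptions rather than a general answer. The function name pivot_stream and the colnames variable are made up for illustration, and the input is assumed to have no header line, like the data file above.

```shell
#!/bin/sh
# Streaming pivot sketch: holds only one output row in memory at a time.
# Assumptions: input is grouped by Row, has no header line, and the
# column names are passed in as a pipe-separated list, e.g. "A|B|C".
pivot_stream() {
  awk -F'|' -v colnames="$1" '
    BEGIN {
      # Map each column name to its position in the output line.
      n = split(colnames, names, "|")
      for (i = 1; i <= n; i++) pos[names[i]] = i
      print "|" colnames
    }
    # Print the row collected so far, then forget its cells.
    function emit_row(   i, line) {
      line = row
      for (i = 1; i <= n; i++) line = line "|" cell[i]
      print line
      split("", cell)   # portable way to empty an array
    }
    # A change in the first field means the previous row is complete.
    NR > 1 && $1 != row { emit_row() }
    # Store the value; columns not named in colnames are ignored.
    { row = $1; if ($2 in pos) cell[pos[$2]] = $3 }
    END { if (NR > 0) emit_row() }
  '
}
```

It reads standard input, so it would be called as `pivot_stream 'A|B|C' < data`. If the file is not grouped by Row, a row that reappears later would be printed twice, so the input would need to be sorted on the first field beforehand.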

UPDATE:

Per comments:

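#!/usr/local/bin/gawk -f

BEGIN {
  FS="|";   # input fields are pipe-delimited: Row|Col|Val
}

{
  # values[$1,$2] uses a SUBSEP-joined subscript, which standard awk
  # supports, instead of gawk-only arrays of arrays.
  rows[$1]=1; cols[$2]=1; values[$1,$2]=$3;
}

END {
  # Header keeps its leading "|" because the first output column is the row label.
  for (col in cols) {
    output=output sprintf("|%s", col);
  }
  print output;
  # One line per row: the row label, then each cell (empty if missing).
  for (row in rows) {
    output="";
    for (col in cols) {
      output=output "|" values[row,col];
    }
    print row output;
  }
}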

And the output:

ghoti@pc $ ./doit.awk data
|A|B|C
1|foo|bar|I have a real
2|bad||hangover
ghoti@pc $
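One caveat about the scripts above: awk does not define the order in which `for (x in array)` visits keys, so the A, B, C and 1, 2 ordering shown here is not guaranteed on every awk or every dataset. A possible way around that, sketched below with the made-up name pivot_ordered, is to record rows and columns in numbered lists as they are first seen and loop over those by index. It still keeps the whole table in memory, and it assumes the input has no header line.

```shell
#!/bin/sh
# Pivot sketch that prints rows and columns in first-seen order
# instead of relying on the unspecified order of "for (x in array)".
pivot_ordered() {
  awk -F'|' '
    {
      # Append each new row or column label to a numbered list once.
      if (!($1 in rowseen)) { rowseen[$1] = 1; rowlist[++nrows] = $1 }
      if (!($2 in colseen)) { colseen[$2] = 1; collist[++ncols] = $2 }
      values[$1, $2] = $3
    }
    END {
      header = ""
      for (c = 1; c <= ncols; c++) header = header "|" collist[c]
      print header
      for (r = 1; r <= nrows; r++) {
        line = rowlist[r]
        for (c = 1; c <= ncols; c++)
          line = line "|" values[rowlist[r], collist[c]]
        print line
      }
    }
  '
}
```

Called as `pivot_ordered < data`, this should print the same three lines as above for the sample data, since A, B, C and 1, 2 are also the order in which the labels first appear.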
