Awk or Gawk to do data matching and merging

Views: 479 Published: 2016/7/29 11:13:02 Tags: awk gawk

Question

The input file input.txt is a tab-delimited Unicode text file:

a  A   e  f  m
b  B   g  h
c  C   i  j
b  B   k  l

I want to match rows on the first and second columns and merge them, so that output.txt contains:

a  A   e  f  m
b  B   g  h     k  l
c  C   i  j

The code has to detect the maximum number of columns in the input. Since the maximum is 5 in this example, "k l" is placed starting at the 6th column.
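The placement rule can be written as a small formula. As a sketch, assuming every line has at least the two key columns and writing W for the column count of the widest line (5 here), each group of value fields gets W - 2 columns, so the i-th line sharing a key (counting from 1) starts at column 2 + (i - 1)(W - 2) + 1. The helper name `group_start_column` is hypothetical:

```python
def group_start_column(occurrence: int, max_columns: int) -> int:
    # Hypothetical helper: 1-based column where the value fields of the
    # `occurrence`-th line (1-based) sharing a key start, assuming the first
    # two columns form the key and the widest line has `max_columns` columns.
    key_columns = 2
    group_width = max_columns - key_columns   # value fields per group (3 in the example)
    return key_columns + (occurrence - 1) * group_width + 1


# Sample data from the question; the widest line "a A e f m" has 5 columns.
rows = [
    ["a", "A", "e", "f", "m"],
    ["b", "B", "g", "h"],
    ["c", "C", "i", "j"],
    ["b", "B", "k", "l"],
]
max_columns = max(len(r) for r in rows)
print(max_columns)                         # 5
print(group_start_column(1, max_columns))  # 3: "g h" starts right after "b B"
print(group_start_column(2, max_columns))  # 6: "k l" starts in the 6th column
```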

I actually almost managed to do this in Matlab when all the fields were numbers. But when they were letters, Matlab handled Unicode so badly that I gave up, even after reading Stack Overflow posts about dealing with Unicode in Matlab. So I have now turned to Python.

Nirk answered at http://stackoverflow.com/posts/18164848 that the following line would do it:

awk -F'\t' '{a=$1 "\t" $2; $1=$2=""; x[a] = x[a] $0} END {for(y in x) print y,x[y]}'

However, this code doesn't seem to specify the input and output files.

Recommended answer

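awk is a pipe-based Linux command: it reads standard input (or the files named as arguments) and writes to standard output. To feed it the input file and capture the output, use shell redirection:

awk -F'\t' '{a=$1 "\t" $2; $1=$2=""; x[a] = x[a] $0} END {for(y in x) print y,x[y]}' < input.txt > output.txt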

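However, the awk program above does not really meet your requirement that "the code has to detect the maximum number of columns in the input". It never computes that maximum, so it cannot pad shorter rows: the fields of each repeated key are simply appended, and "k l" ends up right after "g h" instead of starting at the 6th column. In addition, assigning to $1 and $2 rebuilds each record with the default output separator (a space), so the merged fields come out space-separated, and the order in which for (y in x) visits the keys is unspecified.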
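A rough Python model of the one-liner makes these limitations concrete. It is a sketch, not awk itself: the function `awk_one_liner_model` is hypothetical, it assumes the intended tab field separator and awk's default output separator OFS (a single space), and it mimics the fact that assigning to $1 and $2 makes awk rebuild $0 from all fields joined with OFS.

```python
def awk_one_liner_model(lines, ofs=" "):
    # Hypothetical model of:
    #   awk -F'\t' '{a=$1 "\t" $2; $1=$2=""; x[a] = x[a] $0}
    #               END {for(y in x) print y,x[y]}'
    x = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")   # -F'\t' splits on tabs
        a = fields[0] + "\t" + fields[1]         # a=$1 "\t" $2
        fields[0] = fields[1] = ""               # $1=$2=""
        record = ofs.join(fields)                # assigning a field rebuilds $0 with OFS
        x[a] = x.get(a, "") + record             # x[a] = x[a] $0: plain concatenation, no padding
    # awk's for (y in x) visits keys in an unspecified order;
    # this model simply uses insertion order.
    return [y + ofs + x[y] for y in x]           # print y,x[y] joins with OFS


sample = ["a\tA\te\tf\tm", "b\tB\tg\th", "c\tC\ti\tj", "b\tB\tk\tl"]
for out in awk_one_liner_model(sample):
    print(repr(out))
# 'a\tA   e f m'
# 'b\tB   g h  k l'
# 'c\tC   i j'
```

The "k l" group lands immediately after "g h", and the value fields are joined by spaces rather than tabs, which is why a padding step based on the maximum column count is needed.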

You can try this Python program:

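# Python 3 version of the program.
# Assumption: input.txt is UTF-8 encoded and tab-delimited.
max_value_fields = 0
values = {}    # "col1\tcol2" -> list of value-field lists, one per input line
keys = []      # keys in order of first appearance, so the output order is stable

with open("input.txt", encoding="utf-8") as f:
    for line in f:
        # drop only the line ending; strip() would also eat empty leading/trailing fields
        line = line.rstrip("\r\n")
        if not line:
            continue                  # skip blank lines
        fs = line.split("\t")

        key = "%s\t%s" % (fs[0], fs[1])
        if key not in values:
            values[key] = []
            keys.append(key)
        values[key].append(fs[2:])

        # track the widest value part (columns after the two key columns)
        value_fields = len(fs) - 2
        if value_fields > max_value_fields:
            max_value_fields = value_fields

with open("output.txt", "w", encoding="utf-8") as f:
    for key in keys:
        fields = [key]
        for value_list in values[key]:
            fields.extend(value_list)
            # pad each group to the same width so the next group starts in a fixed column
            fields.extend([""] * (max_value_fields - len(value_list)))
        print("\t".join(fields), file=f)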
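The same approach can also be packaged as a reusable function plus a stdin/stdout filter, so it is driven with the same shell redirection as the awk command in the answer. This is a sketch under stated assumptions: `merge_rows` and `main` are hypothetical names, keys are stored as tuples instead of tab-joined strings, and every line is assumed to have at least two columns.

```python
import sys


def merge_rows(lines):
    # Hypothetical helper: merge tab-delimited lines that share their first two
    # columns, padding every group except the last to the widest value part
    # found in the input, so repeated groups start in fixed columns.
    order = []     # keys in order of first appearance
    groups = {}    # (col1, col2) -> list of value-field lists
    width = 0      # widest value part seen, i.e. max columns - 2
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue                     # ignore blank lines
        fs = line.split("\t")
        key = (fs[0], fs[1])
        if key not in groups:
            groups[key] = []
            order.append(key)
        groups[key].append(fs[2:])
        width = max(width, len(fs) - 2)

    merged = []
    for key in order:
        fields = list(key)
        chunks = groups[key]
        for i, values in enumerate(chunks):
            fields.extend(values)
            if i < len(chunks) - 1:      # pad between groups, not after the last one
                fields.extend([""] * (width - len(values)))
        merged.append("\t".join(fields))
    return merged


def main(stdin=sys.stdin, stdout=sys.stdout):
    # Filter entry point: read lines from stdin and write merged lines to stdout.
    for row in merge_rows(stdin):
        print(row, file=stdout)


# Example on the question's sample data:
for row in merge_rows(["a\tA\te\tf\tm\n", "b\tB\tg\th\n", "c\tC\ti\tj\n", "b\tB\tk\tl\n"]):
    print(repr(row))
# 'a\tA\te\tf\tm'
# 'b\tB\tg\th\t\tk\tl'
# 'c\tC\ti\tj'
```

Saved as a script (say merge.py, a hypothetical name) with `main()` called at the bottom, it can be run as `python3 merge.py < input.txt > output.txt`. Unlike the program above, this sketch does not pad after the last group, so rows such as "c C i j" get no trailing empty fields. Standard input is decoded with Python's default encoding for stdin, which is assumed here to be UTF-8.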
