如何使用AWK解决此2表数据联接? [英] How to use AWK to solve this 2 Table Data Join?

查看:54
本文介绍了如何使用AWK解决此2表数据联接?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有2个数据表,如图所示(它们是2个制表符分隔的文件). 我正在尝试使用表1中的相应国家/地区填充表2国家/地区"列.需要从表2的名字"字段中的信息中加入".

I have 2 data tables as shown (they are 2 x tab-delimited files). I'm trying to populate the Table-2 Country column with the corresponding country from Table-1. Need to "join" from info in Table-2's Firstname field.

考虑到Table-2,Firstname列中数据的复杂性,这里最好的方法是什么?其他Mac工具是否比AWK更好地使用,例如Excel公式,Perl,Filemaker等?

What's the best approach here, given the complexity of the data in the Table-2, Firstname column? Would other Mac tools be better to use than AWK e.g. Excel formulae, Perl, Filemaker etc?

TABLE1(输入):

TABLE1 (Input):

city_ascii  country iso2
Mavinga Angola  AO
Menongue    Angola  AO
Mucusso Angola  AO
Guines  Cuba    CU
Havana  Cuba    CU
Holguin Cuba    CU
Las Tunas   Cuba    CU
Manzanillo  Cuba    CU
Matanzas    Cuba    CU
Moron   Cuba    CU
Santa Clara Cuba    CU
Varadero    Cuba    CU

TABLE2(输入):

TABLE2 (Input):

Firstname
Fred, Havana
James, (Varadero, Cuba)
Jack (Cuba)
Harry Varadero, Cuba
Josh Cuba
Gary, Mavinga & Other, Angola
Jamie, (Angola)

TABLE2(结果):

TABLE2 (Result):

Firstname   Country
Fred, Havana  Cuba
James, (Varadero, Cuba) Cuba
Jack (Cuba) Cuba
Harry Varadero, Cuba    Cuba
Josh Cuba   Cuba
Gary, Mavinga & Other, Angola   Angola
Jamie, (Angola) Angola

============= 以下是针对Ed的以下问题的调试信息:

============ Here is debugging info in answer to Ed's Qs below:

awk -F'\t' '{print NF"<"$1"><"$2"><"$3">"}' Table3.txt | cat -v

    1<city_ascii  country iso2><><>
    1<Mavinga Angola  AO><><>
    1<Menongue    Angola  AO><><>
    1<Mucusso Angola  AO><><>
    1<Guines  Cuba    CU><><>
    1<Havana  Cuba    CU><><>
    1<Holguin Cuba    CU><><>
    1<Las Tunas   Cuba    CU><><>
    1<Manzanillo  Cuba    CU><><>
    1<Matanzas    Cuba    CU><><>
    1<Moron   Cuba    CU><><>
    1<Santa Clara Cuba    CU><><>
    1<Varadero    Cuba    CU><><>

    ==============
    awk -F'\t' '{print NF"<"$1"><"$2"><"$3">"}' Table4.txt | cat -v

    1<Firstname><><>
    1<Fred, Havana><><>
    1<James, (Varadero, Cuba)><><>
    1<Jack (Cuba)><><>
    1<Harry Varadero, Cuba><><>
    1<Josh Cuba><><>
    1<Gary, Mavinga & Other, Angola><><>
    1<Jamie, (Angola)><><>

    ===============
    cat -v tst.awk

    BEGIN { FS=OFS="\t" }
    NR==FNR {
        map[$1] = $2
        map[$2] = $2
        next
    }
    FNR==1 {
        print
        FS=" "
        next
    }
    {
        orig = $0
        country = ""
        gsub(/[^[:alpha:]]/," ")
        for (i=NF; i>0; i--) {
            if ($i in map) {
                country = map[$i]
                break
            }
        }
        print orig, country
    }

    ===============
    awk -f tst.awk Table3.txt Table4.txt >output.txt

    Firstname
    Fred, Havana    
    James, (Varadero, Cuba) 
    Jack (Cuba) 
    Harry Varadero, Cuba    
    Josh Cuba   
    Gary, Mavinga & Other, Angola   
    Jamie, (Angola) 

    ================
    awk -F'\t' '{print NF"<"$1"><"$2"><"$3">"}' output.txt | cat -v

    1<Firstname><><>
    2<Fred, Havana><><>
    2<James, (Varadero, Cuba)><><>
    2<Jack (Cuba)><><>
    2<Harry Varadero, Cuba><><>
    2<Josh Cuba><><>
    2<Gary, Mavinga & Other, Angola><><>
    2<Jamie, (Angola)><><>

推荐答案

听起来这可能正是您想要的:

It sounds like this might be what you're looking for:

$ cat tst.awk
BEGIN { FS=OFS="\t" }
NR==FNR {
    map[$1] = $2
    map[$2] = $2
    next
}
FNR==1 {
    print
    FS=" "
    next
}
{
    orig = $0
    country = ""
    gsub(/[^[:alpha:]]/," ")
    for (i=NF; i>0; i--) {
        if ($i in map) {
            country = map[$i]
            break
        }
    }
    print orig, country
}

$ awk -f tst.awk file1 file2
Firstname       Country
Fred, Havana    Cuba
James, (Varadero, Cuba) Cuba
Jack (Cuba)     Cuba
Harry Varadero, Cuba    Cuba
Josh Cuba       Cuba
Gary, Mavinga & Other, Angola   Angola
Jamie, (Angola) Angola

这篇关于如何使用AWK解决此2表数据联接?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆