Bash script to find matching rows in multiple CSV files and create a report from them

Question

I have three different CSV files. Their format is as follows:

domain1.csv

name1,lastname1
name2,lastname2
name3,lastname3

domain2.csv

name1,lastname1
name6,lastname6
name3,lastname3

domain3.csv

name1,lastname1
name4,lastname4
name3,lastname3

Now, based on these three files, I need to create a report like this:

name,lastname,domain1,domain2,domain3
name1,lastname1,yes,yes,yes
name2,lastname2,yes,no,no
name3,lastname3,yes,yes,yes
name4,lastname4,no,no,yes
name6,lastname6,no,yes,no

Basically, I think this report is only possible with a script that reads each file row by row, looks up each row in the other two files, and builds the report by matching the name and lastname columns. But I am a total newbie in shell scripting. Can someone help me? I am using bash.

Recommended answer

awk is a mini-language that comes standard with most Unix-like operating systems and allows you to tackle this sort of problem fairly easily.

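# Print the header on its own, so that sort cannot move it away from the top.
echo "name,lastname,domain1,domain2,domain3"
awk '{ names[$0] = (names[$0] "," FILENAME) }   # record every file a line appears in
     END { for( elt in names ) {
             # elt already holds "name,lastname", so four %s give five columns
             printf "%s,%s,%s,%s\n", elt,
                                     index( names[elt], "domain1.csv" ) ? "yes" : "no",
                                     index( names[elt], "domain2.csv" ) ? "yes" : "no",
                                     index( names[elt], "domain3.csv" ) ? "yes" : "no"
           }
         }' domain*.csv | sort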
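To check the answer against the sample data from the question, here is a small self-contained demo. It recreates domain1.csv, domain2.csv and domain3.csv in a temporary directory made with mktemp -d and wraps the awk command in a function called report (an illustrative name, not part of the answer). Two small adjustments are assumptions of this sketch: the header is echoed before the pipeline so sort cannot reorder it, and sort runs under LC_ALL=C so the row order is byte-wise rather than locale-dependent.

```shell
#!/usr/bin/env bash
# Demo: rebuild the sample files from the question in a throwaway
# directory and run the awk report against them.
set -euo pipefail

report() {
  # Header is printed outside the pipeline so sort cannot move it.
  echo "name,lastname,domain1,domain2,domain3"
  awk '{ names[$0] = names[$0] "," FILENAME }
       END {
         for (elt in names)
           printf "%s,%s,%s,%s\n", elt,
                  index(names[elt], "domain1.csv") ? "yes" : "no",
                  index(names[elt], "domain2.csv") ? "yes" : "no",
                  index(names[elt], "domain3.csv") ? "yes" : "no"
       }' domain*.csv | LC_ALL=C sort   # byte-wise order, independent of locale
}

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
printf 'name1,lastname1\nname2,lastname2\nname3,lastname3\n' > domain1.csv
printf 'name1,lastname1\nname6,lastname6\nname3,lastname3\n' > domain2.csv
printf 'name1,lastname1\nname4,lastname4\nname3,lastname3\n' > domain3.csv

report
```

Running it with bash should print the header followed by the five rows of the report in the question, name1 through name6.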

The script above reads each file line by line and builds an associative array whose indexes are the nameN,lastnameN lines and whose values are the comma-separated names of the files each line was found in. In the END block it loops over the array and prints each index followed by "yes" or "no" for each file, depending on whether that file name appears in the array value. Because awk's "for (elt in names)" loop visits keys in an unspecified order, the output is piped through sort.
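The script above hard-codes the three file names, and index() is a plain substring test on the comma-joined list of file names, so a file passed in as mydomain1.csv would also be counted as domain1.csv. A possible generalisation, sketched below under the assumption that every input is named <column>.csv and that each line is an exact name,lastname key, stores presence under a two-part key seen[line, file] and produces one column per file argument by walking ARGV. The function name presence_report is hypothetical, and the CRLF stripping and blank-line skipping are extra assumptions added for robustness.

```shell
#!/usr/bin/env bash
# Hypothetical generalisation: column names come from the file arguments,
# and presence is an exact (line, file) lookup instead of a substring test.
set -euo pipefail

presence_report() {
  # Build the header in the shell: "path/domain1.csv" -> "domain1".
  local header="name,lastname" f
  for f in "$@"; do
    f=${f##*/}
    header+=",${f%.csv}"
  done
  echo "$header"

  awk '
    { sub(/\r$/, "") }                    # tolerate CRLF line endings
    $0 != "" { seen[$0, FILENAME] = 1; keys[$0] = 1 }
    END {
      for (k in keys) {
        row = k
        for (i = 1; i < ARGC; i++)        # one column per file argument
          row = row "," (((k, ARGV[i]) in seen) ? "yes" : "no")
        print row
      }
    }' "$@" | LC_ALL=C sort
}

# Sample data from the question, in a throwaway directory.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
printf 'name1,lastname1\nname2,lastname2\nname3,lastname3\n' > domain1.csv
printf 'name1,lastname1\nname6,lastname6\nname3,lastname3\n' > domain2.csv
printf 'name1,lastname1\nname4,lastname4\nname3,lastname3\n' > domain3.csv

presence_report domain1.csv domain2.csv domain3.csv
```

Because the columns follow the argument order, calling presence_report domain3.csv domain1.csv would produce only those two columns, in that order. Reading ARGV rather than recording FILENAME also keeps a column for an input file that happens to be empty.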
