使用awk合并2个csv文件 [英] merging 2 csv files using awk

查看:205
本文介绍了使用awk合并2个csv文件的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有3个CSV文件:

基本文件(值初始化为0)

Base File(values initialised with 0)

steve tignor ash michael jose sam joshua
0       0     0     0     0     0    0

文件1:

tignor michael  jose
888      9       -2

文件2:

ash joshua
77   66

我需要的输出:

      steve tignor ash michael jose sam joshua
File1   0    888    0    9      -2   0     0
File2   0     0     77   0       0   0     66

我尝试先用awk对文件进行排序,然后与粘贴合并,但是由于我有1000多个列并且有30个文件,所以它不起作用.

I tried with sorting the files first with awk and then merge with paste but as I have 1000+ columns and having 30 files it just did not work.

代码:

awk -F"," 'NR==1{
  split($0,a,FS);asort(a);
  for(i=1;i<=NF;i++)b[$i]=i
} {
  for(i=1;i<=NF;i++)printf("%s,",$(b[a[i]]));
  print x
}' File1 > 1.csv

awk -F"," 'NR==1{
  split($0,a,FS);asort(a);
  for(i=1;i<=NF;i++)b[$i]=i
} {
  for(i=1;i<=NF;i++)printf("%s,",$(b[a[i]]));
  print x
}' File2 > 2.csv

paste -d"\n" 1.csv 2.csv > merge.csv

这里需要一些帮助.预先感谢.

Need some assistance here. Thanks in advance.

推荐答案

我假设您省略了文件中的逗号.如果您使用空格分隔的文件,则只需更改分割功能中使用的分隔符即可.

I assumed that you omitted the commas in the files. If you're using space separated files you could just change the separator used in the split function.

awk '
ARGIND==1 && FNR==1{
  split($0, base, ",")
  printf("file,%s\n",$0)
}
ARGIND > 1 && FNR==1{
  split($0, names, ",")
  printf("%s", ARGV[ARGIND])
}
ARGIND > 1 && FNR==2{
  split($0, values, ",")
  for(i in names)
    line[names[i]] = values[i]
  for(i in base){
    if(base[i] in line)
      printf(",%s", line[base[i]])
    else
      printf(",0")
  }
  delete line
  print ""
}
' base.csv file1.csv file2.csv

示例:

file1.csv:

file1.csv:

tignor,michael,jose
888,9,-2

file2.csv:

file2.csv:

ash,joshua
77,66

和base.csv:

steve,tignor,ash,michael,jose,sam,joshua
0,0,0,0,0,0,0

输出为:

file,steve,tignor,ash,michael,jose,sam,joshua
file1.csv,0,888,0,9,-2,0,0
file2.csv,0,0,77,0,0,0,66

基本上,脚本以两个步骤运行:

Basically, the script is running in 2 steps:

  • 首先,我们从base.csv中读取名称并将其存储到 大批.
  • 然后,对于每个文件,我们存储出现在其标题中的名称, 尝试为基本csv中的每一列打印一个值.如果我们不 具有与我们刚刚在特定文件中的列对应的值 改为打印0.
  • First we read the names from the base.csv and store them into an array.
  • Then, for each file we store the names appearing in its header and try to print one value for each column in the base csv. if we don't have the value corresponding to a column in a particular file we just print 0 instead.

P.S.我制作了脚本的新POSIX awk兼容版本:

awk --posix '
NR==FNR && FNR==1{
  split($0, base, ",")
  printf("file,%s\n",$0)
}
NR>FNR && FNR==1{
  split($0, names, ",")
  printf("%s", FILENAME)
}
NR>FNR && FNR==2{
  split($0, values, ",")
  for(i in names)
    line[names[i]] = values[i]
  for(i in base){
    if(base[i] in line)
      printf(",%s", line[base[i]])
    else
      printf(",0")
  }
  delete line
  print ""
}
' base.csv file1.csv file2.csv

这篇关于使用awk合并2个csv文件的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆