Merge CSV files with different column order, removing duplicates


Problem description

I have multiple CSV files, each with the same set of columns but in a different column order. I want to merge them while removing duplicate rows. None of the other solutions posted here take the column order into account, so their merged output is incorrect. How can this be done from the Windows command line (e.g. with LogParser) or in bash?

A Python script that achieves this would also do.

Answer

The following script works properly if:


  • the CSVs are not too large (i.e. they can be loaded into memory)

  • the first line of each CSV contains the column names

You only have to fill in the list final_headers:

import csv

files = ['c1.csv', 'c2.csv', 'c3.csv']
final_headers = ['col1', 'col2', 'col3']

merged_rows = set()
for f in files:
    with open(f, newline='') as csv_in:
        csvreader = csv.reader(csv_in, delimiter=',')
        # Map each column name to its position in this particular file.
        headers = {h: i for i, h in enumerate(next(csvreader))}
        # Reorder every row to match final_headers before adding it, so
        # identical rows from differently ordered files deduplicate.
        for row in csvreader:
            merged_rows.add(tuple(row[headers[h]] for h in final_headers))

with open('output.csv', 'w', newline='') as csv_out:
    csvwriter = csv.writer(csv_out, delimiter=',')
    csvwriter.writerows(merged_rows)
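As a quick sanity check, the same reordering logic can be exercised on two tiny in-memory "files" with different column orders; the sample data and the use of io.StringIO in place of real files are invented for this illustration:

```python
import csv
import io

final_headers = ['col1', 'col2', 'col3']

# Two CSVs with different column orders; the first row of each
# file in c1 also appears (reordered) in c2.
c1 = "col1,col2,col3\na,b,c\nd,e,f\n"
c2 = "col3,col1,col2\nc,a,b\nz,x,y\n"

merged_rows = set()
for data in (c1, c2):
    csvreader = csv.reader(io.StringIO(data), delimiter=',')
    headers = {h: i for i, h in enumerate(next(csvreader))}
    for row in csvreader:
        merged_rows.add(tuple(row[headers[h]] for h in final_headers))

# ('a', 'b', 'c') appears in both inputs but is kept only once.
print(sorted(merged_rows))
```

The set collapses the duplicate row even though it arrived in a different column order, leaving three unique rows.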

