Merge CSVs using Python (or Bash)

Question

I have a set of CSV files in a folder and I'd like to merge them into one "super-CSV". Some columns are present in every file, others are not.

A field in the output should simply be empty if it was not available in the source. If a column name is the same across multiple CSVs, its values should go into the one existing column (Name in the example).

File1.CSV

ID        Name       ContactNo
53        Vikas      9874563210

File2.CSV

ID        Name       Designation
23        MyShore    Software Engineer

Expected output

ID        Name          ContactNo           Designation 
53        Vikas         9874563210
23        MyShore                          Software Engineer

I've already tried other solutions, but they cannot handle empty fields, e.g. "merge csv files with different column order remove duplicates".

Thanks in advance,

Michael

Recommended answer

In Python, you can use the pandas module: load each CSV into a DataFrame, merge the DataFrames, then save the merged DataFrame to a new CSV file.

For example:

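import pandas as pd

# DataFrame.from_csv was removed in pandas 1.0; read_csv replaces it
df1 = pd.read_csv("file1.csv", sep=",")
df2 = pd.read_csv("file2.csv", sep=",")

# An outer merge on the shared columns (ID, Name) keeps every row;
# columns a file does not have are filled with NaN
final_df = df1.merge(df2, how="outer")

# NaN is written as an empty field; index=False omits the row index
final_df.to_csv("result.csv", sep=",", index=False)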

This produces the following (row order may differ depending on the pandas version):

ID,Name,ContactNo,Designation
53,Vikas,9874563210.0,
23,MyShore,,Software Engineer
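The ".0" after the contact number in the output above appears because an integer column that gains missing values is converted to floating point. One possible way around it, sketched below, is to read that column as text. The inline file contents are stand-ins for the two example files and assume they are comma-separated.

```python
import io

import pandas as pd

# Inline stand-ins for File1.CSV and File2.CSV (assumed comma-separated)
file1 = io.StringIO("ID,Name,ContactNo\n53,Vikas,9874563210\n")
file2 = io.StringIO("ID,Name,Designation\n23,MyShore,Software Engineer\n")

# Read ContactNo as text so it is never converted to float
df1 = pd.read_csv(file1, dtype={"ContactNo": str})
df2 = pd.read_csv(file2)

# Outer merge on the shared columns (ID, Name), as in the answer
merged = df1.merge(df2, how="outer")

# Missing values are written as empty fields
csv_text = merged.to_csv(index=False)
print(csv_text)
```

Treating phone numbers as text also preserves any leading zeros.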

You may need to adjust the sep argument to match your files' format.
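The example above handles two files. For a whole folder, as described in the question, one option is pd.concat, which stacks rows and aligns columns by name. Note that this is a different technique from merge: rows are appended, not joined on shared keys. The following is a minimal sketch; the function name merge_csv_folder and the folder name in the usage note are hypothetical.

```python
from pathlib import Path

import pandas as pd


def merge_csv_folder(folder, sep=","):
    """Stack every CSV file in `folder` into one DataFrame (hypothetical helper)."""
    # Match the extension case-insensitively so File1.CSV is found too
    paths = sorted(p for p in Path(folder).iterdir() if p.suffix.lower() == ".csv")
    frames = [pd.read_csv(path, sep=sep) for path in paths]
    # concat uses the union of all column names; fields a file lacks become NaN
    return pd.concat(frames, ignore_index=True, sort=False)
```

A call such as merge_csv_folder("csv_folder").to_csv("result.csv", index=False) would then write the combined file. Writing the result outside the source folder avoids picking it up on the next run.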
