Merge CSVs using Python (or Bash)
Question
I have a set of CSV files in a folder and I'd like to merge them into one "super-csv". Some of the columns are available in all files, some are not.
A field in the output should simply be empty if it was not available in the source. If a column name is the same across multiple CSVs, its values should go into the existing column (Name in the example).
File1.CSV
ID Name ContactNo
53 Vikas 9874563210
File2.CSV
ID Name Designation
23 MyShore Software Engineer
Expected output
ID  Name     ContactNo   Designation
53  Vikas    9874563210
23  MyShore              Software Engineer
I've already tried other solutions, but they cannot handle empty fields, e.g. "merge csv files with different column order remove duplicates".
Thanks in advance,
Michael
Recommended answer
In Python, you can use the pandas module, which lets you load each CSV into a dataframe, merge the dataframes, and then save the merged dataframe to a new CSV file.
For example:
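import pandas as pd

# DataFrame.from_csv was removed in pandas 1.0; read_csv with index_col=0 is the equivalent.
df1 = pd.read_csv("file1.csv", sep=",", index_col=0)
df2 = pd.read_csv("file2.csv", sep=",", index_col=0)
# Outer merge on the shared columns (ID, Name); fields missing from a file stay empty.
final_df = df1.reset_index().merge(df2.reset_index(), how="outer").set_index("ID")
final_df.to_csv("result.csv", sep=",")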
This produces output along these lines (row order may depend on the pandas version):
ID,Name,ContactNo,Designation
53,Vikas,9874563210.0,
23,MyShore,,Software Engineer
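The trailing .0 on ContactNo appears because pandas stores a numeric column that contains missing values as floating point. A minimal sketch of one way around this, assuming comma-separated files holding the sample data from the question (inlined here with io.StringIO so the snippet runs without files on disk), is to read every column as text with dtype=str:

```python
import io
import pandas as pd

# Sample data from the question, inlined so the snippet runs without files on disk.
file1 = io.StringIO("ID,Name,ContactNo\n53,Vikas,9874563210\n")
file2 = io.StringIO("ID,Name,Designation\n23,MyShore,Software Engineer\n")

# dtype=str keeps every value as text, so missing cells do not force a float column.
df1 = pd.read_csv(file1, dtype=str)
df2 = pd.read_csv(file2, dtype=str)

# Outer merge on the shared columns (ID and Name).
merged = df1.merge(df2, how="outer")

# index=False omits the automatic row index from the output.
csv_text = merged.to_csv(index=False)
print(csv_text)
```

Reading everything as text also preserves leading zeros in IDs or phone numbers, at the cost of losing numeric types in the merged dataframe.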
You may need to adjust the sep argument to match your files' format.
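Since the question is about a whole folder of files, here is a rough sketch that generalizes the idea to any number of CSVs. It swaps merge for pd.concat, which stacks the rows and aligns columns by name, leaving cells empty where a file lacks a column. The function name merge_csv_folder and its parameters are hypothetical, chosen only for illustration, and the files are assumed to share a single delimiter:

```python
import glob
import os
import pandas as pd

def merge_csv_folder(folder, output_path, sep=","):
    """Concatenate every *.csv file in `folder` into one CSV at `output_path`."""
    paths = sorted(glob.glob(os.path.join(folder, "*.csv")))
    if not paths:
        raise FileNotFoundError(f"no CSV files found in {folder!r}")
    # dtype=str keeps values as text so they are written back unchanged.
    frames = [pd.read_csv(path, sep=sep, dtype=str) for path in paths]
    # concat aligns columns by name; columns absent from a file become empty cells.
    merged = pd.concat(frames, ignore_index=True, sort=False)
    merged.to_csv(output_path, sep=sep, index=False)
    return merged
```

Unlike an outer merge, concat never combines rows, so every input row yields exactly one output row, which matches the expected output in the question. Write the result outside the input folder (or delete it before re-running) so it is not picked up as an input, and note that the "*.csv" pattern is case-sensitive on some systems, so files named like File1.CSV may need a different pattern.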