python pandas merge multiple csv files

Problem description

I have around 600 CSV datasets, all with exactly the same column names ['DateTime', 'Actual', 'Consensus', 'Previous', 'Revised']. They are all economic indicators and all time-series datasets.

The aim is to merge them all together into one CSV file, with 'DateTime' as the index.

I want the file indexed chronologically, along a timeline. For example, say the first event in the first CSV is dated 12/18/2017 10:00:00, the first event in the second CSV is dated 12/29/2017 09:00:00, and the first event in the third CSV is dated 12/20/2017 09:00:00.

So I want them ordered with the earlier event first and the newer one after it, and so on, regardless of which source CSV each row originally came from.

I tried merging just 3 of them as an experiment, and the problem is 'DateTime': it prints the 3 dates together, like this: ('12/18/2017 10:00:00', '12/29/2017 09:00:00', '12/20/2017 09:00:00'). Here is the code:

import pandas as pd


df1 = pd.read_csv(r"E:\Business\Economic Indicators\Consumer Price Index - Core (YoY) - European Monetary Union.csv")
df2 = pd.read_csv(r"E:\Business\Economic Indicators\Private loans (YoY) - European Monetary Union.csv")
df3 = pd.read_csv(r"E:\Business\Economic Indicators\Current Account s.a - European Monetary Union.csv")

# Side-by-side concat; each frame still has its default 0..n-1 row index
df = pd.concat([df1, df2, df3], axis=1, join='inner')
df.set_index('DateTime', inplace=True)

print(df.head())
df.to_csv('df.csv')
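
The grouping of dates can be reproduced with a small sketch. It uses two tiny in-memory CSVs (the sample values are invented, and io.StringIO stands in for the real files). Because read_csv is called without index_col, each frame gets a default 0..n-1 row index, and concat(axis=1) lines the rows up by that position rather than by date. Row 0 therefore holds the first event of every file, and 'DateTime' appears once per file.

```python
import io

import pandas as pd

# Two tiny stand-in CSVs with invented values
csv_a = "DateTime,Actual\n12/18/2017 10:00:00,1.0\n12/19/2017 10:00:00,1.1\n"
csv_b = "DateTime,Actual\n12/29/2017 09:00:00,2.0\n12/30/2017 09:00:00,2.1\n"

# No index_col: each frame gets a default RangeIndex (0, 1, ...)
df_a = pd.read_csv(io.StringIO(csv_a))
df_b = pd.read_csv(io.StringIO(csv_b))

# axis=1 lines the frames up by that row position, not by date
combined = pd.concat([df_a, df_b], axis=1, join='inner')

print(combined.columns.tolist())
# ['DateTime', 'Actual', 'DateTime', 'Actual']
print(combined.iloc[0, 0], '|', combined.iloc[0, 2])
# 12/18/2017 10:00:00 | 12/29/2017 09:00:00
```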

Recommended answer

Consider using the read_csv() arguments index_col and parse_dates to create the index during import and parse it as datetime. Then run the horizontal merge you need. The code below assumes the date is in the first column of each CSV. Finally, call sort_index() on the final DataFrame to sort by datetime.

import pandas as pd

df1 = pd.read_csv(r"E:\Business\Economic Indicators\Consumer Price Index - Core (YoY) - European Monetary Union.csv",
                  index_col=[0], parse_dates=[0])
df2 = pd.read_csv(r"E:\Business\Economic Indicators\Private loans (YoY) - European Monetary Union.csv",
                  index_col=[0], parse_dates=[0])
df3 = pd.read_csv(r"E:\Business\Economic Indicators\Current Account s.a - European Monetary Union.csv",
                  index_col=[0], parse_dates=[0])

finaldf = pd.concat([df1, df2, df3], axis=1, join='inner').sort_index()
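
The same idea can be checked on a small scale. This sketch again uses invented in-memory data in place of the real files: with index_col=[0] and parse_dates=[0] the first column becomes a DatetimeIndex, concat(axis=1) aligns the frames on those timestamps, join='inner' drops any timestamp missing from one of the frames (12/20/2017 here), and sort_index() puts the remaining rows in chronological order. The value columns keep their original names, so 'Actual' appears once per file.

```python
import io

import pandas as pd

# Invented stand-in data; 12/20/2017 exists only in csv_b
csv_a = "DateTime,Actual\n12/29/2017 09:00:00,1.0\n12/18/2017 10:00:00,1.1\n"
csv_b = ("DateTime,Actual\n12/18/2017 10:00:00,2.0\n"
         "12/29/2017 09:00:00,2.1\n12/20/2017 09:00:00,2.2\n")

# First column becomes the index and is parsed as datetimes
# (pandas reads these as month/day/year by default)
df_a = pd.read_csv(io.StringIO(csv_a), index_col=[0], parse_dates=[0])
df_b = pd.read_csv(io.StringIO(csv_b), index_col=[0], parse_dates=[0])

# Rows now align on DateTime; 'inner' keeps only shared timestamps,
# and sort_index() orders them chronologically
merged = pd.concat([df_a, df_b], axis=1, join='inner').sort_index()
print(merged)
```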

For a DRY-er approach, especially across hundreds of CSV files, use a list comprehension:

import os
...
os.chdir('E:\\Business\\Economic Indicators')

dfs = [pd.read_csv(f, index_col=[0], parse_dates=[0])
        for f in os.listdir(os.getcwd()) if f.endswith('.csv')]

finaldf = pd.concat(dfs, axis=1, join='inner').sort_index()
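
One caveat with the horizontal version: join='inner' keeps only the timestamps that occur in every file, so with hundreds of indicators released on different dates the result can shrink to few or no rows, and the repeated column names ('Actual', 'Consensus', ...) make the columns hard to tell apart. If the goal is instead a single timeline where each event is its own row, ordered from earliest to latest regardless of source file, stacking the files vertically with axis=0 may fit better. The sketch below swaps in that approach; the helper name stack_csvs_by_time, the 'Source' column and the sample files are hypothetical, and pathlib's glob replaces os.listdir.

```python
import tempfile
from pathlib import Path

import pandas as pd


def stack_csvs_by_time(folder):
    """Hypothetical helper: stack every *.csv in `folder` into one time-sorted frame."""
    frames = []
    for path in sorted(Path(folder).glob('*.csv')):
        # First column (DateTime) becomes a parsed DatetimeIndex
        df = pd.read_csv(path, index_col=[0], parse_dates=[0])
        df['Source'] = path.stem  # remember which file each row came from
        frames.append(df)
    # axis=0 appends rows, so every event is kept even when timestamps
    # differ between files; a stable sort keeps ties in file order
    return pd.concat(frames, axis=0).sort_index(kind='mergesort')


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, 'indicators')
    src.mkdir()
    # Three made-up single-row files using the dates from the question
    samples = {
        'indicator_a': '12/18/2017 10:00:00',
        'indicator_b': '12/29/2017 09:00:00',
        'indicator_c': '12/20/2017 09:00:00',
    }
    for name, stamp in samples.items():
        (src / f'{name}.csv').write_text(
            'DateTime,Actual,Consensus,Previous,Revised\n'
            f'{stamp},1.0,1.1,0.9,\n'
        )

    timeline = stack_csvs_by_time(src)
    # Written next to, not inside, the scanned folder
    timeline.to_csv(Path(tmp, 'merged.csv'))

print(timeline['Source'].tolist())
# ['indicator_a', 'indicator_c', 'indicator_b']
```

Writing the merged file outside the folder being scanned, as the sketch does, keeps a rerun from reading the previous output back in as one more input.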
