python pandas merge multiple csv files

Views: 55 Published: 2021/4/27 19:37:13 Tags: python pandas csv datetime

Problem description

I have around 600 CSV datasets, all with exactly the same column names ['DateTime', 'Actual', 'Consensus', 'Previous', 'Revised']. They are all economic indicators, and all are time series.

The aim is to merge them all into one CSV file, with 'DateTime' as the index.

I want this file to be indexed as a timeline. For example, say the first event in the first CSV is dated 12/18/2017 10:00:00, the first event in the second CSV is dated 12/29/2017 09:00:00, and the first event in the third CSV is dated 12/20/2017 09:00:00.

So I want them indexed with the earlier events first and the newer ones after them, regardless of which source CSV each row originally came from.

I tried merging just 3 of them as an experiment. The problem is 'DateTime': it prints the three values together, like this: ('12/18/2017 10:00:00', '12/29/2017 09:00:00', '12/20/2017 09:00:00'). Here is the code:

import pandas as pd


df1 = pd.read_csv(r"E:\Business\Economic Indicators\Consumer Price Index - Core (YoY) - European Monetary Union.csv")
df2 = pd.read_csv(r"E:\Business\Economic Indicators\Private loans (YoY) - European Monetary Union.csv")
df3 = pd.read_csv(r"E:\Business\Economic Indicators\Current Account s.a - European Monetary Union.csv")

df = pd.concat([df1, df2, df3], axis=1, join='inner')
df.set_index('DateTime', inplace=True)

print(df.head())
df.to_csv('df.csv')

Recommended answer

Consider using the read_csv() arguments index_col and parse_dates to create the index during import and parse it as datetime. Then run the horizontal merge you need. The code below assumes the date is in the first column of each CSV. Finally, call sort_index() on the final DataFrame to sort by datetime.

import pandas as pd

# Use the first column as the index and parse it as datetime on import
df1 = pd.read_csv(r"E:\Business\Economic Indicators\Consumer Price Index - Core (YoY) - European Monetary Union.csv",
                  index_col=[0], parse_dates=[0])
df2 = pd.read_csv(r"E:\Business\Economic Indicators\Private loans (YoY) - European Monetary Union.csv",
                  index_col=[0], parse_dates=[0])
df3 = pd.read_csv(r"E:\Business\Economic Indicators\Current Account s.a - European Monetary Union.csv",
                  index_col=[0], parse_dates=[0])

# Merge horizontally on the shared datetimes, then sort chronologically
finaldf = pd.concat([df1, df2, df3], axis=1, join='inner').sort_index()
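
As a small self-contained illustration of the steps above, the sketch below swaps the real files for two made-up in-memory CSVs. The sample values and the names csv_a, csv_b, df_a, df_b and merged are assumptions for demonstration only and are not from the original data.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for two of the CSV files.
csv_a = io.StringIO(
    "DateTime,Actual\n"
    "12/20/2017 09:00:00,1.1\n"
    "12/18/2017 10:00:00,0.9\n"
)
csv_b = io.StringIO(
    "DateTime,Actual\n"
    "12/18/2017 10:00:00,2.5\n"
    "12/20/2017 09:00:00,2.7\n"
    "12/29/2017 09:00:00,2.9\n"
)

# The first column becomes a parsed datetime index during import.
df_a = pd.read_csv(csv_a, index_col=[0], parse_dates=[0])
df_b = pd.read_csv(csv_b, index_col=[0], parse_dates=[0])

# Horizontal merge on the index, keeping only shared timestamps, then sort.
merged = pd.concat([df_a, df_b], axis=1, join='inner').sort_index()
print(merged)
```

Because join='inner' keeps only timestamps present in every input, the 12/29/2017 row (which exists only in the second sample) is dropped, and the two identically named 'Actual' columns end up side by side.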

For a DRY-er approach, especially across hundreds of CSV files, use a list comprehension:

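import os
...
# Work from the folder that holds the CSV files
os.chdir('E:\\Business\\Economic Indicators')

# Read every CSV in the folder, indexing each on its parsed first column
dfs = [pd.read_csv(f, index_col=[0], parse_dates=[0])
       for f in os.listdir(os.getcwd()) if f.endswith('.csv')]

# Merge horizontally on the shared datetimes, then sort chronologically
finaldf = pd.concat(dfs, axis=1, join='inner').sort_index()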
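
If the goal is one chronological timeline rather than side-by-side columns, a possible variant (not part of the answer above) is to stack the files row-wise with axis=0 and then sort. This is only a sketch under assumptions: the helper stack_csvs, the extra 'Indicator' column that records the source file name, and the sample data are all hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd


def stack_csvs(folder):
    """Stack every CSV in `folder` row-wise into one time-ordered frame."""
    frames = []
    for path in sorted(Path(folder).glob("*.csv")):
        df = pd.read_csv(path, index_col=[0], parse_dates=[0])
        # 'Indicator' is a hypothetical column recording the source file.
        df["Indicator"] = path.stem
        frames.append(df)
    # axis=0 appends rows; mergesort is stable, so ties keep file order.
    return pd.concat(frames, axis=0).sort_index(kind="mergesort")


# Tiny demonstration with made-up data in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "cpi.csv").write_text(
        "DateTime,Actual\n"
        "12/18/2017 10:00:00,0.9\n"
    )
    Path(tmp, "loans.csv").write_text(
        "DateTime,Actual\n"
        "12/29/2017 09:00:00,2.9\n"
        "12/20/2017 09:00:00,2.7\n"
    )
    timeline = stack_csvs(tmp)

print(timeline)
```

Unlike the inner horizontal merge, this keeps every row from every file, so timestamps that appear in only one source are not dropped. The result could then be written out with timeline.to_csv(...).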

