How do I concatenate multiple csv files into a pandas dataframe, with the filenames as the row names?
Problem description
For Part 1, I have multiple csv files which I loop through to create new csv files with just summary statistics (medians). The new csv files have the original filename + 'summary_' at the start. This part is okay.
For Part 2, I want to concatenate all of the 'summary_' files (they have the same column names as each other), but have the row names in the concatenated dataframe the same as the name of the respective 'summary_' csv file where the data comes from.
With stackoverflow's help, I have solved Part 1, but not Part 2 yet. I can concatenate all of the csv files, but not just the ones with 'summary_' in the name (i.e. the new csv's created in Part 1), and not with the correct row names...
import os
import pandas as pd
import glob

## Part 1
summary_stats = ['median']
filenames = (filename for filename in os.listdir(os.curdir) if os.path.splitext(filename)[1] == '.csv')
for filename in filenames:
    df = pd.read_csv(filename)
    summary_df = df.agg(summary_stats)
    summary_df.to_csv(f'summary_{filename}')

## Part 2
path = r'/Users/Desktop/Practice code'
all_files = glob.glob(path + "/*.csv")
list = []
for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    list.append(df)

frame = pd.concat(list, axis=0, ignore_index=True)
Recommended answer
Please make sure that all_files is only loading the files matching "summary_*.csv". Then you can use df.append() (note that DataFrame.append was removed in pandas 2.0; pd.concat does the same job on current versions). So your code might look something like this:
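import glob
import pandas as pd

path = r'/Users/Desktop/Practice code'
# Only pick up the summary files created in Part 1
all_files = glob.glob(path + "/summary_*.csv")

frames = []
for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    # Label each row with the original file name (without 'summary_' and '.csv')
    df['row'] = filename.split('summary_')[1].split('.csv')[0]
    # set_index returns a new dataframe, so the result must be assigned back
    df = df.set_index('row')
    frames.append(df)

# DataFrame.append was removed in pandas 2.0; pd.concat stacks the frames instead
summary_df = pd.concat(frames)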
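For anyone who wants to try the idea without touching real files, here is a minimal, self-contained sketch of the same approach. It assumes the summary files were written as in Part 1, so that to_csv stored the 'median' label as the first column; index_col=0 reads that label back as the index, which is then replaced by the file name. The helper name concat_summaries, the demo columns x and y, and the temporary folder are hypothetical and only there for illustration. Unlike the answer's code, this sketch keeps the 'summary_' prefix in the row names, which is what the question asks for.

```python
import tempfile
from pathlib import Path

import pandas as pd


def concat_summaries(folder):
    """Stack every summary_*.csv in `folder`, using each file's name as the row label."""
    frames = []
    for csv_path in sorted(Path(folder).glob("summary_*.csv")):
        # index_col=0 absorbs the 'median' label that to_csv wrote as the first column
        df = pd.read_csv(csv_path, index_col=0)
        # Replace that label with the file name, minus the .csv extension
        df.index = [csv_path.stem] * len(df)
        frames.append(df)
    if not frames:
        # pd.concat raises on an empty list, so return an empty frame instead
        return pd.DataFrame()
    return pd.concat(frames, axis=0)


# Hypothetical demo data, written to a temporary folder the same way Part 1 does
with tempfile.TemporaryDirectory() as tmp:
    for name, values in {"a": [1, 2, 3], "b": [10, 20, 30]}.items():
        raw = pd.DataFrame({"x": values, "y": [v * 2 for v in values]})
        raw.agg(["median"]).to_csv(Path(tmp) / f"summary_{name}.csv")
    combined = concat_summaries(tmp)

print(combined)
```

Because the glob pattern only matches names starting with summary_, the original csv files in the same folder are left out of the result.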