How do I concatenate multiple csv files into a pandas dataframe, with the filenames as the row names?


Question

For Part 1, I have multiple csv files which I loop through to create new csv files containing just summary statistics (medians). Each new csv file is named after the original file, prefixed with 'summary_'. This part is okay.

For Part 2, I want to concatenate all of the 'summary_' files (they all have the same column names), with each row name in the concatenated dataframe matching the name of the 'summary_' csv file that the row comes from.

With stackoverflow's help, I have solved Part 1, but not Part 2 yet. I can concatenate all of the csv files, but I can't restrict this to the ones with 'summary_' in the name (i.e. the new csv files created in Part 1), and the result doesn't have the correct row names...


import os
import pandas as pd
import glob

## Part 1

summary_stats = ['median']

filenames = (filename for filename in os.listdir(os.curdir) if os.path.splitext(filename)[1] == '.csv')

for filename in filenames:
    df = pd.read_csv(filename)

    summary_df = df.agg(summary_stats)
    summary_df.to_csv(f'summary_{filename}')

## Part 2

path = r'/Users/Desktop/Practice code'
all_files = glob.glob(path + "/*.csv")

frames = []  # avoid shadowing the built-in name `list`

for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    frames.append(df)

frame = pd.concat(frames, axis=0, ignore_index=True)


Recommended answer


  • Please make sure that all_files only loads the files matching "summary_*.csv".

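      Then you can label each frame with its source file name and combine them. (The original answer used df.append(), which was removed in pandas 2.0; pd.concat() is the replacement.)

      So your code might look something like this

      import glob
      import pandas as pd

      path = r'/Users/Desktop/Practice code'
      all_files = glob.glob(path + "/summary_*.csv")

      frames = []

      for filename in all_files:
          df = pd.read_csv(filename, index_col=None, header=0)
          # Label the rows with the original file name: the text between 'summary_' and '.csv'
          df['row'] = filename.split('summary_')[1].split('.csv')[0]
          # set_index returns a new dataframe, so the result has to be kept
          frames.append(df.set_index('row'))

      # Concatenate once at the end instead of appending inside the loop
      summary_df = pd.concat(frames)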
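
As an alternative sketch (not part of the original answer), the row labels can also come from pd.concat itself: passing a dict of dataframes makes the dict keys the outer index level. The helper name concat_summaries and the level names "file" and "stat" are hypothetical. The sketch assumes every summary file was written by to_csv as in Part 1, so its first column holds the statistic label (for example 'median').

```python
from pathlib import Path

import pandas as pd


def concat_summaries(folder):
    """Stack every summary_*.csv found in `folder` into one dataframe.

    Hypothetical helper. Assumes each file's first column is the statistic
    label that DataFrame.to_csv wrote out in Part 1.
    """
    frames = {}
    for path in sorted(Path(folder).glob("summary_*.csv")):
        # 'summary_a.csv' -> 'a'
        name = path.stem[len("summary_"):]
        # index_col=0 keeps the statistic label out of the data columns
        frames[name] = pd.read_csv(path, index_col=0)
    # Dict keys become the outer index level, the statistic label the inner one
    return pd.concat(frames, names=["file", "stat"])
```

With a single statistic per file, concat_summaries(path).droplevel("stat") leaves just the file names as row names. Note that pd.concat raises a ValueError when no file matches the pattern.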
      
