合并具有与数据框相似的名称约定的文件 [英] Merge files with similar name convention to a dataframe

查看:65
本文介绍了合并具有与数据框相似的名称约定的文件的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个存储在目录中的文件列表,例如

I have a list of files stored in directory such as

filenames=[
        abc_1.txt
        abc_2.txt
        abc_3.txt

        bcd_1.txt
        bcd_2.txt
        bcd_3.txt
       ]

pattern=[abc]

我想将多个txt文件读入一个数据帧,以便所有以abc开头的文件都在一个数据帧中,然后所有的文件名都以bcd等开头.

I want to read multiple txt files into one dataframe such that all files starting with abc will be in one dataframe then all all filename starting with bcd etc.

我的代码:

file_path = '/home/iolie/Downloads/test/'
filenames = os.listdir(file_path)


prefixes = list(set(i.split('_')[0] for i in filenames))

for prefix in prefixes:
    print('Reading files with prefix:',prefix)
    for file in filenames: 
        if file.startswith(prefix):
            print('Reading files:',file)
            list_of_dfs = [pd.concat([pd.read_csv(os.path.join(file_path, file), header=None) ],ignore_index=True)]
            final = pd.concat(list_of_dfs)

此代码不会追加,但会覆盖数据框.有人可以帮忙吗?

This code doesnt't append but overwrites the dataframe. Can someone help wih this?

推荐答案

比创建任意数量的未链接数据帧更好的主意是输出数据帧字典,其中键为前缀:

A better idea than creating an arbitrary number of unlinked dataframes is to output a dictionary of dataframes, where the key is the prefix:

from collections import defaultdict

filenames = ['abc_1.txt', 'abc_2.txt', 'abc_3.txt',
             'bcd_1.txt', 'bcd_2.txt', 'bcd_3.txt']

dd = defaultdict(list)

for fn in filenames:
    dd[fn.split('_')[0]].append(fn)

dict_of_dfs = {}
for k, v in dd.items():
    dict_of_dfs[k] = pd.concat([pd.read_csv(fn) for fn in v], ignore_index=True)

这篇关于合并具有与数据框相似的名称约定的文件的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆