Python for merging multiple files from a directory into one single file
Question
I need a single file with many columns (one per file in the directory), built from multiple files in that directory. Every file has the same unique IDs, which do not change between files, so I need to merge the files on that ID.
For example, file_1 looks like this:
id pool1
ABL1 1352
ABL12 1236
ABL13 1022
ABL14 815
ABL15 1591
ABL16 2703
The other files follow the same layout: the first column is identical across all files in the directory, and only the second column differs.
I am looking for an output that looks something like this:
id /pool1 /pool2 /pool3 /pool4 /pool5
ABL1 1352 1353 1354 1355 1356
ABL12 1236 1237 1238 1239 1240
ABL13 1022 1023 1024 1025 1026
ABL14 815 816 817 818 819
ABL15 1591 1592 1593 1594 1595
ABL16 2703 2704 2705 2706 2707
ABL17 1449 1450 1451 1452 1453
ABL18 619 620 621 622 623
ABL19 1074 1075 1076 1077 1078
So far I have tried to achieve this in Python with the following script:
path = '/Pool1'
files = os.listdir(path)
files_txt = [i for i in files if i.endswith('.txt_samplecount')]
files_merge= i for i in files_txt if i.merge(i,on="id")
But it throws the error:
AttributeError: 'str' object has no attribute 'merge'
Any help or suggestions are welcome.
Thanks
Recommended answer
I found a solution:
import os
import pandas as pd

path = '/Pool1'
files = os.listdir(path)
files_txt = [os.path.join(path, i) for i in files if i.endswith('.txt_samplecount')]
# Read each tab-separated file into a DataFrame indexed by its first column (id)
dfs = [pd.read_csv(x, sep='\t', index_col=0) for x in files_txt]
# Concatenate column-wise; rows are aligned on the id index
merged = pd.concat(dfs, axis=1)
This gives an output in which each file's column is concatenated into a single table. Thanks, all, for the suggestions.
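For reference, the AttributeError in the question arises because os.listdir returns file names as plain strings, which have no merge method; each file has to be loaded into a DataFrame first. Below is a minimal sketch of the merge-on-id variant the question was reaching for, which also writes the result to a single file. It assumes the files are tab-separated with an id header, as in the answer above; the helper name merge_count_files, the sample file names, and the output name merged.tsv are hypothetical and only for illustration.

```python
import os
import tempfile
from functools import reduce

import pandas as pd


def merge_count_files(directory, suffix=".txt_samplecount", sep="\t"):
    """Merge every file in `directory` ending with `suffix` on its 'id' column.

    Hypothetical helper: assumes each file has an 'id' column plus one
    uniquely named value column.
    """
    paths = sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.endswith(suffix)
    )
    if not paths:
        raise ValueError(f"no files ending with {suffix!r} in {directory!r}")
    # Load each file into a DataFrame; 'id' stays a regular column here.
    frames = [pd.read_csv(p, sep=sep) for p in paths]
    # Outer join keeps ids that are missing from some files (filled with NaN).
    return reduce(
        lambda left, right: left.merge(right, on="id", how="outer"), frames
    )


if __name__ == "__main__":
    # Build two small sample files in a temporary directory (made-up names).
    with tempfile.TemporaryDirectory() as tmp:
        samples = {
            "a.txt_samplecount": "id\tpool1\nABL1\t1352\nABL12\t1236\n",
            "b.txt_samplecount": "id\tpool2\nABL1\t1353\nABL12\t1237\n",
        }
        for name, text in samples.items():
            with open(os.path.join(tmp, name), "w") as fh:
                fh.write(text)
        merged = merge_count_files(tmp)
        # Write the combined table to one tab-separated file.
        merged.to_csv(os.path.join(tmp, "merged.tsv"), sep="\t", index=False)
        print(merged)
```

When every file is known to contain exactly the same ids, pd.concat(dfs, axis=1) on id-indexed frames, as in the answer, is simpler; the explicit merge is mainly useful when some ids may be missing from some files.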