Joining files by corresponding columns in an outside table

Question

I have a .csv file that maps table names to categories. I want to use it to merge files in a folder (as with cat): the files whose names match column Sample_Name in the .csv should be concatenated according to Category, and each resulting file should be named after its Category.

The files to be merged are not .csv files; they are a kind of .fasta file.

The .csv looks something like the following (it will have more columns, which should be ignored here):

 Sample_Name     Category
 1               a
 2               a
 3               a
 4               b
 5               b

After merging, the output should be two files: a (samples 1, 2 and 3 merged) and b (samples 4 and 5 merged).

The idea is to make this work for a large number of files and categories.

Thanks for your help!

Recommended answer

Assuming that the files are listed in order in the input CSV file, this is about as simple as it gets:

from operator import itemgetter

fields = itemgetter(0, 1)    # zero-based field numbers of the fields of interest
with open('sample_categories.csv') as csvfile:
    next(csvfile)     # skip over header line
    for line in csvfile:
        filename, category = fields(line.split())
        with open(filename) as infile, open(category, 'a') as outfile:
            outfile.write(infile.read())

One downside is that the output file is reopened for every input file, which might be a problem if there are a lot of files per category. If that turns out to be an actual problem, you could try the following version, which holds the output file open for as long as there are input files in that category.

from operator import itemgetter

fields = itemgetter(0, 1)    # zero-based field numbers of the fields of interest
with open('sample_categories.csv') as csvfile:
    next(csvfile)     # skip over header line
    current_category = None
    outfile = None
    for line in csvfile:
        filename, category = fields(line.split())
        if category != current_category:
            # category changed: close the previous output and start a new one
            if outfile is not None:
                outfile.close()
            outfile = open(category, 'w')
            current_category = category
        with open(filename) as infile:
            outfile.write(infile.read())
    if outfile is not None:
        outfile.close()     # close the last category's output file
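
If the rows are not guaranteed to be grouped by category, one possible variation is to collect the sample names per category first and only then write each output file once. The following is a minimal sketch, not part of the original answer. It assumes a comma-separated table with a header row containing Sample_Name and Category, and that each input file is named exactly like its Sample_Name value. The function name merge_by_category and its three parameters are hypothetical.

```python
import csv
import shutil
from collections import defaultdict
from pathlib import Path


def merge_by_category(table_path, input_dir, output_dir):
    """Concatenate the files named in Sample_Name into one file per Category.

    Hypothetical helper: assumes a comma-separated table with a header row
    and input files named exactly like the Sample_Name values.
    """
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    # Group sample names by category; row order in the table does not matter.
    groups = defaultdict(list)
    with open(table_path, newline='') as csvfile:
        for row in csv.DictReader(csvfile):   # other columns are not used
            groups[row['Category']].append(row['Sample_Name'])

    # Open each output file once and stream every matching input into it.
    for category, samples in groups.items():
        with open(output_dir / category, 'wb') as outfile:
            for sample in samples:
                with open(input_dir / sample, 'rb') as infile:
                    shutil.copyfileobj(infile, outfile)
    return dict(groups)
```

A call might look like merge_by_category('sample_categories.csv', 'fasta_files', 'merged'), where both folder names are placeholders. Because shutil.copyfileobj copies in chunks, large inputs are never read into memory in one piece. The files are joined byte for byte, so an input that lacks a trailing newline would run straight into the next one.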
