Issues with reading .bed files and compressing output in a specific format

Question

import os #handles the gzipped output like the example file
file_name = "exampleziptotxt.bed"
out_file_root = "example_by_chrom"
file_handle_dict = {}

file_reader=open(file_name)

for line in file_reader:
    ff=line.strip().split(",")
    chrom_name=ff[0]
    if not (chrom_name in file_handle_dict):
        out_file_chrom_name=out_file_root+"."+chrom_name+".bed"
        out_file_chrom_name_handle=open(out_file_chrom_name,"w")
        file_handle_dict[chrom_name]=out_file_chrom_name_handle
    # write the line in the appropriate output file
    file_handle_dict[chrom_name].write(line)
    # file_handle_dict[chrom_name].write("%s\n"%"\t".join(ff))

file_reader.close()

# now close all open files
for chrom_name in file_handle_dict:
    file_handle_dict[chrom_name].close()

I would like to rewrite the above code so that it writes multiple gzipped output files, using gzip or an alternative. I am unsure how to accomplish this. Any help would be greatly appreciated.

Recommended answer

Only a few small changes are needed: import gzip (os is not needed), add .gz to the output file name, use gzip.open instead of open when writing, and call .encode() on each line to convert it to bytes, because gzip.open with mode "w" opens the file in binary mode.

import gzip

file_name = "exampleziptotxt.bed"
out_file_root = "example_by_chrom"
file_handle_dict = {}

file_reader=open(file_name)

for line in file_reader:
    ff=line.strip().split(",")
    chrom_name=ff[0]
    if not (chrom_name in file_handle_dict):
        out_file_chrom_name=out_file_root+"."+chrom_name+".bed.gz"
        out_file_chrom_name_handle=gzip.open(out_file_chrom_name,"w")
        file_handle_dict[chrom_name]=out_file_chrom_name_handle
    # write the line in the appropriate output file
    file_handle_dict[chrom_name].write(line.encode())
    # file_handle_dict[chrom_name].write("%s\n"%"\t".join(ff))

file_reader.close()

# now close all open files
for chrom_name in file_handle_dict:
    file_handle_dict[chrom_name].close()
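
As an alternative to calling .encode() on every line, gzip.open can be opened in text mode ("wt") so that str lines are written directly. The sketch below is one possible variant of the code above, not the original answer's method. The function name split_bed_by_chrom and its sep parameter are hypothetical, introduced only for illustration. The default separator is kept as "," to match the code above, although BED files are commonly tab-separated, so "\t" may be what is needed in practice. Unlike the code above, this version skips blank lines, and it uses contextlib.ExitStack so that every output file is closed even if an error occurs partway through.

```python
import gzip
from contextlib import ExitStack


def split_bed_by_chrom(in_path, out_root, sep=","):
    """Write each line of in_path to <out_root>.<chrom>.bed.gz, keyed on the first field.

    Hypothetical helper: the name and the sep parameter are assumptions for this sketch.
    """
    handles = {}
    # ExitStack closes every registered file when the block exits, even on error.
    with ExitStack() as stack:
        reader = stack.enter_context(open(in_path))
        for line in reader:
            if not line.strip():
                continue  # skip blank lines (a choice made in this sketch)
            chrom = line.strip().split(sep)[0]
            if chrom not in handles:
                out_name = f"{out_root}.{chrom}.bed.gz"
                # "wt" opens the gzip stream in text mode, so no .encode() is needed.
                handles[chrom] = stack.enter_context(gzip.open(out_name, "wt"))
            handles[chrom].write(line)
    # Return the chromosome names that received an output file.
    return sorted(handles)
```

With the file names from the question, this could be called as split_bed_by_chrom("exampleziptotxt.bed", "example_by_chrom"), which would produce one example_by_chrom.<chrom>.bed.gz file per distinct first field.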
