How to merge several CSV columns into one column using Python 2.7?


Problem description

I'm working with a large set of CSV data, and I want to combine several columns from different positions into a single column, with the values separated by semicolons (;).

What I have now is:

a   b   c   d
1   2   3   4
1   2   3   4
1   2   3   4

I want to change it to this, so that all of my data ends up in column d only:

a   b   c   d
            a=1;b=2;c=3;d=4;
            a=1;b=2;c=3;d=4;
            a=1;b=2;c=3;d=4;

I know how to delete the empty columns a, b and c, but I can't figure out how to merge the data from columns a, b and c into column d. Thanks in advance.

The code I have so far is:

# Parsing the custom formatted data with csv module.
# reads the custom format input and spits out the output in VCF format.
import csv
# input and output
with open('1-0002', 'rb') as csvin, open('converted1','wb') as csvout:
    # reading and writing are all tab delimited
    reader = csv.reader(csvin, delimiter = '\t')
    writer = csv.writer(csvout, delimiter = '\t')
    # add headings before the for loop to prevent the heading being affected by column manipulation.
    writer.writerow(["#CHROM","POS","ID","REF","ALT","QUAL","FILTER","INFO"])

    for row in reader:
        # deleting unnecessary columns; 'del' must go from the highest index down,
        # otherwise earlier deletions shift the remaining columns and cause wrong deletions or an IndexError.
        # manually deleting columns since the input data is in custom format.
        del row[11]
        del row[10]
        del row[9]
        del row[8]
        del row[7]
        del row[6]
        del row[5]
        del row[1]
        del row[0]
        # inserting 1 and . in specific columns
        row.insert(0,'1')
        row.insert(2,'.')
        row.insert(5,'.')
        row.insert(7,'') # inserting empty column for INFO headings.

        # change 'YES' to 'PASS' , leaving HETERO as it is.
        if row[6] == 'YES':
            row[6] = 'PASS'

        writer.writerow(row)

So, in the code above, I want to put the data from several different columns into the INFO column.

Recommended answer

Simple answer: don't bother deleting columns from the row. Instead, build a NEW row to write out that picks only what you want.

It would look something like this:

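# leave row alone, don't bother deleting columns in it.
# the column indices below are illustrative; use the positions of the columns you need.
# the format string must receive exactly one value per %s.
new_row = ["a=%s;b=%s;c=%s;d=%s;" % (row[12], row[13], row[14], row[15])]
# new_row has only one column, with a string constructed of what you need.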

writer.writerow(new_row)

And voila, that should do it. You can also copy any other columns you need into new_row, and append() anything else you want.
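To show how this might fit into the question's loop, here is a minimal sketch that builds the whole output row from scratch. The positions for POS, REF, ALT and FILTER (columns 2, 3, 4 and 12) are read off the question's del/insert sequence. The helper names build_info and convert_row, the INFO_FIELDS mapping (columns 13 to 16) and the sample row are hypothetical and only there for illustration.

```python
import csv


def build_info(row, fields):
    """Join the selected columns into one 'name=value;' string."""
    return "".join("%s=%s;" % (name, row[index]) for name, index in fields)


def convert_row(row, info_fields):
    """Build a fresh 8-column row instead of deleting columns from `row`."""
    # Same rule as the question: 'YES' becomes 'PASS', anything else is kept.
    filter_value = "PASS" if row[12] == "YES" else row[12]
    # CHROM, POS, ID, REF, ALT, QUAL, FILTER
    new_row = ["1", row[2], ".", row[3], row[4], ".", filter_value]
    # INFO: the merged column
    new_row.append(build_info(row, info_fields))
    return new_row


# Hypothetical mapping of INFO keys to input column positions.
INFO_FIELDS = [("a", 13), ("b", 14), ("c", 15), ("d", 16)]

# One made-up, tab-delimited input line with 17 columns ("0" through "16").
lines = ["\t".join(str(i) for i in range(17))]

for row in csv.reader(lines, delimiter="\t"):
    print(convert_row(row, INFO_FIELDS))
```

In the original script, the body of the for loop would then shrink to something like writer.writerow(convert_row(row, INFO_FIELDS)), with the header row still written once before the loop. Because the input row is never modified, the column indices always refer to the original layout, so the order of operations no longer matters.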
