How to merge several csv columns into one column using python 2.7?
Problem description
I'm working with a large set of CSV data, and I want to put several columns from different places into one column, separated by semicolons (;).
What I have now is:
a b c d
1 2 3 4
1 2 3 4
1 2 3 4
I want to change it to this, so that all my data is in column d only:
a b c d
a=1;b=2;c=3;d=4;
a=1;b=2;c=3;d=4;
a=1;b=2;c=3;d=4;
I know how to delete the then-empty columns a, b and c, but I can't figure out a way to merge the data from columns a, b and c into column d. Thanks in advance.
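A minimal sketch of the transformation shown above, assuming the first row of the input holds the column names (a, b, c, d) and that every value should be written as name=value followed by a semicolon. The function name merge_fields is hypothetical; it uses only string operations, so it behaves the same under Python 2.7 and Python 3.

```python
def merge_fields(header, row):
    """Combine parallel header/value lists into one 'name=value;' string."""
    # zip pairs each column name with the value in the same position
    return ''.join('%s=%s;' % (name, value) for name, value in zip(header, row))


header = ['a', 'b', 'c', 'd']
row = ['1', '2', '3', '4']
print(merge_fields(header, row))  # a=1;b=2;c=3;d=4;
```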
The code I have so far is:
# Parsing the custom formatted data with the csv module.
# Reads the custom format input and writes the output in VCF format.
import csv

# input and output (binary modes, as the csv module expects on Python 2.7)
with open('1-0002', 'rb') as csvin, open('converted1', 'wb') as csvout:
    # reading and writing are all tab delimited
    reader = csv.reader(csvin, delimiter='\t')
    writer = csv.writer(csvout, delimiter='\t')
    # add headings before the for loop so the heading is not affected by the column manipulation.
    writer.writerow(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"])
    for row in reader:
        # deleting unnecessary columns; 'del' must go in DESCENDING index order,
        # otherwise each deletion shifts the later columns and the indices go out of range.
        # columns are deleted manually since the input data is in a custom format.
        del row[11]
        del row[10]
        del row[9]
        del row[8]
        del row[7]
        del row[6]
        del row[5]
        del row[1]
        del row[0]
        # inserting 1 and . in specific columns
        row.insert(0, '1')
        row.insert(2, '.')
        row.insert(5, '.')
        row.insert(7, '')  # inserting an empty column for INFO.
        # change 'YES' to 'PASS', leaving HETERO as it is.
        if row[6] == 'YES':
            row[6] = 'PASS'
        writer.writerow(row)
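The chain of del statements works only because the indices are removed from highest to lowest: deleting a lower index first would shift every later column one position to the left. Below is a sketch of the same step as a loop; the helper name drop_columns is hypothetical, and the index list is copied from the code above.

```python
# Column positions deleted in the code above (0-based).
UNWANTED = [0, 1, 5, 6, 7, 8, 9, 10, 11]


def drop_columns(row, indices=UNWANTED):
    """Return a copy of row without the given column positions."""
    row = list(row)  # work on a copy so the caller's list is untouched
    # Delete from the highest index down so earlier deletions
    # do not shift the positions of columns still to be removed.
    for i in sorted(indices, reverse=True):
        del row[i]
    return row


sample = ['c%d' % i for i in range(12)]
print(drop_columns(sample))  # ['c2', 'c3', 'c4']
```

A list comprehension that keeps only the positions not in the index set, such as [v for i, v in enumerate(row) if i not in set(indices)], would avoid the ordering issue altogether.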
So, starting from the code above, I want to put the data from several different columns into the INFO column.
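One possible way to fill the INFO column, sketched under the assumption that you read the values you need from the original row before any columns are deleted. The INFO_FIELDS mapping (key names and column indices) and the build_info helper are purely hypothetical; replace them with the columns from your own input. VCF separates INFO entries with semicolons and uses a period for a missing value.

```python
# Hypothetical (INFO key, column index in the ORIGINAL row) pairs.
# A list of tuples keeps the output order stable on Python 2.7 too.
INFO_FIELDS = [('DP', 5), ('AF', 6), ('GENE', 7)]


def build_info(row, fields=INFO_FIELDS):
    """Join selected columns of row into a 'KEY=value;KEY=value' string."""
    parts = ['%s=%s' % (key, row[index]) for key, index in fields]
    # '.' is the VCF placeholder for an empty INFO field.
    return ';'.join(parts) if parts else '.'


original = ['r%d' % i for i in range(12)]
info = build_info(original)  # compute before deleting any columns
print(info)  # DP=r5;AF=r6;GENE=r7
```

Inside the question's loop, this would mean calling build_info(row) before the del statements and passing the result to row.insert(7, ...) in place of the empty string.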
Recommended answer
Simple answer: don't bother deleting columns from the row; instead, build a NEW row for output that contains only what you want.
It would look something like this:
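# leave row alone; don't bother deleting columns from it.
# column indices 12-15 are only illustrative: use the positions of your own a, b, c, d columns.
new_row = ["a=%s;b=%s;c=%s;d=%s" % (row[12], row[13], row[14], row[15])]
# new_row has only one column, holding a string built from the values you need.
writer.writerow(new_row)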
And voilà, that should do it. You can also copy any other columns you need into new_row, and append() whatever else you might want.
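The answer's approach can be sketched as a complete read/write loop. This is a minimal sketch: the function name merge_to_one_column and its indices/keys parameters are hypothetical, and the in-memory usage below is Python 3. With real files, open them with newline='' on Python 3, or in 'rb'/'wb' mode on Python 2.7 as in the question.

```python
import csv
import io


def merge_to_one_column(infile, outfile, indices, keys):
    """Write one output column per input row, built from selected input columns."""
    reader = csv.reader(infile, delimiter='\t')
    writer = csv.writer(outfile, delimiter='\t')
    for row in reader:
        # Leave row untouched; build a brand-new one-element row instead.
        merged = ''.join('%s=%s;' % (key, row[i]) for key, i in zip(keys, indices))
        writer.writerow([merged])


# Python 3 usage with in-memory files standing in for real ones.
src = io.StringIO('1\t2\t3\t4\n5\t6\t7\t8\n')
dst = io.StringIO()
merge_to_one_column(src, dst, [0, 1, 2, 3], ['a', 'b', 'c', 'd'])
print(dst.getvalue())
```

Because the csv writer's default line terminator is '\r\n', each merged row ends with a carriage return and newline; pass lineterminator='\n' to csv.writer if you prefer Unix line endings.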