Removing Duplicates of Entire Rows
Problem description
Hi guys,
I have several thousand rows with 106 columns. The first column holds a chromosome and location, and the same value can appear in more than one row. The remaining 105 columns, numbered 1-105, each correspond to a sample number. If a sample has a given chromosome and location, I want to put a 1 in that cell, so that at the end I can sum the 1s for each sample. What I am having a tough time programming in Python is how to write this to a file when the same key appears more than once, for different samples. How can I add a 1 to the right cell of a single row so that I can get the sums later?
Thanks a lot in advance,
The code I have so far is found below:
with open(os.path.join(file_out + ".txt"), 'w') as outpt:
    dic = defaultdict(list)
    dic[chro_pos].append(sample_num)
    outpt.write("chrom_pos" + "\t" + "\t".join(samp_num) + "\t" + "\n")
    for k, val in dic.iteritems():  # k is the chromosome:location; val holds the sample numbers (1 out of 105)
        for v in val:
            # This writes duplicate chrom_pos rows, which I don't want. I want one chrom_pos row with 1s in the columns of all its samples.
            outpt_TSS.write(int(k) * ("\t") + str(1) + '\n')
Write each val to a new list. Before adding the next one, check whether it already exists in that list, and skip it if it does.
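One way to sketch that idea, assuming Python 3 and assuming the input has already been parsed into (chrom_pos, sample_num) pairs: a set per key does the "already exists, so skip" check automatically, and each key is then written once as a row of 1s and 0s. The names `build_presence`, `presence_row`, `write_matrix`, and `N_SAMPLES` are made up for this illustration and are not from the original post.

```python
from collections import defaultdict

N_SAMPLES = 105  # sample columns 1..105, as described in the question


def build_presence(records):
    """Map each chrom_pos to the set of sample numbers that contain it.

    `records` is a hypothetical iterable of (chrom_pos, sample_num) pairs.
    """
    presence = defaultdict(set)
    for chrom_pos, sample_num in records:
        # Adding to a set silently skips a sample number already stored for this key.
        presence[chrom_pos].add(int(sample_num))
    return presence


def presence_row(samples, n_samples=N_SAMPLES):
    """Return a list with 1 where the sample number is present, else 0."""
    return [1 if n in samples else 0 for n in range(1, n_samples + 1)]


def write_matrix(presence, path, n_samples=N_SAMPLES):
    """Write one tab-separated row per chrom_pos and return the per-sample sums."""
    totals = [0] * n_samples
    with open(path, "w") as out:
        header = ["chrom_pos"] + [str(n) for n in range(1, n_samples + 1)]
        out.write("\t".join(header) + "\n")
        for chrom_pos in sorted(presence):
            row = presence_row(presence[chrom_pos], n_samples)
            # Accumulate column sums while writing, so no second pass is needed.
            totals = [t + r for t, r in zip(totals, row)]
            out.write(chrom_pos + "\t" + "\t".join(map(str, row)) + "\n")
    return totals
```

For example, if "chr1:100" shows up for samples 1 and 3 (even several times), `build_presence` keeps it as a single key with the set {1, 3}, so `write_matrix` emits one row for it with a 1 in columns 1 and 3 and a 0 everywhere else.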