Removing Duplicates of Entire Rows


Problem description


Hi guys,

I have thousands of rows with 106 columns. The first column holds a chromosome and location (chrom_pos), and its values can be duplicated; the remaining columns, 1 to 105, correspond to sample numbers. If a sample has a given chromosome and location, I want to put a 1 in that sample's cell, so that at the end I can sum the ones for each sample. What I am struggling to program in Python is how to write this to a file when the same key appears more than once, for different samples. How can I put a 1 in each matching cell of a single row so that I can compute the sums later?

Thanks a lot in advance,

The code I have so far is found below:


    import os
    from collections import defaultdict

    # chro_pos, sample_num, samp_num (list of sample labels) and file_out come from parsing code not shown here
    dic = defaultdict(list)
    dic[chro_pos].append(sample_num)

    with open(os.path.join(file_out + ".txt"), 'w') as outpt:
        outpt.write("chrom_pos" + "\t" + "\t".join(samp_num) + "\n")
        for k, val in dic.items():  # k is chromosome:location, val is the list of sample numbers (1 to 105)
            for v in val:
                # This writes one line per (chrom_pos, sample) pair, so chrom_pos is duplicated.
                # I don't want that: I want one chrom_pos row with ones under every matching sample.
                outpt.write(k + int(v) * "\t" + str(1) + "\n")

Solution

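Collect the values for each key in a new list. Before adding the next value, check whether it already exists in that list, and skip it if it does. Then write each key once, with all of its values on the same row.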
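
One possible way to apply this idea is sketched below. It is only an illustration under stated assumptions: the input is taken to be an iterable of (chrom_pos, sample_num) pairs with sample numbers starting at 1, and the names build_presence_rows, write_presence_table, pairs and n_samples are hypothetical, not from the question. A set is used in place of the list-plus-membership-check described in the answer; the effect is the same, since adding a value that is already in a set does nothing.

```python
from collections import defaultdict


def build_presence_rows(pairs, n_samples):
    """Return (rows, totals): one 0/1 row per unique chrom_pos and per-sample sums."""
    samples_by_pos = defaultdict(set)
    for chrom_pos, sample_num in pairs:
        # set.add silently ignores a sample already recorded for this chrom_pos
        samples_by_pos[chrom_pos].add(sample_num)

    rows = {}
    totals = [0] * n_samples
    for chrom_pos, samples in samples_by_pos.items():
        # 1 where the sample has this chrom_pos, 0 otherwise
        row = [1 if s in samples else 0 for s in range(1, n_samples + 1)]
        rows[chrom_pos] = row
        totals = [t + x for t, x in zip(totals, row)]
    return rows, totals


def write_presence_table(path, rows, totals):
    """Write a tab-separated table: header, one line per chrom_pos, then the sums."""
    n_samples = len(totals)
    with open(path, "w") as out:
        header = [str(s) for s in range(1, n_samples + 1)]
        out.write("chrom_pos\t" + "\t".join(header) + "\n")
        for chrom_pos, row in rows.items():
            out.write(chrom_pos + "\t" + "\t".join(map(str, row)) + "\n")
        out.write("total\t" + "\t".join(map(str, totals)) + "\n")


# Made-up example data: chr1:100 appears in samples 1 and 3 (sample 1 twice)
pairs = [("chr1:100", 1), ("chr1:100", 3), ("chr2:200", 3), ("chr1:100", 1)]
rows, totals = build_presence_rows(pairs, n_samples=3)
print(rows)    # {'chr1:100': [1, 0, 1], 'chr2:200': [0, 0, 1]}
print(totals)  # [1, 0, 2]
```

For the data in the question, n_samples would be 105, and calling write_presence_table with an output path would produce one line per chrom_pos instead of one line per (chrom_pos, sample) pair.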

