Removing Duplicates of Entire Rows

Views: 90  Published: 2019/6/11 17:51:25  Python2.7

Problem description

Hi guys,

I have thousands of rows with 106 columns. The first column (chromosome and location) contains only a chromosome and location, and its values can be duplicated. The remaining columns are numbered 1-105, and each corresponds to a sample number. If a sample has a given chromosome and location, I want to put the number one in that cell, so that at the end I can calculate, for each sample, the sum of the cells that contain a one. The part I am having a tough time programming in Python is how to write this to a file when the same key appears more than once, in different samples. How can I add the number one to that cell so I can get the sum later on?

Thanks a lot in advance,

The code I have so far is below:

with open(os.path.join(file_out + ".txt"), 'w') as outpt:
    dic = defaultdict(list)
    dic[chro_pos].append(sample_num)
    outpt.write("chrom_pos" + "\t" + "\t".join(samp_num) + "\t" + "\n")
    for k, val in dic.iteritems():  # k is the chromosome:location; val holds the sample numbers (1 to 105)
        for v in val:
            # This writes duplicate chrom_pos rows, which I don't want.
            # I want one chrom_pos row with ones in the columns of the matching samples.
            outpt_TSS.write(int(k) * ("\t") + str(1) + '\n')

Solution

Write val to a new list, and for each following value check whether it already exists in that list; if it does, skip it.
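One possible way to apply that idea to the table described in the question is sketched below. It is not the asker's or the answerer's actual code: the function names build_presence_table and format_presence_table, and the input shape (an iterable of (chrom_pos, sample_num) pairs), are assumptions made for illustration. It also swaps the "check the list, then skip" step for a set per chrom_pos, which discards repeats automatically, so each chrom_pos ends up as a single row of 0/1 flags with a per-sample total.

```python
from collections import defaultdict


def build_presence_table(pairs, n_samples=105):
    """Collapse (chrom_pos, sample_num) pairs into one 0/1 row per chrom_pos.

    `pairs` is a hypothetical input shape: an iterable of
    (chrom_pos, sample_num) tuples with sample_num in 1..n_samples.
    """
    samples_by_pos = defaultdict(set)  # a set silently ignores repeated sample numbers
    order = []                         # chrom_pos keys in first-seen order
    for chrom_pos, sample_num in pairs:
        if chrom_pos not in samples_by_pos:
            order.append(chrom_pos)
        samples_by_pos[chrom_pos].add(sample_num)

    rows = []
    totals = [0] * n_samples
    for chrom_pos in order:
        present = samples_by_pos[chrom_pos]
        # 1 where the sample has this chrom_pos, 0 otherwise
        flags = [1 if s in present else 0 for s in range(1, n_samples + 1)]
        rows.append((chrom_pos, flags))
        totals = [t + f for t, f in zip(totals, flags)]
    return rows, totals


def format_presence_table(rows, totals):
    """Return tab-separated lines: header, one line per chrom_pos, then totals."""
    n_samples = len(totals)
    lines = ["\t".join(["chrom_pos"] + [str(s) for s in range(1, n_samples + 1)])]
    for chrom_pos, flags in rows:
        lines.append("\t".join([chrom_pos] + [str(f) for f in flags]))
    lines.append("\t".join(["total"] + [str(t) for t in totals]))
    return lines


# Small made-up example with 3 samples instead of 105.
example_pairs = [("chr1:100", 1), ("chr1:100", 3), ("chr2:200", 2), ("chr1:100", 1)]
example_rows, example_totals = build_presence_table(example_pairs, n_samples=3)
example_lines = format_presence_table(example_rows, example_totals)
```

The resulting lines could then be written out with something like outpt.write("\n".join(example_lines) + "\n") inside the with open(...) block from the question. Keeping the flags as integers until the final formatting step means the per-sample sums are available without re-reading the file.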
