How to access the reference values of a HashSet<TValue> without enumeration?


Question



I have this scenario in which memory conservation is paramount. I am trying to read more than 1 GB of peptide sequences into memory and group together the Peptide instances that share the same sequence. I am storing the Peptide objects in a HashSet so I can quickly check for duplicates, but found out that you cannot retrieve an object from the set, even after confirming that the set contains it.

Memory is really important and I don't want to duplicate data if at all possible. (Otherwise I would have designed my data structure as peptides = Dictionary<string, Peptide>, but that would duplicate the string in both the dictionary and the Peptide class.) Below is the code to show what I would like to accomplish:

using System;
using System.Collections.Generic;

public class SomeClass {

       // Main Storage of all the Peptide instances, class provided below
       private HashSet<Peptide> peptides = new HashSet<Peptide>();

       public void SomeMethod(IEnumerable<string> files) {
            foreach(string file in files) {
                 using(PeptideReader reader = new PeptideReader(file)) {
                     foreach(DataLine line in reader.ReadNextLine()) {
                         Peptide testPep = new Peptide(line.Sequence);
                         if(peptides.Contains(testPep)) {

                            // ** Problem Is Here **
                            // I want to get the Peptide object that is in HashSet
                            // so I can add the DataLine to it, I don't want use the
                            // testPep object (even though they are considered "equal")
                            // peptides[testPep].Add(line); // I know this doesn't work: HashSet<T> has no indexer

                            // testPep.Add(line); // THIS IS NO GOOD, since it won't be saved in the HashSet which I use in other methods.

                         } else {
                            // The HashSet doesn't contain this peptide, so we can just add it
                            testPep.Add(line);
                            peptides.Add(testPep);
                         }
                     }   
                 }
            }
       }
}

public class Peptide : IEquatable<Peptide> {
     public string Sequence {get;private set;}
     private int hCode = 0;

     public PsmList PSMs {get;set;}

     public Peptide(string sequence) {
         Sequence = sequence.Replace('I', 'L');
         hCode = Sequence.GetHashCode();             
     }

     public void Add(DataLine data) {
         if(PSMs == null) {
             PSMs = new PsmList();
         } 
         PSMs.Add(data);
     }

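     public override int GetHashCode() {
         return hCode;
     }

     public bool Equals(Peptide other) {
         return other != null && Sequence.Equals(other.Sequence);
     }

     // Keep object.Equals consistent with the IEquatable<Peptide> overload
     public override bool Equals(object obj) {
         return Equals(obj as Peptide);
     }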
}

public class PsmList : List<DataLine> { /* and some other stuff that is not important */ }

Why does HashSet not let me get the object reference that is contained in the HashSet? I know people will try to say that if HashSet.Contains() returns true, your objects are equivalent. They may be equivalent in terms of values, but I need the references to be the same since I am storing additional information in the Peptide class.

The only solution I came up with is Dictionary<Peptide, Peptide> in which both the key and value point to the same reference. But this seems tacky. Is there another data structure to accomplish this?
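A minimal sketch of the Dictionary<Peptide, Peptide> workaround described above, wrapped in a small generic class so the self-mapping stays hidden from callers. The names `CanonicalSet<T>` and `GetOrAdd` are hypothetical; any `T` with value equality (such as the question's `Peptide`) should work.

```csharp
using System.Collections.Generic;

// Hypothetical helper: a set that hands back the instance it already stores.
// Backed by Dictionary<T, T> in which every key maps to itself (the same reference).
public sealed class CanonicalSet<T>
{
    private readonly Dictionary<T, T> items;

    public CanonicalSet() : this(EqualityComparer<T>.Default) { }

    public CanonicalSet(IEqualityComparer<T> comparer)
    {
        items = new Dictionary<T, T>(comparer);
    }

    public int Count
    {
        get { return items.Count; }
    }

    // Enumerates the stored (canonical) instances.
    public IEnumerable<T> Items
    {
        get { return items.Keys; }
    }

    // Returns the stored instance equal to candidate; if none exists, stores candidate and returns it.
    public T GetOrAdd(T candidate)
    {
        T existing;
        if (items.TryGetValue(candidate, out existing))
        {
            return existing;
        }
        items.Add(candidate, candidate);
        return candidate;
    }
}
```

With `peptides` declared as `CanonicalSet<Peptide>`, the whole if/else in `SomeMethod` could shrink to `peptides.GetOrAdd(new Peptide(line.Sequence)).Add(line);`. The throwaway `Peptide` built for a duplicate sequence becomes short-lived garbage.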

Solution

Basically you could reimplement HashSet<T> yourself, but that's about the only solution I'm aware of. The Dictionary<Peptide, Peptide> or Dictionary<string, Peptide> solution is probably not that inefficient though - if you're only wasting a single reference per entry, I would imagine that would be relatively insignificant.
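The `Dictionary<string, Peptide>` variant need not duplicate the sequence text either, provided the key is the very same string instance the `Peptide` stores; the dictionary then holds only one extra reference to it. A sketch under that assumption, using a simplified stand-in for `Peptide` (PSM data reduced to a `List<string>`) and a hypothetical `PeptideIndex` wrapper:

```csharp
using System.Collections.Generic;

// Simplified, hypothetical stand-in for the question's Peptide.
public sealed class Peptide
{
    public string Sequence { get; private set; }
    public List<string> Psms { get; private set; }

    // Expects an already normalized (I -> L) sequence; PeptideIndex does the normalization.
    public Peptide(string normalizedSequence)
    {
        Sequence = normalizedSequence;
        Psms = new List<string>();
    }
}

// Hypothetical wrapper around Dictionary<string, Peptide>.
public sealed class PeptideIndex
{
    private readonly Dictionary<string, Peptide> bySequence = new Dictionary<string, Peptide>();

    public int Count
    {
        get { return bySequence.Count; }
    }

    public Peptide GetOrAdd(string rawSequence)
    {
        string normalized = rawSequence.Replace('I', 'L');
        Peptide existing;
        if (bySequence.TryGetValue(normalized, out existing))
        {
            return existing; // the temporary normalized string is not retained
        }
        var created = new Peptide(normalized);
        // Key on created.Sequence so the dictionary and the Peptide share one string instance.
        bySequence.Add(created.Sequence, created);
        return created;
    }
}
```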

In fact, if you remove the hCode member from Peptide, that will save you 4 bytes per object, which is the same size as a reference on x86 anyway... There's no point in caching the hash as far as I can tell, as you'll only compute the hash of each object once or twice (Contains followed by Add), at least in the code you've shown.
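Dropping `hCode` could look like the sketch below, a trimmed version of the question's `Peptide` with the PSM members omitted. `HashSet<T>` and `Dictionary<TKey, TValue>` keep each entry's hash code alongside the entry, so the string hash is computed when you call `Add`, `Contains` or `TryGetValue`, not again when the collection grows.

```csharp
using System;

// Trimmed Peptide without a cached hash field (PSM members omitted for brevity).
public sealed class Peptide : IEquatable<Peptide>
{
    public string Sequence { get; private set; }

    public Peptide(string sequence)
    {
        Sequence = sequence.Replace('I', 'L');
    }

    // Computed on demand from the normalized sequence.
    public override int GetHashCode()
    {
        return Sequence.GetHashCode();
    }

    public bool Equals(Peptide other)
    {
        return other != null && Sequence == other.Sequence;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as Peptide);
    }
}
```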

If you're really desperate for memory, I suspect you could store the sequence considerably more efficiently than as a string. If you give us more information about what the sequence contains, we may be able to make some suggestions there.
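One possible direction, assuming sequences contain only ASCII one-letter amino-acid codes: store each sequence as a `byte[]` (one byte per residue) instead of a UTF-16 `string` (two bytes per character), which roughly halves the character payload. Arrays compare by reference by default, so a custom comparer is needed to use them as keys. `CompactSequence` and `AsciiSequenceComparer` are hypothetical names, and FNV-1a is just one reasonable hash choice.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical compact encoding: assumes sequences hold only ASCII amino-acid letters.
public static class CompactSequence
{
    // Applies the question's I -> L normalization, then stores one byte per residue.
    // Note: Encoding.ASCII replaces any non-ASCII character with '?'.
    public static byte[] Encode(string sequence)
    {
        return Encoding.ASCII.GetBytes(sequence.Replace('I', 'L'));
    }

    public static string Decode(byte[] residues)
    {
        return Encoding.ASCII.GetString(residues);
    }
}

// Value equality for byte[] keys.
public sealed class AsciiSequenceComparer : IEqualityComparer<byte[]>
{
    public bool Equals(byte[] x, byte[] y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null || x.Length != y.Length) return false;
        for (int i = 0; i < x.Length; i++)
        {
            if (x[i] != y[i]) return false;
        }
        return true;
    }

    // FNV-1a (32-bit) over the bytes; assumes a non-null array.
    public int GetHashCode(byte[] residues)
    {
        unchecked
        {
            uint hash = 2166136261;
            foreach (byte b in residues)
            {
                hash = (hash ^ b) * 16777619;
            }
            return (int)hash;
        }
    }
}
```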

I don't know that there's any particularly strong reason why HashSet doesn't permit this, other than that it's a relatively rare requirement - but it's something I've seen requested in Java as well...
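Later runtimes did add this: `HashSet<T>.TryGetValue(T equalValue, out T actualValue)`, available from .NET Framework 4.7.2 and .NET Core 2.0, returns the instance already stored in the set. A sketch of a hypothetical `GetOrAdd` extension built on it:

```csharp
using System.Collections.Generic;

public static class HashSetExtensions
{
    // Hypothetical helper: returns the stored instance equal to candidate,
    // or adds candidate and returns it. Requires HashSet<T>.TryGetValue.
    public static T GetOrAdd<T>(this HashSet<T> set, T candidate)
    {
        T existing;
        if (set.TryGetValue(candidate, out existing))
        {
            return existing;
        }
        set.Add(candidate);
        return candidate;
    }
}
```

On those runtimes `peptides` can stay a `HashSet<Peptide>`, and the duplicate branch becomes `peptides.GetOrAdd(testPep).Add(line);`.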
