Out of Memory Exception In C#

Question

I'm attempting to construct a suffix trie, and due to strict requirements it must be indexed in memory.

The problem is not the tree itself, but actually the way I was reading the file.

Recommended Answer

If you're passing the entire text file as a single string, you could easily run into an out of memory exception with your first loop, which inserts every suffix of the whole file:

// imagine if s.Length was 100k or so: each Substring call copies the
// entire remaining tail of the text
for (int i = 0; i < s.Length; i++)
{
    AddString(s.Substring(i, s.Length-i));
}
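A rough count shows why this loop is so expensive. For a string of length n, iteration i copies the n - i characters from position i onward, so over the whole loop the copies add up to:

```latex
\sum_{i=0}^{n-1} (n - i) = \frac{n(n+1)}{2}
```

For n = 100,000 that is 5,000,050,000 characters, about 10 GB as UTF-16 (2 bytes per char), although each copy can be collected once its AddString call returns. The lasting cost is the trie: assuming one node per character, a suffix trie has one node per distinct substring of s, and in typical text most long substrings occur only once, so the node count also grows on the order of n²/2.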

When reading the file to construct the trie, you'll need to split each line and probably normalize the characters:

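// assumes reader is a TextReader (e.g. a StreamReader over the input file)
// and trie is the suffix trie being built
string line;
while (null != (line = reader.ReadLine()))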
{
    string[] parts = line.Split(' ', ',', '.', '!', '\t', '?'); // naive
    foreach (string part in parts)
    {
        if (part.Length > 0)
        {
            // make each string uppercase so as to avoid Hello and hello being
            // two trie entries
            trie.AddSuffix(part.ToUpperInvariant());
        }
    }
}
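The loop above can be packaged as a lazily evaluated word reader. This is only a sketch: WordReader and ReadWords are hypothetical names, not part of the original answer. It keeps the same naive separator set and uses StringSplitOptions.RemoveEmptyEntries in place of the part.Length > 0 check:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper (not from the original answer): yields one normalized
// word at a time, so only one line of the file is held in memory at once.
public static class WordReader
{
    // Same naive separator set as the loop above.
    private static readonly char[] Separators = { ' ', ',', '.', '!', '\t', '?' };

    public static IEnumerable<string> ReadWords(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // RemoveEmptyEntries replaces the part.Length > 0 check.
            foreach (string part in line.Split(Separators, StringSplitOptions.RemoveEmptyEntries))
            {
                // Upper-case so "Hello" and "hello" map to the same trie entry.
                yield return part.ToUpperInvariant();
            }
        }
    }
}
```

Each word can then be passed to the trie, e.g. `foreach (string word in WordReader.ReadWords(reader)) trie.AddSuffix(word);`, where `trie` is whatever trie object you are building.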

For example (on the output of dir /b c:\windows):

A
 D
  D
   I
    N
     S
  E
   D
 P
  P
   C
    O
     M
      P
       A
        T
   P
    A
     T
      C
       H
...
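The listing above comes from a trie that gives every character its own node. The original trie code isn't shown, so the following is only a sketch under that assumption; NaiveTrieNode, Add, CountNodes and Print are hypothetical names. Adding APPCOMPAT and APPPATCH, two entries from the listing, produces 14 nodes below the root: 3 for the shared APP prefix and 11 for the two unshared remainders.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical uncompressed trie node: every character of every word
// becomes its own child node.
public class NaiveTrieNode
{
    private readonly Dictionary<char, NaiveTrieNode> children = new Dictionary<char, NaiveTrieNode>();

    public void Add(string word)
    {
        NaiveTrieNode node = this;
        foreach (char c in word)
        {
            // Create the child for c if it doesn't exist yet, then descend.
            if (!node.children.TryGetValue(c, out NaiveTrieNode next))
            {
                next = new NaiveTrieNode();
                node.children.Add(c, next);
            }
            node = next;
        }
    }

    // Number of nodes below this one (the node itself is not counted).
    public int CountNodes()
    {
        int count = 0;
        foreach (NaiveTrieNode child in children.Values)
            count += 1 + child.CountNodes();
        return count;
    }

    // One character per line, indented by depth, children in key order.
    public void Print(StringBuilder sb, int depth = 0)
    {
        foreach (char key in children.Keys.OrderBy(k => k))
        {
            sb.Append(' ', depth).Append(key).Append('\n');
            children[key].Print(sb, depth + 1);
        }
    }
}
```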


To handle larger files, you need a more compact trie structure. I would store unshared suffixes in a separate dictionary:

// If you add a character, but there is no entry in m_children,
// just park the tail end of the string here
Dictionary<char, string> m_tails = new Dictionary<char, string>();

You would then move the per-character logic into the AddString method of SuffixNode:

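public void AddString(string s)
{
    if (s.Length == 0) return;

    char c = s[0];
    if (m_children.ContainsKey(c))
    {
        // c is already shared: recurse into the existing child node
        if (s.Length > 1) m_children[c].AddString(s.Substring(1));
    }
    else if (m_tails.ContainsKey(c))
    {
        // a second string starts with c: promote the parked tail to a
        // real child node and insert both remainders into it
        SuffixNode node = new SuffixNode();
        node.AddString(m_tails[c]);
        if (s.Length > 1) node.AddString(s.Substring(1));

        m_children.Add(c, node);
        m_tails.Remove(c);
    }
    else
    {
        // first string starting with c: park the rest of it as a tail
        m_tails.Add(c, s.Length > 1 ? s.Substring(1) : "");
    }
}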

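Now you have a much more compact version of the trie: a child SuffixNode is created only when a second string arrives with the same prefix, so a run of characters that no other string shares is stored once as a tail string instead of as one node per character. This greatly decreases the number of child SuffixNodes created for any given corpus. Returning to the dir /b c:\windows example, we can see a practical reduction in nodes: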

A
 P
  P
   COMPAT
   PATCH
  I
 T
  I
   O
    N
     S
...
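To check the reduction on the same two entries used above (APPCOMPAT and APPPATCH), here is a self-contained sketch of the compact SuffixNode. AddString is the answer's logic unchanged; initializing both dictionaries and the CountNodes and Print helpers (hypothetical names) are additions for inspection only. The two words now need just 3 child nodes, for the shared A, P, P, with the rest of each word parked in m_tails, which matches the shape of the listing above:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;

public class SuffixNode
{
    private readonly Dictionary<char, SuffixNode> m_children = new Dictionary<char, SuffixNode>();
    // Unshared remainder of a string, keyed by its first character.
    private readonly Dictionary<char, string> m_tails = new Dictionary<char, string>();

    public void AddString(string s)
    {
        if (s.Length == 0) return;

        char c = s[0];
        if (m_children.ContainsKey(c))
        {
            if (s.Length > 1) m_children[c].AddString(s.Substring(1));
        }
        else if (m_tails.ContainsKey(c))
        {
            // A second string shares this character: promote the tail to a node.
            SuffixNode node = new SuffixNode();
            node.AddString(m_tails[c]);
            if (s.Length > 1) node.AddString(s.Substring(1));

            m_children.Add(c, node);
            m_tails.Remove(c);
        }
        else
        {
            m_tails.Add(c, s.Length > 1 ? s.Substring(1) : "");
        }
    }

    // Hypothetical helper: number of SuffixNodes below this one.
    public int CountNodes() => m_children.Values.Sum(child => 1 + child.CountNodes());

    // Hypothetical helper: child keys on their own line, tails printed whole.
    public void Print(StringBuilder sb, int depth = 0)
    {
        foreach (char key in m_children.Keys.OrderBy(k => k))
        {
            sb.Append(' ', depth).Append(key).Append('\n');
            m_children[key].Print(sb, depth + 1);
        }
        foreach (char key in m_tails.Keys.OrderBy(k => k))
            sb.Append(' ', depth).Append(key).Append(m_tails[key]).Append('\n');
    }
}
```

For comparison, a trie with one node per character needs 14 nodes for the same two words.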

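At this point your trie has a more efficient representation. You're left with deciding how to represent terminal nodes (the points where an added string ends) so that lookups are accurate: as written, AddString records nothing when a string ends, so after adding both APP and APPLE a lookup cannot tell that APP was itself added.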
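One common way to handle terminals is to append a sentinel character that never occurs in the input, so the end of every added string is recorded explicitly and a lookup must consume that sentinel exactly. This is a sketch, not the answer's method: it assumes '\0' never appears in the upper-cased words, and Contains, SuffixTrie and Terminator are hypothetical names. AddString keeps the answer's logic, rewritten with TryGetValue:

```csharp
using System.Collections.Generic;

public class SuffixNode
{
    private readonly Dictionary<char, SuffixNode> m_children = new Dictionary<char, SuffixNode>();
    private readonly Dictionary<char, string> m_tails = new Dictionary<char, string>();

    // Same insertion logic as the answer's AddString.
    public void AddString(string s)
    {
        if (s.Length == 0) return;
        char c = s[0];
        string rest = s.Substring(1);
        if (m_children.TryGetValue(c, out SuffixNode child))
        {
            child.AddString(rest);
        }
        else if (m_tails.TryGetValue(c, out string tail))
        {
            SuffixNode node = new SuffixNode();
            node.AddString(tail);
            node.AddString(rest);
            m_children.Add(c, node);
            m_tails.Remove(c);
        }
        else
        {
            m_tails.Add(c, rest);
        }
    }

    // s must end with the terminator; the last character consumed is the
    // terminator, so a bare prefix of a stored word never matches.
    public bool Contains(string s)
    {
        char c = s[0];
        string rest = s.Substring(1);
        if (m_children.TryGetValue(c, out SuffixNode child))
            return rest.Length == 0 || child.Contains(rest);
        return m_tails.TryGetValue(c, out string tail) && tail == rest;
    }
}

// Hypothetical wrapper that appends the sentinel on every add and lookup.
public class SuffixTrie
{
    // Assumed sentinel: must not appear inside any added word.
    private const char Terminator = '\0';
    private readonly SuffixNode m_root = new SuffixNode();

    public void Add(string word) => m_root.AddString(word + Terminator);
    public bool Contains(string word) => m_root.Contains(word + Terminator);
}
```

The sentinel leaves the insertion logic untouched; the cost is one extra character per stored string.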
