String caching. Memory optimization and re-use


Question

I am currently working on a very large legacy application which handles a large amount of string data gathered from various sources (e.g. names, identifiers, common codes relating to the business, etc.). This data alone can take up to 200 MB of RAM in the application process.

A colleague of mine mentioned one possible strategy to reduce the memory footprint: since a lot of the individual strings are duplicated across the data sets, we could "cache" the recurring strings in a dictionary and re-use them when required. So for example...

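using System.Collections.Generic;

public class StringCacher
{
    // Maps each distinct string value to the first instance seen with that value.
    private readonly Dictionary<string, string> _stringCache;

    public StringCacher()
    {
        _stringCache = new Dictionary<string, string>();
    }

    public string AddOrReuse(string stringToCache)
    {
        // First time this value is seen: remember this instance.
        if (!_stringCache.ContainsKey(stringToCache))
            _stringCache[stringToCache] = stringToCache;

        // Return the cached instance so later duplicates can share it.
        return _stringCache[stringToCache];
    }
}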

Then, to use this cache...

public IEnumerable<string> IncomingData()
{
    var stringCache = new StringCacher();

    var dataList = new List<string>();

    // Add the data, a fair amount of the strings will be the same.
    dataList.Add(stringCache.AddOrReuse("AAAA"));
    dataList.Add(stringCache.AddOrReuse("BBBB"));
    dataList.Add(stringCache.AddOrReuse("AAAA"));
    dataList.Add(stringCache.AddOrReuse("CCCC"));
    dataList.Add(stringCache.AddOrReuse("AAAA"));

    return dataList;
}

As strings are immutable, and the framework does a lot of internal work to make them behave in a similar way to value types, I'm half thinking that this will just create a copy of each string in the dictionary and double the amount of memory used, rather than just pass around a reference to the string stored in the dictionary (which is what my colleague is assuming).

So, taking into account that this will be run on a massive set of string data...

  • Is this going to save any memory, assuming that 30% of the string values will be used twice or more?

  • Will this even work as intended?

Recommended answer

This is essentially what string interning is, except that you don't have to worry about how it works. In your example you are still creating a string, then comparing it, then leaving the copy to be garbage collected. .NET does this for you at runtime for string literals, and String.Intern gives access to the same table for strings created at runtime.
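As a rough illustration of the point above, the following sketch builds two equal strings at runtime and passes both through String.Intern. The class and method names (InternDemo, SameInstanceAfterIntern) are hypothetical and exist only for this example; String.Intern and Object.ReferenceEquals are the real framework calls.

```csharp
using System;

// Hypothetical demo class, not part of the original question or answer.
public static class InternDemo
{
    public static bool SameInstanceAfterIntern()
    {
        // new string(char[]) allocates a fresh object each time,
        // so a and b have equal contents but are separate instances.
        string a = new string(new[] { 'A', 'A', 'A', 'A' });
        string b = new string(new[] { 'A', 'A', 'A', 'A' });
        bool distinctBefore = !object.ReferenceEquals(a, b);

        // String.Intern returns the one canonical instance for this content.
        string ia = string.Intern(a);
        string ib = string.Intern(b);

        // After interning, both variables refer to the same object,
        // and the unused duplicate becomes eligible for collection.
        return distinctBefore && object.ReferenceEquals(ia, ib);
    }
}
```

Only a reference is stored and returned in each case; the string contents are never copied by the intern table.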

See also String.Intern and Optimizing C# String Performance (C Calvert)

If a new string is created with code like (String goober1 = "foo"; String goober2 = "foo";), shown in lines 18 and 19, then the intern table is checked. If your string is already in there, then both variables will point at the same block of memory maintained by the intern table.
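The quoted behaviour can be checked with a small sketch like the one below. LiteralDemo and its methods are hypothetical names used for illustration; the example assumes default compiler and runtime settings, under which identical literals in one assembly resolve to the same object.

```csharp
using System;

// Hypothetical demo class, assuming default literal interning.
public static class LiteralDemo
{
    public static bool LiteralsShareInstance()
    {
        // Two literals with the same content, as in the quoted article.
        string goober1 = "foo";
        string goober2 = "foo";
        return object.ReferenceEquals(goober1, goober2);
    }

    public static bool RuntimeStringIsSeparate()
    {
        string literal = "foo";
        // Built from characters at runtime, so it is a new object
        // even though its contents equal the literal.
        string built = new string(new[] { 'f', 'o', 'o' });
        return literal == built && !object.ReferenceEquals(literal, built);
    }
}
```

The second method is the case that matters for data loaded from files or databases: such strings are equal by value but are separate objects until something (String.Intern or a cache) maps them to one instance.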

So, you don't have to roll your own; it won't really provide any advantage. EDIT: unless your strings don't usually live for as long as your AppDomain. Interned strings live for the lifetime of the AppDomain, which is not necessarily great for GC. If you want short-lived strings, then you want a pool. From String.Intern:

If you are trying to reduce the total amount of memory your application allocates, keep in mind that interning a string has two unwanted side effects. First, the memory allocated for interned String objects is not likely to be released until the common language runtime (CLR) terminates. The reason is that the CLR's reference to the interned String object can persist after your application, or even your application domain, terminates. ...
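Given the lifetime concern in the quoted passage, a pool whose lifetime you control might look something like the sketch below. StringPool, GetOrAdd, Count and Clear are hypothetical names chosen for this example, not an existing framework type, and the sketch is not thread-safe.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical pool: deduplicates strings for as long as the pool itself lives.
public sealed class StringPool
{
    // Ordinal comparison: strings are shared only when their characters match exactly.
    private readonly Dictionary<string, string> _pool =
        new Dictionary<string, string>(StringComparer.Ordinal);

    public int Count => _pool.Count;

    public string GetOrAdd(string value)
    {
        if (value == null) return null;

        // Reuse the instance already held for this content, if any.
        if (_pool.TryGetValue(value, out string existing)) return existing;

        // Otherwise this instance becomes the shared one.
        _pool[value] = value;
        return value;
    }

    // Drops the pool's references so unused strings can be collected.
    public void Clear() => _pool.Clear();
}
```

Unlike the intern table, this dictionary can be cleared or simply dropped once a data set is no longer needed, so pooled strings do not have to outlive the data that uses them.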

EDIT 2: See also Jon Skeet's answer here
