String vs byte array, Performance


Question

(This post is about high-frequency-style programming.)

I recently saw on a forum (I think they were discussing Java) that if you have to parse a lot of string data, it's better to use a byte array than a String with split(). The exact post was:

One performance trick that works in any language (C++, Java, C#) is to avoid object creation. It's not the cost of allocation or GC, it's the cost of accessing large memory arrays that don't fit in the CPU cache.

Modern CPUs are much faster than their memory. They stall for many, many cycles for each cache miss. Most of the CPU transistor budget is allocated to reduce this with large caches and lots of ticks.

GPUs solve the problem differently by having lots of threads ready to execute to hide memory access latency; they have little or no cache and spend the transistors on more cores.

So, for example, rather than using Strings and split to parse a message, use byte arrays that can be updated in place. You really want to avoid random memory access over large data structures, at least in the inner loops.

Is he just saying "don't use strings because they're objects and creating objects is costly"? Or is he saying something else?

Does using a byte array ensure the data remains in the cache for as long as possible? When you use a string, is it too large to be held in the CPU cache? Generally, is using primitive data types the best way to write faster code?

Answer

He's saying that if you break a chunk of text up into separate String objects, those objects have worse locality than the one large array of text. Each string, and the array of characters it contains, is going to be somewhere else in memory; they can be spread all over the place. The memory cache will likely have to thrash in and out to access the various strings as you process the data. In contrast, the one large array has the best possible locality, as all the data is in one area of memory, and cache thrashing will be kept to a minimum.

There are limits to this, of course: if the text is very, very large, and you only need to parse out part of it, then those few small strings might fit better in the cache than the large chunk of text.
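
To make the contrast concrete, here is a minimal Java sketch, not taken from the quoted post. The class name ByteFieldParser, both method names, and the pipe-delimited message format are assumptions made purely for illustration. The first method scans one contiguous byte array and never creates a String; the second uses split(), which allocates a separate String object for every field.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical example class; the names and message format are illustrative assumptions.
public class ByteFieldParser {

    // Sums the non-negative decimal integer fields of a delimited ASCII message
    // in a single pass over the byte array, without allocating any objects.
    // Assumes every non-delimiter byte is an ASCII digit.
    public static long sumFields(byte[] msg, int length, byte delimiter) {
        long sum = 0;
        long current = 0;
        for (int i = 0; i < length; i++) {
            byte b = msg[i];
            if (b == delimiter) {
                sum += current;      // field finished, add it to the total
                current = 0;
            } else {
                current = current * 10 + (b - '0');  // accumulate the next digit
            }
        }
        return sum + current;        // add the last field (no trailing delimiter)
    }

    // String-based equivalent: split() creates one String per field,
    // and each of those objects may live elsewhere on the heap.
    public static long sumFieldsWithSplit(String msg, String delimiterRegex) {
        long sum = 0;
        for (String field : msg.split(delimiterRegex)) {
            sum += Long.parseLong(field);
        }
        return sum;
    }

    public static void main(String[] args) {
        String text = "10|20|30";
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        System.out.println(sumFields(bytes, bytes.length, (byte) '|'));   // prints 60
        System.out.println(sumFieldsWithSplit(text, "\\|"));              // prints 60
    }
}
```

Both methods return the same result; the difference is only in how memory is touched. Whether the byte-array version is actually faster depends on message size, the JVM, and the allocation pattern, so it is worth measuring on real data before committing to it.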
