Idea for ECMA/C# Standard - compile time hash for performance

Question

Many times we have a bunch of enum values, either from different enums or from
the same enum, with various numeric values assigned. Rarely will there be
collisions in numbering between the enums. You might imagine these as
something like OrderPrice=27, OrderQuantity=50, OrderSide=62. There may be a
lot of these. So normally what we have to do is store them in a hash table,
so that hashtable[OrderPrice]=value can be set with the value. Hashing is
very expensive in terms of performance at the level we deal with. What would
be nice is if the C#/C++ compiler would allow me to do something like the
following. The example below is more C++ pseudocode than C#, but you get the
idea.

Imagine an original enum:

enum OrderInformation {..., OrderPrice=27, OrderQuantity=50, OrderSide=62};

Now imagine if at compile time the compiler could do the hash for us, like:

enum map LocalOrderInformation { OrderInformation.OrderPrice alias
OrderPrice=0, OrderInformation.OrderQuantity alias OrderQuantity,
OrderInformation.OrderSide alias OrderSide};

So basically this tells the compiler to create a new enum that maps the old
enumeration values onto new values, 0 based or 1 based, so that now we can
use a direct array lookup rather than a hash map.

So now we could do something like:

int OrderInfo[3];

void OrderClass::ProcessOrder(LocalOrderInformation info, int value)
{
    OrderInfo[info]=value;
}

So if a caller did something like this:

OrderClassInstance.ProcessOrder(OrderInformation.OrderPrice,27);

the compiler would translate the OrderInformation.OrderPrice enum value to
the corresponding LocalOrderInformation value, here 0. Now we can do a direct
array access rather than a hash table lookup.
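
Neither C++ nor C# has the "enum map" syntax proposed above, but something
close can be approximated in standard C++. What follows is only a minimal
sketch, assuming C++14 or later. DenseIndex, kFieldCount and the templated
ProcessOrder/Get are hypothetical names introduced for illustration, and
only the three enumerators named in the post are mapped:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// The sparse enum from the post (its other members are elided there).
enum class OrderInformation { OrderPrice = 27, OrderQuantity = 50, OrderSide = 62 };

// Hypothetical: number of fields this class stores locally.
constexpr std::size_t kFieldCount = 3;

// Hypothetical helper standing in for the proposed "enum map":
// it sends each sparse value to a dense, 0-based slot.
constexpr std::size_t DenseIndex(OrderInformation field) {
    switch (field) {
        case OrderInformation::OrderPrice:    return 0;
        case OrderInformation::OrderQuantity: return 1;
        case OrderInformation::OrderSide:     return 2;
    }
    return kFieldCount;  // marks a value that has no slot
}

class OrderClass {
public:
    // Field known at compile time: the static_assert forces DenseIndex to be
    // evaluated by the compiler, and the store is a plain array write.
    template <OrderInformation Field>
    void ProcessOrder(int value) {
        static_assert(DenseIndex(Field) < kFieldCount, "field is not mapped");
        order_info_[DenseIndex(Field)] = value;
    }

    // Field known only at run time: the switch runs per call, but no hash
    // code is computed and no hash table is consulted.
    void ProcessOrder(OrderInformation field, int value) {
        const std::size_t slot = DenseIndex(field);
        assert(slot < kFieldCount);
        order_info_[slot] = value;
    }

    template <OrderInformation Field>
    int Get() const { return order_info_[DenseIndex(Field)]; }

private:
    std::array<int, kFieldCount> order_info_{};
};
```

With this sketch, a call such as
order.ProcessOrder<OrderInformation::OrderPrice>(27) writes slot 0 of a
three-element array, and passing an enumerator that DenseIndex does not map
fails to compile. The mapping still has to be written by hand, which is the
part the post asks the compiler to generate.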

I'm sure there are numerous issues here to work out, and it is not quite as
simple as the pseudocode above. But if it were, and we could get the
compiler to do this mapping for us, our runtime code would be much improved.

I'm sure there are technical hurdles and a need for improved syntax, but
what do people think of the idea? I think it would help a lot of the stuff I
work on to avoid hash tables. Thoughts, ideas for better syntax, different
variations?

Thanks,

Dave


Recommended answer

If your original enum type does not have values up into four digits or
more, why not just use an array directly, even if it's a sparse array?
What difference does it make, on a machine with 500 MB of memory, if
you waste 80 entries in an array of 100?

It seems hardly worth a change to the language (or a hash table for
that matter) to save such a small amount of memory. (The hash table
would probably consume as much overhead as that anyway.)
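
The sparse-array suggestion can be sketched as follows. This is an
illustration only: SparseOrder, kMaxFieldValue and WastedSlots are
hypothetical names, and the sketch assumes the largest enumerator is 62, as
in the three values given in the question:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// The sparse enum from the question (its other members are elided there).
enum class OrderInformation { OrderPrice = 27, OrderQuantity = 50, OrderSide = 62 };

// Assumption: 62 is the largest value that will ever be used as an index.
constexpr std::size_t kMaxFieldValue = 62;
constexpr std::size_t kUsedFields = 3;

class SparseOrder {
public:
    // The enum value itself is the array index, so there is no mapping step.
    void Set(OrderInformation field, int value) {
        const std::size_t index = static_cast<std::size_t>(field);
        assert(index <= kMaxFieldValue);
        slots_[index] = value;
    }

    int Get(OrderInformation field) const {
        return slots_[static_cast<std::size_t>(field)];
    }

    // Slots that exist only to keep the indices aligned with the enum values.
    static constexpr std::size_t WastedSlots() {
        return (kMaxFieldValue + 1) - kUsedFields;
    }

private:
    std::array<int, kMaxFieldValue + 1> slots_{};
};
```

Here 63 ints are allocated to hold 3 values. The waste stays small only
while the enum values stay small, which is the condition the answer states
up front.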


The example I gave was just that, an example. The reality is that those enum
values can be almost anything in an integer range, so a sparse array really
isn't an answer, especially allocating the whole array. And a lookup tree,
hash or map incurs that lookup time, which in the financial industry I work
in, with thousands of market data messages (with lots of data in each) or
more coming in, is becoming less and less viable by the day (although
increasing processor speeds help mitigate it a bit, when a customer is
willing to upgrade).

The key is that I don't want to burn a huge amount of memory and I want to
incur the minimal amount of lookup time possible. In this case I believe the
only way to handle this is with a language change, because having the
compiler do the mapping at compile time saves me from burning huge memory
and virtually any lookup time at run time.
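
For field ids that arrive at run time and are spread over the whole integer
range, one middle ground that needs no language change is a small sorted
table searched with std::lower_bound. This is a different technique from
the compile-time mapping proposed in this thread, shown only as a sketch:
kFieldIds, SlotFor and Order are hypothetical names, and the three ids are
made-up values chosen to be far apart:

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical field ids, kept in ascending order.
constexpr std::array<std::int32_t, 3> kFieldIds = {27, 50000, 2000000000};

// Returns the dense slot for an id, or kFieldIds.size() if the id is unknown.
// The search is a binary search over the table; nothing is hashed.
inline std::size_t SlotFor(std::int32_t id) {
    const auto it = std::lower_bound(kFieldIds.begin(), kFieldIds.end(), id);
    if (it == kFieldIds.end() || *it != id) {
        return kFieldIds.size();
    }
    return static_cast<std::size_t>(it - kFieldIds.begin());
}

class Order {
public:
    Order() {
        // The binary search is only valid on a sorted table.
        assert(std::is_sorted(kFieldIds.begin(), kFieldIds.end()));
    }

    // Stores the value and returns true, or returns false for an unknown id.
    bool Set(std::int32_t id, int value) {
        const std::size_t slot = SlotFor(id);
        if (slot == kFieldIds.size()) {
            return false;
        }
        values_[slot] = value;
        return true;
    }

    int Get(std::int32_t id) const { return values_[SlotFor(id)]; }

private:
    // One slot per known field, regardless of how large the ids are.
    std::array<int, kFieldIds.size()> values_{};
};
```

Memory grows with the number of fields rather than with the largest id, but
each access still costs a logarithmic search, so this does not reach the
zero-lookup goal stated above. Get assumes the id is known; an unknown id
would index out of bounds.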
Thanks,
Dave


Just to give an idea of the scale of the problem, imagine I have an object
that contains 100 fields that I do hash lookups on to get and set the values.
Now imagine I need to populate, read and process 3000-6000 of these per
second. So populating is 300,000 operations per second (hashing) at the low
end, plus an equivalent number of reads. So we'll say 600,000 operations. On
a 1.8 GHz box, 600,000 calls to GetHashCode take about 1/4 of a second in and
of themselves, not counting the lookup and the processing. So when all is
said and done, the hashing alone accounts for 25-35% of processor time or
more, depending on how many times those members need to be accessed. If this
could be done with a straight array lookup using the minimal memory possible,
we could probably cut this down to 1% of processor time or so. That is a huge
savings: since we do much more than just populate and read the data (we do
calculations and comparisons), having that extra 24-34% of CPU would help us
quite a bit.
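
The arithmetic behind those figures can be checked mechanically. The sketch
below only restates the numbers given in the post as compile-time constants;
every name in it is hypothetical:

```cpp
#include <cstdint>

// Figures taken from the post.
constexpr std::int64_t kFieldsPerObject = 100;
constexpr std::int64_t kObjectsPerSecondLow = 3000;
constexpr std::int64_t kObjectsPerSecondHigh = 6000;
constexpr std::int64_t kClockHz = 1800000000;  // 1.8 GHz
constexpr std::int64_t kMeasuredCalls = 600000;

// One write plus one read for every field of every object.
constexpr std::int64_t OpsPerSecond(std::int64_t objects_per_second) {
    return objects_per_second * kFieldsPerObject * 2;
}

// Cycles per GetHashCode call implied by "600,000 calls take about 1/4 of a
// second on a 1.8 GHz box": a quarter of the clock rate divided by the calls.
constexpr std::int64_t ImpliedCyclesPerCall() {
    return (kClockHz / 4) / kMeasuredCalls;
}

static_assert(OpsPerSecond(kObjectsPerSecondLow) == 600000,
              "low end: 300,000 writes plus 300,000 reads");
static_assert(OpsPerSecond(kObjectsPerSecondHigh) == 1200000,
              "high end is double the low end");
static_assert(ImpliedCyclesPerCall() == 750,
              "0.45e9 cycles spread over 600,000 calls");
```

So the quoted timing works out to roughly 750 cycles, or about 417 ns, per
hash call, and a quarter of a second of hashing per second of wall time is
where the 25% floor comes from. At 6000 objects per second the same cost per
call would double the hashing time.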
