What's the most efficient way to parse FIX Protocol messages in .NET?


Question

I came across a very similar question, but that question is tagged QuickFIX (which is not relevant to my question) and most of the answers are QuickFIX-related.

My question is broader. I'm looking for the most efficient way to parse a FIX Protocol message using C#. By way of background, a FIX message consists of a series of tag/value pairs separated by the ASCII <SOH> character (0x01). The number of fields in a message is variable.

An example message might look like this:

8=FIX.4.2<SOH>9=175<SOH>35=D<SOH>49=BUY1<SOH>56=SELL1<SOH>34=2482<SOH>50=frg<SOH>
52=20100702-11:12:42<SOH>11=BS01000354924000<SOH>21=3<SOH>100=J<SOH>55=ILA SJ<SOH>
48=YY77<SOH>22=5<SOH>167=CS<SOH>207=J<SOH>54=1<SOH>60=20100702-11:12:42<SOH>
38=500<SOH>40=1<SOH>15=ZAR<SOH>59=0<SOH>10=230<SOH>

For each field, the tag (an integer) and the value (for our purposes, a string) are separated by the '=' character. (The precise semantics of each tag are defined in the protocol, but that isn't particularly germane to this question.)
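
If you reproduce the sample message in C# (for unit tests, say), one escape-sequence pitfall is worth knowing: in a C# string literal "\x" consumes one to four hex digits, so "\x0135=D" is the single character U+0135 followed by "=D", not <SOH> followed by "35=D". The "\u" escape always takes exactly four digits. The class and member names below are illustrative only.

```csharp
using System;

public static class SohLiterals
{
    // "\x" takes one to four hex digits: this is "8=FIX.4.2" + U+0135 + "=D".
    public const string Wrong = "8=FIX.4.2\x0135=D";

    // "\u" takes exactly four hex digits: this is "8=FIX.4.2" + <SOH> + "35=D".
    public const string Right = "8=FIX.4.2\u000135=D";

    // A readable alternative: write '|' as the separator and convert it once.
    public static string FromPipes(string s) => s.Replace('|', '\u0001');
}
```

Writing test messages with '|' and converting them with something like FromPipes keeps the samples close to the notation used above.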

When doing basic parsing, you are often interested in only a handful of specific tags from the FIX header and don't really need random access to every possible field. Strategies I have considered include:

  • Using String.Split, iterating over every element and putting the tag-to-index mapping in a Hashtable: this provides full random access to all fields if needed at some point

  • (Slight optimisation) Using String.Split, scanning the array for tags of interest and putting the tag-to-index mapping into another container (not necessarily a Hashtable, since there may be only a small number of items and that number is known before parsing)

  • Scanning the message field by field using String.IndexOf and storing the offset and length of fields of interest in an appropriate structure
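
The third strategy can be sketched roughly as follows, assuming well-formed "tag=value<SOH>" fields and no 'data' fields with embedded <SOH>. FieldRef and FixStringScanner are hypothetical names. The scan records every occurrence of each wanted tag (so duplicate tags are not lost) as an offset and length into the original string, and parses the tag digits in place instead of creating substrings.

```csharp
using System;
using System.Collections.Generic;

// Position of one field's value inside the original message string.
public readonly struct FieldRef
{
    public readonly int Tag;
    public readonly int Offset;  // index of the first value character
    public readonly int Length;  // number of value characters

    public FieldRef(int tag, int offset, int length)
    {
        Tag = tag;
        Offset = offset;
        Length = length;
    }
}

public static class FixStringScanner
{
    const char Soh = '\u0001';

    // Walks the message one field at a time with String.IndexOf and appends every
    // occurrence of a wanted tag to 'results' (duplicates included, in message order).
    // Assumes no 'data' fields with embedded <SOH>.
    public static void Scan(string msg, HashSet<int> wanted, List<FieldRef> results)
    {
        int pos = 0;
        while (pos < msg.Length)
        {
            int eq = msg.IndexOf('=', pos);
            if (eq < 0) break;
            int end = msg.IndexOf(Soh, eq + 1);
            if (end < 0) end = msg.Length; // tolerate a missing final <SOH>

            // Parse the tag digits in place rather than int.Parse(msg.Substring(...)).
            int tag = 0;
            for (int i = pos; i < eq; i++)
                tag = tag * 10 + (msg[i] - '0');

            if (wanted.Contains(tag))
                results.Add(new FieldRef(tag, eq + 1, end - eq - 1));

            pos = end + 1;
        }
    }
}
```

The caller can reuse one List<FieldRef> across messages (calling Clear() between them) and read a value with msg.AsSpan(f.Offset, f.Length) (available from .NET Core 2.1) or msg.Substring(f.Offset, f.Length) only when a string is actually needed. The input is still a string, though, so decoding the bytes into that string remains an allocation per message.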

Regarding the first two: although my measurements indicate that String.Split is pretty fast, according to the documentation it allocates a new String for each element of the resulting array, which can generate a lot of garbage if you're parsing many messages. Can anyone see a better way to tackle this problem in .NET?

EDIT:

Three vital pieces of information I left out:

  1. Tags are not necessarily unique within FIX messages, i.e., duplicate tags can occur under certain circumstances.

  2. Certain types of FIX fields can contain an embedded <SOH> in their data. These tags are referred to as being of type 'data', and a data dictionary lists the tag numbers that are of this type.

  3. The eventual requirement is to be able to edit the message (particularly replace values).
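
Since the end goal is editing (point 3), keep in mind that changing any value also changes two header/trailer fields. In FIX, BodyLength (tag 9) is the byte count from just after the "9=...<SOH>" field up to and including the <SOH> before "10=", and CheckSum (tag 10) is the sum of all bytes before "10=", modulo 256, written as three digits. A minimal sketch, assuming an ASCII message that starts with 8= and 9= and ends with 10=nnn<SOH>: the hypothetical FixEditor.ReplaceValue replaces only the first occurrence of a tag (relevant given point 1), must not be used on 'data' fields (point 2), and builds a new string, so it allocates.

```csharp
using System;

public static class FixEditor
{
    const char Soh = '\u0001';

    // Replaces the value of the first occurrence of 'tag', then rebuilds BodyLength (9)
    // and CheckSum (10). Assumes 'tag' is not 8, 9 or 10 and is not a 'data' field.
    // Searches use StringComparison.Ordinal: a culture-sensitive search may treat
    // control characters such as <SOH> as ignorable.
    public static string ReplaceValue(string msg, int tag, string newValue)
    {
        string key = Soh + tag.ToString() + "=";
        int fieldStart = msg.IndexOf(key, StringComparison.Ordinal);
        if (fieldStart < 0) throw new ArgumentException($"Tag {tag} not found.", nameof(tag));
        int valueStart = fieldStart + key.Length;
        int valueEnd = msg.IndexOf(Soh, valueStart);
        string edited = msg.Substring(0, valueStart) + newValue + msg.Substring(valueEnd);
        return RebuildLengthAndChecksum(edited);
    }

    static string RebuildLengthAndChecksum(string msg)
    {
        int bodyLengthField = msg.IndexOf(Soh + "9=", StringComparison.Ordinal);           // <SOH> before "9="
        int bodyStart = msg.IndexOf(Soh, bodyLengthField + 1) + 1;                         // after "9=nnn<SOH>"
        int checksumField = msg.LastIndexOf(Soh + "10=", StringComparison.Ordinal) + 1;    // index of "10="

        // BodyLength counts from bodyStart up to and including the <SOH> before "10=".
        string body = msg.Substring(bodyStart, checksumField - bodyStart);
        string beforeChecksum = msg.Substring(0, bodyLengthField + 1) + "9=" + body.Length + Soh + body;

        // CheckSum: sum of every byte before "10=", modulo 256, as three digits.
        int sum = 0;
        foreach (char c in beforeChecksum) sum += c; // ASCII, so one char == one byte
        return beforeChecksum + "10=" + (sum % 256).ToString("D3") + Soh;
    }
}
```

For example, replacing 35=D with 35=G in "8=FIX.4.2|9=5|35=D|10=000|" (with | standing for <SOH>) keeps 9=5 and yields CheckSum 184. A byte-array version would follow the same arithmetic but write into a reusable buffer instead of concatenating strings.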

Solution

The assumption is that you are receiving these messages either over the wire or loading them from disk. In either case, you can access them as a byte array and read that array in a single forward pass. If you want/need/require high performance, parse the byte array yourself (for high performance, don't use a dictionary or hashtable of tags and values, as this is extremely slow by comparison). Parsing the byte array yourself also means you can skip data you are not interested in and optimise the parsing accordingly.
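
A forward, single-pass byte-array parser along these lines might look like the sketch below. It assumes ASCII input; FixByteParser and its members are hypothetical names, not an existing library API. Tags and value positions go into arrays preallocated in the constructor, so parsing itself allocates nothing and duplicate tags are preserved. For 'data' fields it relies on the rule that, in FIX, each data field is immediately preceded by a length field (for example RawDataLength (95) before RawData (96)); the caller supplies that length-tag to data-tag map, which is a small configuration lookup rather than storage for the parsed values.

```csharp
using System;
using System.Collections.Generic;

// Allocates only in the constructor; reuse one instance per connection/thread.
public sealed class FixByteParser
{
    const byte Soh = 0x01;
    const byte Eq = (byte)'=';

    public readonly int[] Tags;          // tag of field i, in message order (duplicates kept)
    public readonly int[] ValueOffsets;  // buffer index of the first value byte of field i
    public readonly int[] ValueLengths;  // value length of field i, in bytes
    public int Count { get; private set; }

    // Maps a length tag to the 'data' tag that follows it, e.g. 95 -> 96.
    readonly Dictionary<int, int> _lengthTagToDataTag;

    public FixByteParser(int maxFields, Dictionary<int, int> lengthTagToDataTag)
    {
        Tags = new int[maxFields];
        ValueOffsets = new int[maxFields];
        ValueLengths = new int[maxFields];
        _lengthTagToDataTag = lengthTagToDataTag;
    }

    // Parses buf[offset, offset + length). Returns false on malformed input or too many fields.
    public bool Parse(byte[] buf, int offset, int length)
    {
        Count = 0;
        int pos = offset, end = offset + length;
        int pendingDataTag = -1, pendingDataLength = 0;

        while (pos < end)
        {
            // Tag: accumulate ASCII digits up to '=' without creating a string.
            int tag = 0;
            for (; pos < end && buf[pos] != Eq; pos++)
            {
                int digit = buf[pos] - '0';
                if ((uint)digit > 9) return false;
                tag = tag * 10 + digit;
            }
            if (pos >= end || Count == Tags.Length) return false;

            int valueStart = pos + 1;
            int valueEnd;
            if (tag == pendingDataTag)
            {
                // 'data' field: the value may contain <SOH>, so trust the preceding length.
                valueEnd = valueStart + pendingDataLength;
                if (valueEnd >= end || buf[valueEnd] != Soh) return false;
            }
            else
            {
                valueEnd = Array.IndexOf(buf, Soh, valueStart, end - valueStart);
                if (valueEnd < 0) return false;
            }

            // A length field arms the data-field handling for the next field only.
            pendingDataTag = -1;
            if (_lengthTagToDataTag.TryGetValue(tag, out int dataTag))
            {
                pendingDataTag = dataTag;
                pendingDataLength = ParseNonNegativeInt(buf, valueStart, valueEnd);
            }

            Tags[Count] = tag;
            ValueOffsets[Count] = valueStart;
            ValueLengths[Count] = valueEnd - valueStart;
            Count++;
            pos = valueEnd + 1;
        }
        return true;
    }

    static int ParseNonNegativeInt(byte[] buf, int from, int to)
    {
        int n = 0;
        for (int i = from; i < to; i++) n = n * 10 + (buf[i] - '0');
        return n;
    }
}
```

Because values are only located, not copied, fields you don't care about cost nothing beyond the scan itself. A tag lookup becomes a linear pass over Tags, which is usually cheap for the few tags of interest.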

You should be able to avoid most object allocation easily. You can parse FIX float datatypes into doubles quickly and without creating objects (your own version can massively outperform double.Parse here). The only values that need a bit more thought are string-valued tags, e.g. symbol values in FIX. To avoid creating strings here, you could come up with a simple method of deriving a unique int identifier (a value type) for each symbol, which again helps you avoid heap allocation.
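
Both ideas can be sketched as below, assuming ASCII bytes taken straight from the receive buffer; FixValueParsers, TryParseDouble and TryPackSymbol are hypothetical names. The float parser handles only an optional '-', digits and one '.', and gives up on more than 15 digits: in that range the integer mantissa and the power of ten are both exact doubles, so a single IEEE division yields the correctly rounded value, which should agree with double.Parse. For symbols, this version packs up to 8 ASCII bytes into a ulong rather than an int (a swap from the int suggested above, to fit 8 characters); distinct non-zero byte sequences give distinct ids, with no lookup table and no allocation.

```csharp
using System;

public static class FixValueParsers
{
    // 10^0 .. 10^15; every 10^k with k <= 22 is exactly representable as a double.
    static readonly double[] Pow10 =
    {
        1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7,
        1e8, 1e9, 1e10, 1e11, 1e12, 1e13, 1e14, 1e15
    };

    // Parses an ASCII decimal such as "-123.45" in buf[from, to) without allocating.
    // Accepts an optional leading '-', digits and at most one '.'; exponents are not handled.
    // Returns false for anything else, or for more than 15 digits, so the caller can
    // fall back to a general-purpose parser.
    public static bool TryParseDouble(byte[] buf, int from, int to, out double value)
    {
        value = 0;
        int i = from;
        bool negative = i < to && buf[i] == (byte)'-';
        if (negative) i++;

        long mantissa = 0;
        int digits = 0, fractionDigits = 0;
        bool seenDot = false;
        for (; i < to; i++)
        {
            if (buf[i] == (byte)'.' && !seenDot) { seenDot = true; continue; }
            int digit = buf[i] - '0';
            if ((uint)digit > 9 || ++digits > 15) return false;
            mantissa = mantissa * 10 + digit;
            if (seenDot) fractionDigits++;
        }
        if (digits == 0) return false;

        // mantissa < 10^15 < 2^53, so both operands are exact and one division rounds correctly.
        value = mantissa / Pow10[fractionDigits];
        if (negative) value = -value;
        return true;
    }

    // Packs a value of 1..8 ASCII bytes (e.g. a symbol) into a ulong id. As long as the bytes
    // are non-zero (true for printable ASCII), different values give different ids.
    public static bool TryPackSymbol(byte[] buf, int from, int to, out ulong id)
    {
        id = 0;
        int length = to - from;
        if (length < 1 || length > 8) return false;
        for (int i = from; i < to; i++)
            id = (id << 8) | buf[i];
        return true;
    }
}
```

Longer symbols would need a different scheme, for example a table that assigns sequential ints on first sight, at the cost of a lookup per value.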

Properly done, custom optimised parsing of the message should easily outperform QuickFIX, and you can do it all in .NET or Java without producing any garbage to collect.
