Performance of C++ vs Virtual Machine languages in high frequency finance

Question

I thought the C/C++ vs C#/Java performance question was well trodden, meaning that I'd read enough evidence to suggest that the VM languages are not necessarily any slower than the "close-to-silicon" languages. Mostly because the JIT compiler can do optimizations that the statically compiled languages cannot.

However, I recently received a CV from a guy who claims that Java-based high frequency trading is always beaten by C++, and that he'd been in a situation where this was the case.

A quick browse on job sites indeed shows that HFT applicants need knowledge of C++, and a look at Wilmott forum shows all the practitioners talking about C++.

Is there any particular reason why this is the case? I would have thought that with modern financial business being somewhat complex, a VM language with type safety, managed memory, and a rich library would be preferred. Productivity is higher that way. Plus, JIT compilers are getting better and better. They can do optimizations as the program is running, so you'd think they'd use that run-time info to beat the performance of the unmanaged program.

Perhaps these guys are writing the critical bits in C++ and calling them from a managed environment (P/Invoke etc.)? Is that possible?

Finally, does anyone have experience with the central question here, which is why unmanaged code is without doubt preferred over managed code in this domain?

As far as I can tell, the HFT guys need to react as fast as possible to incoming market data, but this is not necessarily a hard realtime requirement. You're worse off if you're slow, that's for sure, but you don't need to guarantee a certain speed on each response, you just need a fast average.

Edit

Right, a couple of good answers thus far, but pretty general (well-trodden ground). Let me specify what kind of program HFT guys would be running.

The main criterion is responsiveness. When an order hits the market, you want to be the first to be able to react to it. If you're late, someone else might take it before you, but each firm has a slightly different strategy, so you might be OK if one iteration is a bit slow.

The program runs all day long, with almost no user intervention. Whatever function is handling each new piece of market data is run dozens (even hundreds) of times a second.

These firms generally have no limit as to how expensive the hardware is.

Recommended answer

Firstly, 1 ms is an eternity in HFT. If you think it is not then it would be good to do a bit more reading about the domain. (It is like being 100 miles away from the exchange.) Throughput and latency are deeply intertwined as the formulae in any elementary queuing theory textbook will tell you. The same formulae will show jitter values (frequently dominated by the standard deviation of CPU queue delay if the network fabric is right and you have not configured quite enough cores).
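As a rough illustration of how throughput and latency are tied together, here is a minimal sketch using the textbook M/M/1 queue (Poisson arrivals, exponential service times, a single server). The choice of M/M/1 and the function names are assumptions made for illustration; the answer does not say which queuing model it has in mind.

```cpp
#include <cassert>

// Utilization of a single-server queue: rho = lambda / mu.
// Both rates are in messages per second.
double utilization(double arrival_rate, double service_rate) {
    assert(service_rate > 0.0);
    return arrival_rate / service_rate;
}

// Mean time a message spends in an M/M/1 system (waiting + service):
//   W = 1 / (mu - lambda)
// The result is in seconds. The queue is only stable when lambda < mu.
double mm1_mean_latency_seconds(double arrival_rate, double service_rate) {
    assert(arrival_rate >= 0.0 && arrival_rate < service_rate);
    return 1.0 / (service_rate - arrival_rate);
}

// For M/M/1 with first-come-first-served order, the time in system is
// exponentially distributed with rate (mu - lambda), so its standard
// deviation (a simple proxy for jitter) equals its mean.
double mm1_latency_stddev_seconds(double arrival_rate, double service_rate) {
    return mm1_mean_latency_seconds(arrival_rate, service_rate);
}
```

With a stage that can service 1000000 messages/second, an offered load of 500000 messages/second gives a mean latency of 2 microseconds under this model, while 900000 messages/second gives 10 microseconds. Pushing throughput closer to capacity raises both the latency and its spread.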

One of the problems with HFT arbitrage is that once you decide to capture a spread, there are two legs (or more) to realize the profit. If you fail to hit all legs you can be left with a position that you really don't want (and a subsequent loss) - after all you were arbitraging not investing.

You don't want positions unless your strategy is predicting the (VERY near term!!!) future (and this, believe it or not, is done VERY successfully). If you are 1 ms away from exchange then some significant fraction of your orders won't be executed and what you wanted will be picked off. Most likely the ones that have executed one leg will end up losers or at least not profitable.

Whatever your strategy is, for argument's sake let us say it ends up with a 55%/45% win/loss ratio. Even a small change in the win/loss ratio can cause a big change in profitability.
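To make the sensitivity concrete, here is a small sketch of the expected profit per trade. It assumes every trade either wins or loses a fixed amount; the function name and the equal win/loss sizes in the note below are hypothetical, not figures from the answer.

```cpp
// Expected profit per trade when a trade wins avg_win with probability
// win_probability and otherwise loses avg_loss (a positive number).
double expected_profit_per_trade(double win_probability,
                                 double avg_win,
                                 double avg_loss) {
    return win_probability * avg_win - (1.0 - win_probability) * avg_loss;
}
```

With equal win and loss sizes of 1 unit, a 55%/45% ratio yields 0.10 units per trade and a 54%/46% ratio yields 0.08. Losing one percentage point of wins removes a fifth of the profit under these assumptions.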

Re: "run dozens (even hundreds)" of times a second: this seems off by orders of magnitude. Even 20000 ticks a second seems low, though this might be the average over the entire day for the instrument set that he is looking at.

There is high variability in the rates seen in any given second. I will give an example. In some of my testing I look at 7 OTC stocks (CSCO, GOOG, MSFT, EBAY, AAPL, INTC, DELL). In the middle of the day the per-second rates for this stream can range from 0 mps (very, very rare) to almost 2000 quotes and trades per peak second. (See why I think the 20000 above is low.)

I build infrastructure and measurement software for this domain, and the numbers we talk about are hundreds of thousands and millions per second. I have C++ producer/consumer infrastructure libraries that can push almost 5000000 (5 million) messages/second between producer and consumer (32-bit, 2.4 GHz cores). These are 64-byte messages with new, construct, enqueue, synchronize on the producer side and synchronize, dequeue, touch every byte, run virtual destructor, free on the consumer side. Now admittedly that is a simple benchmark with no socket IO (and socket IO can be ugly) as there would be at the end points of the end-point pipe stages. It is ALL custom synchronization classes that only synchronize when empty, custom allocators, custom lock-free queues and lists, occasional STL (with custom allocators) but more often custom intrusive collections (of which I have a significant library). More than once I have given a vendor in this arena a quadruple (and more) in throughput without increased batching at the socket endpoints.
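For readers unfamiliar with lock-free producer/consumer queues, here is a minimal single-producer/single-consumer ring buffer sketch in standard C++17. It is not the author's library: the class name `SpscRing`, its interface, and the 64-byte alignment are all assumptions made for illustration. It also copies values in place instead of using the custom allocators and intrusive collections described above.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

// Bounded lock-free queue for exactly one producer thread and one
// consumer thread. Capacity must be a power of two so that indices
// can be wrapped with a bit mask.
template <typename T, std::size_t Capacity>
class SpscRing {
    static_assert(Capacity > 0 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Called only by the producer. Returns false when the ring is full.
    bool try_push(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == Capacity) {
            return false;  // full
        }
        slots_[head & (Capacity - 1)] = value;
        // Publish the slot: the consumer's acquire load of head_ sees it.
        head_.store(head + 1, std::memory_order_release);
        return true;
    }

    // Called only by the consumer. Returns std::nullopt when empty.
    std::optional<T> try_pop() {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (tail == head) {
            return std::nullopt;  // empty
        }
        T value = slots_[tail & (Capacity - 1)];
        // Hand the slot back to the producer.
        tail_.store(tail + 1, std::memory_order_release);
        return value;
    }

private:
    std::array<T, Capacity> slots_{};
    // Keep the two counters on separate cache lines (64 bytes assumed)
    // so producer and consumer do not contend on the same line.
    alignas(64) std::atomic<std::size_t> head_{0};
    alignas(64) std::atomic<std::size_t> tail_{0};
};
```

Neither side takes a lock or makes a system call on the fast path, which is one reason designs of this kind can sustain high message rates. The throughput actually achieved depends on message size, allocation strategy and hardware.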

I have OrderBook and OrderBook::Universe classes that take less than 2us for new, insert, find, partial fill, find, second fill, erase, delete sequence when averaged over 22000 instruments. The benchmark iterates over all 22000 instruments serially between the insert first fill and last fill so there are no cheap caching tricks involved. Operations into the same book are separated by accesses of 22000 different books. These are very much NOT the caching characteristics of real data. Real data is much more localized in time and consecutive trades frequently hit the same book.
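The operation sequence being timed (insert, find, partial fill, find, second fill, erase) can be sketched as follows. This is a deliberately naive illustration built on `std::unordered_map`; the names `Order` and `SketchBook` are hypothetical and it says nothing about how the author's OrderBook classes are implemented.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// One resting order, identified by an exchange-assigned id.
struct Order {
    std::uint64_t id;
    std::int64_t price_ticks;
    std::uint32_t open_qty;
};

// Minimal per-instrument book keyed by order id.
class SketchBook {
public:
    // Returns false if an order with the same id already exists.
    bool insert(const Order& order) {
        return orders_.emplace(order.id, order).second;
    }

    // Returns nullptr when the id is unknown.
    const Order* find(std::uint64_t id) const {
        auto it = orders_.find(id);
        return it == orders_.end() ? nullptr : &it->second;
    }

    // Applies a fill. A fill that exhausts the order erases it.
    // Returns false for an unknown id or a fill larger than open_qty.
    bool fill(std::uint64_t id, std::uint32_t qty) {
        auto it = orders_.find(id);
        if (it == orders_.end() || qty > it->second.open_qty) {
            return false;
        }
        it->second.open_qty -= qty;
        if (it->second.open_qty == 0) {
            orders_.erase(it);
        }
        return true;
    }

    std::size_t size() const { return orders_.size(); }

private:
    std::unordered_map<std::uint64_t, Order> orders_;
};
```

A node-based hash map allocates on every insert and scatters orders across memory, so a version tuned for the cache behaviour discussed here would likely replace it with pooled or intrusive storage.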

All of this work involves careful consideration of the constants and caching characteristics in the algorithmic costs of the collections used. (Sometimes it seems that the K's in K*O(n), K*O(n*log n), etc. are dismissed a bit too glibly.)

I work on the market data infrastructure side of things. It is inconceivable to even think of using Java or a managed environment for this work. And when you can get this kind of performance with C++ (and I think it is quite hard to get million+/mps performance with a managed environment), I can't imagine any of the significant investment banks or hedge funds (for whom a $250000 salary for a top-notch C++ programmer is nothing) not going with C++.

Is anybody out there really getting 2000000+/mps performance out of a managed environment? I know a few people in this arena and no one ever bragged about it to me. And I think 2mm in a managed environment would have some bragging rights.

I know of one major player's FIX order decoder doing 12000000 field decodes/sec (3 GHz CPU). It is C++, and the guy who wrote it almost challenged anybody to come up with something in a managed environment that is even half that speed.

Technologically it is an interesting area with lots of fun performance challenges. Consider the options market when the underlying security changes - there might be say 6 outstanding price points with 3 or 4 different expiration dates. Now for each trade there were probably 10-20 quotes. Those quotes can trigger price changes in the options. So for each trade there might be 100 or 200 changes in options quotes. It is just a ton of data - not a Large Hadron Collider collision-detector-like amount of data but still a bit of a challenge. It is a bit different than dealing with keystrokes.

Even the debate about FPGAs goes on. Many people take the position that a well-coded parser running on 3 GHz commodity hardware can beat a 500 MHz FPGA. But even if a tiny bit slower (not saying they are), FPGA-based systems can tend to have tighter delay distributions. (Read "tend": this is not a blanket statement.) Of course, if you have a great C++ parser that you push through a Cfront and then push that through the FPGA image generator... But that is another debate...
