Reasons for NOT scaling-up vs. -out?


Question

As a programmer I make revolutionary findings every few years. I'm either ahead of the curve, or behind it by about π in the phase. One hard lesson I learned was that scaling OUT is not always better, quite often the biggest performance gains are when we regrouped and scaled up.

What reasons do you have for scaling out vs. up? Price, performance, vision, projected usage? If so, how did this work for you?

We once scaled out to several hundred nodes that would serialize and cache the necessary data out to each node and run maths processes on the records. Many, many billions of records needed to be (cross-)analyzed. It was the perfect business and technical case for scale-out. We kept optimizing until we were processing about 24 hours of data in 26 hours of wallclock time. Long story short, we leased a gigantic (for the time) IBM pSeries, put Oracle Enterprise on it, indexed our data, and ended up processing the same 24 hours of data in about 6 hours. A revolution for me.

So many enterprise systems are OLTP and the data are not sharded, but many still want to cluster or scale out. Is this a reaction to new techniques, or to perceived performance?

Do applications in general today, or our programming mantras, lend themselves better to scale-out? Should we always take this trend into account in the future?

Answer

Not surprisingly, it all depends on your problem. If you can easily partition it into subproblems that don't communicate much, scaling out gives trivial speedups. For instance, searching for a word in 1B web pages can be done by one machine searching 1B pages, or by 1M machines searching 1,000 pages each without a significant loss in efficiency (and thus with a 1,000,000x speedup). This is called "embarrassingly parallel".
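A minimal sketch of that embarrassingly parallel search: the corpus is split into independent chunks, each chunk is searched with no communication between workers, and the per-chunk counts are simply summed at the end. The `chunked_search` helper and the toy corpus are illustrative assumptions, not from the answer; the map step is written sequentially here, but since chunks share no state it is exactly the step that could be farmed out to separate machines.

```python
def search_chunk(pages, word):
    """Count pages in one chunk containing the word -- needs no shared state."""
    return sum(1 for page in pages if word in page)

def chunked_search(pages, word, n_workers):
    """Partition pages into n_workers chunks; each chunk is searched
    independently, so the map below could run on separate machines.
    The only communication is the final sum of the partial counts."""
    size = (len(pages) + n_workers - 1) // n_workers
    chunks = [pages[i:i + size] for i in range(0, len(pages), size)]
    return sum(search_chunk(chunk, word) for chunk in chunks)

pages = ["the quick brown fox", "lorem ipsum", "fox hunting", "hello world"]
print(chunked_search(pages, "fox", 2))  # -> 2, same as an unpartitioned scan
```

Because no worker ever needs another worker's data, the speedup is limited only by the number of chunks and the cost of the final sum.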

Other algorithms, however, do require much more intensive communication between the subparts. Your example requiring cross-analysis is the perfect example of where communication can often drown out the performance gains of adding more boxes. In these cases, you'll want to keep communication inside a single (bigger) box, over high-speed interconnects, rather than over something as 'common' as (10-)Gig-E.
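A back-of-envelope model of that trade-off (the constants here are made up for illustration, not taken from the answer): compute time shrinks as work is divided across nodes, but cross-analysis traffic grows with the node count, so wallclock time falls, bottoms out, and then rises again as communication dominates.

```python
def wallclock(n_nodes, compute_hours=24.0, comm_hours_per_node=0.05):
    """Total time on n nodes: perfectly split compute, plus communication
    cost that grows linearly with the number of boxes exchanging data."""
    return compute_hours / n_nodes + comm_hours_per_node * n_nodes

# Adding boxes helps at first, then communication drowns out the gains.
times = {n: wallclock(n) for n in (1, 4, 16, 64, 256)}
best = min(times, key=times.get)
print(best, round(times[best], 2))  # -> 16 2.3
```

With these (invented) constants, 16 nodes finish in about 2.3 hours while 256 nodes take almost 13: past the sweet spot, every extra box costs more in communication than it saves in compute, which is why a single big box with a fast interconnect can win.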

Of course, this is a fairly theoretical point of view. Other factors, such as I/O, reliability, and ease of programming (one big shared-memory machine usually causes far fewer headaches than a cluster) can also have a big influence.

Finally, due to the (often extreme) cost benefits of scaling out on cheap commodity hardware, the cluster/grid approach has recently attracted much more (algorithmic) research. As a result, new parallelization methods have been developed that minimize communication and thus do much better on a cluster -- whereas common knowledge used to dictate that these kinds of algorithms could only run effectively on big iron machines...
