What is data oriented design?

Question

I was reading this article, and this guy goes on talking about how everyone can greatly benefit from mixing data oriented design in with OOP. He doesn't show any code samples, however.

I googled this and couldn't find any real information as to what this is, let alone any code samples. Is anyone familiar with this term and able to provide an example? Is this maybe a different word for something else?

Recommended answer

First of all, don't confuse this with data driven design.

My understanding of Data Oriented Design is that it is about organizing your data for efficient processing, especially with respect to cache misses. Data Driven Design, on the other hand, is about letting data control a lot of your program's behavior (described very well by Andrew Keith's answer).

Say you have ball objects in your application with properties such as color, radius, bounciness, position, etc.

Object oriented approach

In OOP you would describe balls like this:

class Ball {
  Point  position;
  Color  color;
  double radius;

  void draw();
};

And then you would create a collection of balls like this:

std::vector<Ball> balls;  // needs #include <vector>
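A minimal compilable sketch of this object oriented layout, assuming simple 2D `Point` and RGBA `Color` structs (both are placeholders, because the answer never defines them). Each `Ball` keeps all of its fields together, so the vector stores whole balls one after another, often called an "array of structures". The hypothetical `move_all` function only needs positions, yet it steps through memory one whole `Ball` at a time.

```cpp
#include <cassert>
#include <vector>

// Placeholder types: the original answer does not define Point or Color.
struct Point { float x, y; };
struct Color { float r, g, b, a; };

// Object oriented layout: one object per ball, all fields stored together.
struct Ball {
    Point  position;
    Color  color;
    double radius;
};

// Hypothetical update: shift every ball by (dx, dy).
// Only `position` is read or written, but consecutive positions are
// sizeof(Ball) bytes apart, so color and radius are pulled into the
// cache along with them.
void move_all(std::vector<Ball>& balls, float dx, float dy) {
    for (Ball& b : balls) {
        b.position.x += dx;
        b.position.y += dy;
    }
}
```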

Data oriented approach

In Data Oriented Design, however, you are more likely to write the code like this:

class Balls {
  std::vector<Point>  position;  // one array per property
  std::vector<Color>  color;
  std::vector<double> radius;

  void draw();
};

As you can see, there is no longer a single unit representing one ball. Ball objects exist only implicitly.
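To make "implicitly" concrete, here is a small sketch of the data oriented `Balls` class, again assuming placeholder `Point` and `Color` structs. The `add` and `size` helpers are hypothetical additions, not part of the answer: a ball is nothing more than index `i` into every array, and adding one means appending to each array so the arrays stay the same length.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Placeholder types (not defined in the original answer).
struct Point { float x, y; };
struct Color { float r, g, b, a; };

// Data oriented layout ("structure of arrays"): one array per property.
// Ball i exists only as index i into every array.
struct Balls {
    std::vector<Point>  position;
    std::vector<Color>  color;
    std::vector<double> radius;

    // Hypothetical helper: append one ball's properties to each array
    // and return the index that now identifies that ball.
    std::size_t add(Point p, Color c, double r) {
        position.push_back(p);
        color.push_back(c);
        radius.push_back(r);
        return position.size() - 1;
    }

    // The layout only works if every array has one entry per ball.
    std::size_t size() const {
        assert(position.size() == color.size() && color.size() == radius.size());
        return position.size();
    }
};
```

The price of this layout is that keeping the arrays in sync becomes the programmer's job, which is why the sketch funnels insertion through a single function.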

This can have many performance advantages. Usually we want to operate on many balls at the same time, and hardware usually works most efficiently on large contiguous chunks of memory.
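As a sketch of "operating on many balls at once", the loop below advances every position by a velocity. It assumes positions are stored as two plain `float` arrays and that per-ball velocities `vx`/`vy` exist; both are illustrative assumptions beyond the answer's `vector<Point>`. Every array is walked front to back with unit stride, so each cache line fetched is fully used, and loops of this shape are often easy for compilers to auto-vectorize.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical position storage split into plain float arrays.
struct Positions {
    std::vector<float> x, y;
};

// Advance every ball by its velocity over time step dt.
// All four arrays are read sequentially, with no unrelated data in between.
void integrate(Positions& p, const std::vector<float>& vx,
               const std::vector<float>& vy, float dt) {
    const std::size_t n = p.x.size();
    assert(p.y.size() == n && vx.size() == n && vy.size() == n);
    for (std::size_t i = 0; i < n; ++i) {
        p.x[i] += vx[i] * dt;
        p.y[i] += vy[i] * dt;
    }
}
```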

Secondly, you might do operations that affect only some of a ball's properties. For example, if you combine the colors of all the balls in various ways, you want your cache to contain only color information. However, when all of a ball's properties are stored in one unit, you pull in all the other properties of each ball as well, even though you don't need them.
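One way to "combine the colors of all the balls" is to average them. In the sketch below, `Color` is a placeholder RGBA struct and `average_color` is a hypothetical function. Because it receives only the color array, the memory it streams through holds nothing but colors; no positions or radii occupy cache space.

```cpp
#include <cassert>
#include <vector>

// Placeholder RGBA type; the original answer does not define Color.
struct Color { float r, g, b, a; };

// Average the colors of all balls. Only the color array is read.
Color average_color(const std::vector<Color>& colors) {
    assert(!colors.empty());
    Color sum{0.0f, 0.0f, 0.0f, 0.0f};
    for (const Color& c : colors) {
        sum.r += c.r;
        sum.g += c.g;
        sum.b += c.b;
        sum.a += c.a;
    }
    const float n = static_cast<float>(colors.size());
    return {sum.r / n, sum.g / n, sum.b / n, sum.a / n};
}
```

For instance, averaging opaque red `{1, 0, 0, 1}` and opaque green `{0, 1, 0, 1}` gives `{0.5, 0.5, 0, 1}`.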

Cache usage example

Say each ball takes up 64 bytes, a Point takes 4 bytes, and a cache slot also takes 64 bytes. If I want to update the position of 10 balls, I have to pull 10 * 64 = 640 bytes of memory into cache and get 10 cache misses. If, however, I can work with the positions of the balls as separate units, they take only 4 * 10 = 40 bytes. That fits in one cache fetch, so we get only 1 cache miss to update all 10 balls. These numbers are arbitrary and only meant to illustrate the point; real object and cache-line sizes depend on the types involved and on the hardware.
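The arithmetic above can be checked with a small model. The function `lines_touched` is hypothetical and deliberately simplified: it assumes the array starts on a cache-line boundary, the cache starts empty, and nothing is evicted, so the number of distinct lines touched equals the number of cold misses. It counts the lines needed to read `field_size` bytes at `field_offset` inside each of `count` elements laid out `stride` bytes apart.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Simplified cache model (assumptions listed above). Returns the number of
// distinct cache lines covered by the requested field of every element.
std::size_t lines_touched(std::size_t count, std::size_t stride,
                          std::size_t field_offset, std::size_t field_size,
                          std::size_t line_size) {
    assert(field_size > 0 && line_size > 0);
    std::set<std::size_t> lines;
    for (std::size_t i = 0; i < count; ++i) {
        const std::size_t first = i * stride + field_offset;  // first byte read
        const std::size_t last = first + field_size - 1;      // last byte read
        for (std::size_t line = first / line_size; line <= last / line_size; ++line)
            lines.insert(line);
    }
    return lines.size();
}
```

With the answer's figures, `lines_touched(10, 64, 0, 4, 64)` returns 10 (each 64-byte ball occupies its own line), while `lines_touched(10, 4, 0, 4, 64)` returns 1, because the 40 bytes of packed positions fit in a single line.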

But it illustrates how memory layout can severely affect cache hit rates and thus performance. This will only grow in importance as the gap between CPU and RAM speed widens.

How to lay out memory

In my ball example I simplified the issue a lot, because in most real applications you will access several variables together. For example, position and radius will probably be used together frequently. Then your structure should be:

class Body {
  Point  position;
  double radius;
};

class Balls {
  std::vector<Body>  bodies;  // position and radius kept side by side
  std::vector<Color> color;

  void draw();
};

The reason you should do this is that if data used together is placed in separate arrays, reading one ball requires fetching from two different places in memory, and there is a risk that the arrays will compete for the same slots in the cache, so loading one will throw out the other.
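Here is a sketch of a loop that benefits from the `Body` grouping: a hypothetical hit test that needs both position and radius for every ball. `Point` and `Color` are placeholder structs. Since both fields live in the same `Body` element, they sit next to each other in memory, and the loop never touches the separate `color` array.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Placeholder types (the original answer does not define them).
struct Point { float x, y; };
struct Color { float r, g, b, a; };

// Fields that are used together stay together...
struct Body {
    Point  position;
    double radius;
};

// ...while fields used separately get their own array.
struct Balls {
    std::vector<Body>  bodies;
    std::vector<Color> color;
};

// Hypothetical hit test: index of the first ball containing (px, py),
// or bodies.size() if no ball does. Reads only the bodies array.
std::size_t hit_test(const Balls& balls, float px, float py) {
    // Both arrays describe the same balls, index for index.
    assert(balls.bodies.size() == balls.color.size());
    for (std::size_t i = 0; i < balls.bodies.size(); ++i) {
        const Body& b = balls.bodies[i];
        const double dx = px - b.position.x;
        const double dy = py - b.position.y;
        if (dx * dx + dy * dy <= b.radius * b.radius) return i;
    }
    return balls.bodies.size();
}
```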

So, compared to object oriented programming, the classes you end up making are not tied to the entities in your mental model of the problem. Since data is grouped according to how it is used, you won't always have sensible names to give your classes in Data Oriented Design.

Relation to relational databases

The thinking behind Data Oriented Design is very similar to how you think about relational databases. Optimizing a relational database can also involve using the cache more efficiently, although in this case the cache is not the CPU cache but pages in memory. A good database designer will also likely split out infrequently accessed data into a separate table rather than create a table with a huge number of columns where only a few of the columns are ever used. He might also choose to denormalize some of the tables so that data doesn't have to be accessed from multiple locations on disk. Just like with Data Oriented Design, these choices are made by looking at what the data access patterns are and where the performance bottleneck is.
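The in-memory counterpart of "splitting infrequently accessed data into a separate table" is often called hot/cold splitting. The sketch below is illustrative only: the `BallHot`/`BallCold` split and the `name`/`description` fields are hypothetical. The same index acts as the key into both arrays, much like a shared primary key, and the frequently run loop reads only the compact hot array.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hot data: read on every update by the simulation loop.
struct BallHot {
    float  x, y;
    double radius;
};

// Cold data: rarely read (for example, only when showing details in a UI).
// Hypothetical fields, chosen only to illustrate the split.
struct BallCold {
    std::string name;
    std::string description;
};

// Like moving rarely used columns into their own table: index i is the
// key into both arrays, and hot loops never touch `cold`.
struct BallTable {
    std::vector<BallHot>  hot;
    std::vector<BallCold> cold;
};

// A hot-path query that streams through the hot array only.
double total_area(const BallTable& t) {
    assert(t.hot.size() == t.cold.size());
    const double pi = 3.14159265358979323846;
    double area = 0.0;
    for (const BallHot& h : t.hot) area += pi * h.radius * h.radius;
    return area;
}
```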
