How to store and push simulation state while minimally affecting updates per second?


Question

My app is comprised of two threads:

  1. GUI Thread (using Qt)
  2. Simulation Thread

My reason for using two threads is to keep the GUI responsive, while letting the Sim thread spin as fast as possible.

In my GUI thread I'm rendering the entities in the sim at an FPS of 30-60; however, I want my sim to "crunch ahead" - so to speak - and queue up game state to be drawn eventually (think streaming video, you've got a buffer).

Now for each frame of the sim I render I need the corresponding simulation "State". So my sim thread looks something like:

while(1) {
    simulation.update();
    SimState* s = new SimState;
    simulation.getAgents( s->agents ); // store agents
    // store other things to SimState here..
    stateStore.enqueue(s); // stateStore is a QQueue<SimState*>
    if( /* some threshold reached */ )
        // push stateStore
}

SimState looks like:

struct SimState {
    std::vector<Agent> agents;
    //other stuff here
};

And Simulation::getAgents looks like:

void Simulation::getAgents(std::vector<Agent> &a) const
{
    // mAgents is a std::vector<Agent>
    std::vector<Agent> a_tmp(mAgents);
    a.swap(a_tmp);
}

The Agents themselves are somewhat complex classes. The members are a bunch of ints and floats and two std::vector<float>s.

With this current setup the sim can't crunch much faster than the GUI thread is drawing. I've verified that the current bottleneck is simulation.getAgents( s->agents ), because even if I leave out the push the updates-per-second are slow. If I comment out that line I see several orders of magnitude improvement in updates/second.

So, what sorts of containers should I be using to store the simulation's state? I know there is a bunch of copying going on atm, but some of it is unavoidable. Should I store Agent* in the vector instead of Agent?

Note: In reality the simulation isn't in a loop, but uses Qt's QMetaObject::invokeMethod(this, "doSimUpdate", Qt::QueuedConnection); so I can use signals/slots to communicate between the threads; however, I've verified a simpler version using while(1){} and the issue persists.
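For reference, the queued version is essentially one update per slot call that then re-posts itself, roughly like this (a sketch; SimWorker, statesReady and kThreshold are placeholder names, not my exact code):

// Sketch of the queued-connection version: one update per slot invocation,
// which re-posts itself so the thread's event loop never blocks.
void SimWorker::doSimUpdate()
{
    simulation.update();

    SimState* s = new SimState;
    simulation.getAgents(s->agents);   // same copy as in the while(1) version
    stateStore.enqueue(s);

    if (stateStore.size() >= kThreshold)
        emit statesReady();            // hand the buffered states to the GUI thread

    // Queue the next update; other queued events (e.g. from the GUI) can run in between.
    QMetaObject::invokeMethod(this, "doSimUpdate", Qt::QueuedConnection);
}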

Solution

Try re-using your SimState objects (using some kind of pool mechanism) instead of allocating them every time. After a few simulation loops, the re-used SimState objects will have vectors that have grown to the needed size, thus avoiding reallocation and saving time.

An easy way to implement a pool is to initially push a bunch of pre-allocated SimState objects onto a std::stack<SimState*>. Note that a stack is preferable to a queue, because you want to take the SimState object that is more likely to be "hot" in the cache (the most recently used SimState object will be at the top of the stack). Your simulation loop pops SimState objects off the stack and populates them with the computed SimState. These computed SimState objects are then pushed into a producer/consumer queue to feed the GUI thread. After being rendered by the GUI thread, they are pushed back onto the SimState stack (i.e. the "pool"). Try to avoid needless copying of SimState objects while doing all this. Work directly with the SimState object in each stage of your "pipeline".

Of course, you'll have to use the proper synchronization mechanisms in your SimState stack and queue to avoid race conditions. Qt might already have thread-safe stacks/queues. A lock-free stack/queue might speed things up if there is a lot of contention (Intel Threading Building Blocks provides such lock-free queues). Seeing that it takes on the order of 1/50 of a second to compute a SimState, I doubt that contention will be a problem.
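For example, if you do go the lock-free route, the handoff of computed states could look something like this (a sketch assuming TBB is available; readyStates and the helper names are illustrative):

#include <tbb/concurrent_queue.h>

// Lock-free handoff of computed states from the sim thread to the GUI thread.
// SimState travels by pointer, so pushing/popping copies nothing big.
tbb::concurrent_queue<SimState*> readyStates;

void simThreadPublish(SimState* s)
{
    readyStates.push(s);                 // never blocks
}

SimState* guiThreadTryTake()
{
    SimState* s = nullptr;
    if (readyStates.try_pop(s))          // returns false if nothing is ready yet
        return s;
    return nullptr;                      // GUI can re-draw the previous frame in that case
}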

If your SimState pool becomes depleted, it means your simulation thread is too "far ahead" and can afford to wait for some SimState objects to be returned to the pool. The simulation thread should block (using a condition variable) until a SimState object becomes available again in the pool. The size of your SimState pool corresponds to how many SimStates can be buffered (e.g. a pool of ~50 objects gives you a crunch-ahead buffer of up to ~1 second).
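A minimal sketch of such a blocking pool (using C++11 <mutex> and <condition_variable>; the class and member names are just for illustration):

#include <stack>
#include <mutex>
#include <condition_variable>

// A blocking pool of pre-allocated SimState objects.
// The sim thread calls acquire(); the GUI thread calls release()
// once it has finished rendering a state.
class SimStatePool {
public:
    explicit SimStatePool(std::size_t n)
    {
        for (std::size_t i = 0; i < n; ++i)
            mFree.push(new SimState);
    }

    SimState* acquire()
    {
        std::unique_lock<std::mutex> lock(mMutex);
        // Pool empty means the sim thread is too far ahead: block until
        // the GUI thread returns a state.
        mAvailable.wait(lock, [this] { return !mFree.empty(); });
        SimState* s = mFree.top();   // most recently returned -> most likely still hot in cache
        mFree.pop();
        return s;
    }

    void release(SimState* s)
    {
        {
            std::lock_guard<std::mutex> lock(mMutex);
            mFree.push(s);
        }
        mAvailable.notify_one();
    }

private:
    std::stack<SimState*> mFree;    // the "pool"
    std::mutex mMutex;
    std::condition_variable mAvailable;
};

(Destructor/cleanup omitted for brevity.)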

You can also try running parallel simulation threads to take advantage of multi-core processors. The Thread Pool pattern can be useful here. However, care must be taken that the computed SimStates are enqueued in the proper order. A thread-safe priority queue ordered by time-stamp might work here.
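For instance, each SimState could carry the tick (or timestamp) it was computed for, and the consumer side only releases states in tick order (a sketch; the tick member and class names are assumptions, not part of your current SimState):

#include <queue>
#include <vector>
#include <mutex>

// Orders computed states by simulation tick so frames are handed to the
// GUI in order even when several sim threads finish out of order.
struct LaterTick {
    bool operator()(const SimState* a, const SimState* b) const
    {
        return a->tick > b->tick;    // smallest tick has highest priority
    }
};

class OrderedStateQueue {
public:
    void push(SimState* s)
    {
        std::lock_guard<std::mutex> lock(mMutex);
        mQueue.push(s);
    }

    // Only hands out the state for the next expected tick, so a frame that
    // finished early waits for its slower predecessors.
    SimState* tryPopNext(unsigned long expectedTick)
    {
        std::lock_guard<std::mutex> lock(mMutex);
        if (mQueue.empty() || mQueue.top()->tick != expectedTick)
            return nullptr;
        SimState* s = mQueue.top();
        mQueue.pop();
        return s;
    }

private:
    std::priority_queue<SimState*, std::vector<SimState*>, LaterTick> mQueue;
    std::mutex mMutex;
};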

Here's a simple diagram of the pipeline architecture I'm suggesting:

[Pipeline diagram: SimState pool (stack) -> simulation thread -> producer/consumer queue -> GUI thread -> back to the pool]

(NOTE: The pool and queue hold SimState by pointer, not by value!)

Hope this helps.


If you plan to re-use your SimState objects, then your Simulation::getAgents method will be inefficient. This is because the vector<Agent>& a parameter is likely to already have enough capacity to hold the agent list.

The way you're doing it now would throw away this already allocated vector and create a new one from scratch.

IMO, your getAgents should be:

void Simulation::getAgents(std::vector<Agent> &a) const
{
    a = mAgents;
}

Yes, you lose exception safety, but you might gain performance (especially with the reusable SimState approach).


Another idea: You could try making your Agent objects fixed-size, by using a C-style array (or boost::array) and a "count" variable instead of std::vector for the Agent's float list members. Simply make the fixed-size array big enough for any situation in your simulation. Yes, you'll waste space, but you might gain a lot of speed.
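Something along these lines (a sketch; MAX_SAMPLES and the member names are placeholders for whatever your Agent actually stores):

#include <boost/array.hpp>
#include <cstddef>

// Fixed-size Agent: the two float lists become fixed-capacity arrays plus a
// count, so copying an Agent never touches the heap.
struct Agent {
    int   id;
    float x, y;                                   // ...plus your other ints/floats

    static const std::size_t MAX_SAMPLES = 64;    // worst case for your simulation

    boost::array<float, MAX_SAMPLES> samplesA;
    std::size_t                      samplesACount;

    boost::array<float, MAX_SAMPLES> samplesB;
    std::size_t                      samplesBCount;
};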

You can then pool your Agents using a fixed-size object allocator (such as boost::pool) and pass them around by pointer (or shared_ptr). That'll eliminate a lot of heap allocation and copying.
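With Boost.Pool that might look roughly like this (a sketch; it assumes the fixed-size Agent above, and note that boost::object_pool itself is not thread-safe, so keep one pool per thread or guard it):

#include <boost/pool/object_pool.hpp>

// All Agents come from one fixed-size-block pool and are passed around by
// pointer, eliminating per-Agent heap traffic and deep copies.
boost::object_pool<Agent> agentPool;

Agent* makeAgent()
{
    return agentPool.construct();    // takes a block from the pool and default-constructs an Agent
}

void recycleAgent(Agent* a)
{
    agentPool.destroy(a);            // runs ~Agent() and returns the block to the pool
}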

You can use this idea alone or in combination with the above ideas. This idea seems easier to implement than the pipeline thing above, so you might want to try it first.


Yet another idea: Instead of a thread pool for running simulation loops, you can break the simulation down into several stages and execute each stage in its own thread. Producer/consumer queues are used to exchange SimState objects between stages. For this to be effective, the different stages need to have roughly similar CPU workloads (otherwise, one stage will become the bottleneck). This is a different way to exploit parallelism.
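In outline, each stage would run the same kind of loop (a sketch; StateQueue stands for any blocking producer/consumer queue, such as the pool/queue classes above, and waitAndPop is an assumed interface):

#include <atomic>

// One pipeline stage: pull a SimState from the upstream queue, do this
// stage's share of the work (e.g. physics, AI, collision), and hand the
// state downstream. The last stage's output queue feeds the GUI thread.
template <typename StateQueue, typename StageFn>
void runStage(StateQueue& in, StateQueue& out, StageFn doStage,
              std::atomic<bool>& running)
{
    while (running) {
        SimState* s = in.waitAndPop();   // blocks until the previous stage is done with s
        doStage(*s);                     // only this stage's slice of the work
        out.push(s);                     // pass it on; the slowest stage sets the overall throughput
    }
}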
