C++11 async is using only one core

Problem Description

I'm trying to parallelise a long-running function in C++, and using std::async it only uses one core.

It's not that the running time of the function is too small, as I'm currently using test data that takes about 10 minutes to run.

By my logic, I create NThreads worth of futures (each taking a proportion of the loop rather than an individual cell, so each is a nicely long-running thread), each of which dispatches an async task. Then, after they've been created, the program spin-locks waiting for them to complete. However, it always uses only one core?!

This isn't me just looking at top and saying it looks roughly like one CPU, either: my ZSH config outputs the CPU % of the last command, and it is always exactly 100%, never above.

auto NThreads = 12;
auto BlockSize = (int)std::ceil((int)(NThreads / PathCountLength));

std::vector<std::future<std::vector<unsigned __int128>>> Futures;

for (auto I = 0; I < NThreads; ++I) {
    std::cout << "HERE" << std::endl;
    unsigned __int128 Min = I * BlockSize;
    unsigned __int128 Max = I * BlockSize + BlockSize;

    if (I == NThreads - 1)
        Max = PathCountLength;

    Futures.push_back(std::async(
    [](unsigned __int128 WMin, unsigned __int128 Min, unsigned __int128 Max,
           std::vector<unsigned __int128> ZeroChildren,
           std::vector<unsigned __int128> OneChildren,
           unsigned __int128 PathCountLength)
           -> std::vector<unsigned __int128> {
           std::vector<unsigned __int128> LocalCount;
           for (unsigned __int128 I = Min; I < Max; ++I)
               LocalCount.push_back(KneeParallel::pathCountOrStatic(
                   WMin, I, ZeroChildren, OneChildren, PathCountLength));
           return LocalCount;
    },
    WMin, Min, Max, ZeroChildInit, OneChildInit, PathCountLength));
}

for (auto &Future : Futures) {
    Future.get();
}

Does anyone have any insight?

I'm compiling with clang and LLVM on Arch Linux. Are there any compile flags I need? From what I can tell, C++11 standardised the thread library.

Edit: If it helps anyone give any further clues: when I comment out the local vector it runs on all cores as it should; when I drop it back in, it rolls back to one core.

Edit 2: So I pinned down the culprit, but it seems very bizarre. Returning the vector from the lambda function is what tied it to one core, so now I get round this by passing in a shared_ptr to the output vector and manipulating that. And hey presto, it fires up on all the cores!

I figured there was no point using futures now, since I don't have a return value, and that I'd use threads instead. Nope! Using threads with no return also uses one core. Weird, eh?

Fine, go back to using futures and just return an int to throw away or something. Yep, you guessed it: even returning an int from the thread sticks the program to one core. Except futures can't have void lambda functions. So my solution is to pass in a pointer to store the output, to an int lambda function that never returns anything. Yeah, it feels like duct tape, but I can't see a better solution.

It seems so... bizarre? Like the compiler is somehow interpreting the lambda incorrectly. Could it be because I use the dev release of LLVM and not a stable branch...?

Anyway, here is my solution, because I hate nothing more than finding my problem on here and seeing no answer:

auto NThreads = 4;
auto BlockSize = (int)std::ceil((int)(NThreads / PathCountLength));

auto Futures = std::vector<std::future<int>>(NThreads);
auto OutputVectors =
    std::vector<std::shared_ptr<std::vector<unsigned __int128>>>(
        NThreads, std::make_shared<std::vector<unsigned __int128>>());

for (auto I = 0; I < NThreads; ++I) {
  unsigned __int128 Min = I * BlockSize;
  unsigned __int128 Max = I * BlockSize + BlockSize;

  if (I == NThreads - 1)
    Max = PathCountLength;

  Futures[I] = std::async(
      std::launch::async,
      [](unsigned __int128 WMin, unsigned __int128 Min, unsigned __int128 Max,
         std::vector<unsigned __int128> ZeroChildren,
         std::vector<unsigned __int128> OneChildren,
         unsigned __int128 PathCountLength,
         std::shared_ptr<std::vector<unsigned __int128>> OutputVector)
          -> int {
        for (unsigned __int128 I = Min; I < Max; ++I) {
          OutputVector->push_back(KneeParallel::pathCountOrStatic(
              WMin, I, ZeroChildren, OneChildren, PathCountLength));
        }
      },
      WMin, Min, Max, ZeroChildInit, OneChildInit, PathCountLength,
      OutputVectors[I]);
}

for (auto &Future : Futures) {
  Future.get();
}

Solution

By providing a first argument to async, you can configure it to run deferred (std::launch::deferred), to run in its own thread (std::launch::async), or let the system decide between both options (std::launch::async | std::launch::deferred). The latter is the default behavior.

So, to force it to run in another thread, adapt your call of std::async to std::async(std::launch::async, /*...*/).
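As a minimal, self-contained sketch of the difference (illustrative code, not taken from the question; launch_demo.cpp is a hypothetical file name): with std::launch::async each task reports a thread id different from the main thread, whereas a deferred task runs on whichever thread eventually calls get().

// Illustrative sketch showing how the launch policy affects where the work runs.
// Build, e.g.: clang++ -std=c++11 -pthread launch_demo.cpp
#include <future>
#include <iostream>
#include <thread>
#include <vector>

int main() {
    std::cout << "main thread:       " << std::this_thread::get_id() << '\n';

    // std::launch::async forces each task onto its own thread.
    std::vector<std::future<std::thread::id>> Futures;
    for (int I = 0; I < 4; ++I)
        Futures.push_back(std::async(std::launch::async,
                                     [] { return std::this_thread::get_id(); }));
    for (auto &F : Futures)
        std::cout << "async task ran on: " << F.get() << '\n';

    // std::launch::deferred (which the default policy async|deferred may also
    // pick) runs the task lazily on whichever thread calls get().
    auto Deferred = std::async(std::launch::deferred,
                               [] { return std::this_thread::get_id(); });
    std::cout << "deferred ran on:   " << Deferred.get() << '\n';
}

With the default policy the implementation is allowed to defer every task, in which case each one runs sequentially on the thread that calls get(), which would match the single-core behaviour described above.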
