Possible std::async implementation bug on Windows

Question

There seems to be a bug in the Windows implementation of std::async. Under heavy load (on the order of 1000 threads launched async per second), async tasks are never scheduled, and waiting on the returned futures leads to deadlocks. See this piece of code (modified to use launch policy deferred instead of async):

BundlingChunk(size_t numberOfInputs, Bundler* parent, ChunkIdType chunkId)
        : m_numberOfInputs(numberOfInputs), m_parent(parent), m_chunkId(chunkId)
    {
        const BundlerChunkDescription& chunk = m_parent->m_chunks[m_chunkId];
        const ChunkInfo& original = chunk.m_original;
        auto& deserializers = m_parent->m_deserializers;

        // Fetch all chunks in parallel.
        std::vector<std::map<ChunkIdType, std::shared_future<ChunkPtr>>> chunks;
        chunks.resize(chunk.m_secondaryChunks.size());
        static std::atomic<unsigned long long int> chunksInProgress = 0;

        for (size_t i = 0; i < chunk.m_secondaryChunks.size(); ++i)
        {
            for (const auto& c : chunk.m_secondaryChunks[i])
            {
                const auto chunkCreationLambda = ([this, c, i] {
                    chunksInProgress++;
                    ChunkPtr chunk = m_parent->m_weakChunkTable[i][c].lock();
                    if (chunk) {
                        chunksInProgress--;
                        return chunk;
                    }
                    chunksInProgress--;
                    return m_parent->m_deserializers[i]->GetChunk(c);
                });
                std::future<ChunkPtr> chunkCreateFuture = std::async(std::launch::deferred, chunkCreationLambda);
                chunks[i].emplace(c, chunkCreateFuture.share());
            }
        }

        std::vector<SequenceInfo> sequences;
        sequences.reserve(original.m_numberOfSequences);

        // Creating chunk mapping.
        m_parent->m_primaryDeserializer->SequenceInfosForChunk(original.m_id, sequences);
        ChunkPtr drivingChunk = chunks.front().find(original.m_id)->second.get();
        m_sequenceToSequence.resize(deserializers.size() * sequences.size());
        m_innerChunks.resize(deserializers.size() * sequences.size());
        for (size_t sequenceIndex = 0; sequenceIndex < sequences.size(); ++sequenceIndex)
        {
            if (chunk.m_invalid.find(sequenceIndex) != chunk.m_invalid.end())
            {
                continue;
            }

            size_t currentIndex = sequenceIndex * deserializers.size();
            m_sequenceToSequence[currentIndex] = sequences[sequenceIndex].m_indexInChunk;
            m_innerChunks[currentIndex] = drivingChunk;
        }

        // Creating sequence mapping and requiring underlying chunks.
        SequenceInfo s;
        for (size_t deserializerIndex = 1; deserializerIndex < deserializers.size(); ++deserializerIndex)
        {
            auto& chunkTable = m_parent->m_weakChunkTable[deserializerIndex];
            for (size_t sequenceIndex = 0; sequenceIndex < sequences.size(); ++sequenceIndex)
            {
                if (chunk.m_invalid.find(sequenceIndex) != chunk.m_invalid.end())
                {
                    continue;
                }

                size_t currentIndex = sequenceIndex * deserializers.size() + deserializerIndex;
                bool exists = deserializers[deserializerIndex]->GetSequenceInfo(sequences[sequenceIndex], s);
                if (!exists)
                {
                    if(m_parent->m_verbosity >= (int)TraceLevel::Warning)
                        fprintf(stderr, "Warning: sequence '%s' could not be found in the deserializer responsible for stream '%ls'\n",
                            m_parent->m_corpus->IdToKey(sequences[sequenceIndex].m_key.m_sequence).c_str(),
                            deserializers[deserializerIndex]->StreamInfos().front().m_name.c_str());
                    m_sequenceToSequence[currentIndex] = SIZE_MAX;
                    continue;
                }

                m_sequenceToSequence[currentIndex] = s.m_indexInChunk;
                ChunkPtr secondaryChunk = chunkTable[s.m_chunkId].lock();
                if (!secondaryChunk)
                {
                    secondaryChunk = chunks[deserializerIndex].find(s.m_chunkId)->second.get();
                    chunkTable[s.m_chunkId] = secondaryChunk;
                }

                m_innerChunks[currentIndex] = secondaryChunk;
            }
        }
    }

My version above is modified so that the async tasks are launched as deferred instead of async, which fixes the issue. Has anyone else seen something like this as of VS2017 redistributable 14.12.25810? Reproducing this issue is as easy as training a CNTK model that uses the text and image readers on a machine with a GPU and SSD, so that the CPU deserialization becomes the bottleneck. After about 30 minutes of training, a deadlock usually occurs. Has anyone seen a similar issue on Linux? If so, it could be a bug in the code, although I doubt it because the debug counter chunksInProgress is always 0 after the deadlock. For reference, the entire source file is located at https://github.com/Microsoft/CNTK/blob/455aef80eeff675c0f85c6e34a03cb73a4693bff/Source/Readers/ReaderLib/Bundler.cpp.

Recommended answer

New day, better answer (much better). Read on.

I spent some time investigating the behaviour of std::async on Windows and you're right. It's a different animal, see here.

So, if your code relies on std::async always starting a new thread of execution and returning immediately then you can't use it. Not on Windows, anyway. On my machine, the limit seems to be 768 background threads, which would fit in, more or less, with what you have observed.

Anyway, I wanted to learn a bit more about modern C++, so I had a crack at rolling my own replacement for std::async that can be used on Windows with the semantics desired by the OP. I therefore humbly present the following:

AsyncTask: a replacement for std::async

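#include <future>
#include <thread>
#include <type_traits>
#include <utility>

// Runs f (args...) on a new, detached thread and returns a future for the result.
// Note: std::result_of_t is deprecated in C++17 and removed in C++20
// (std::invoke_result_t replaces it).
template <class Func, class... Args>
    std::future <std::result_of_t <std::decay_t <Func> (std::decay_t <Args>...)>>
        AsyncTask (Func&& f, Args&&... args)
{
    using decay_func = std::decay_t <Func>;
    using return_type = std::result_of_t <decay_func (std::decay_t <Args>...)>;

    // The task takes decayed copies, so nothing dangles once AsyncTask returns.
    std::packaged_task <return_type (decay_func f, std::decay_t <Args>... args)>
        task ([] (decay_func f, std::decay_t <Args>... args)
    {
        return f (args...);
    });

    auto task_future = task.get_future ();
    std::thread t (std::move (task), std::forward <Func> (f), std::forward <Args> (args)...);
    t.detach ();
    return task_future;
}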

Test program

#include <iostream>
#include <string>

int add_two_integers (int a, int b)
{
    return a + b;
}

std::string append_to_string (const std::string& s)
{
    return s + " addendum";
}

int main ()
{
    auto /* i.e. std::future <int> */ f1 = AsyncTask (add_two_integers , 1, 2);
    auto /* i.e. int */  i = f1.get ();
    std::cout << "add_two_integers : " << i << std::endl;

    auto  /* i.e. std::future <std::string> */ f2 = AsyncTask (append_to_string , "Hello world");
    auto /* i.e. std::string */ s = f2.get ();
    std::cout << "append_to_string : " << s << std::endl;
    return 0;
}

Output

add_two_integers : 3
append_to_string : Hello world addendum

Live demo here (gcc) and here (clang).

I learnt a lot from writing this and it was a lot of fun. I'm fairly new to this stuff, so all comments welcome. I'll be happy to update this post if I've got anything wrong.
