Why is dispatch_sync on a custom concurrent queue deadlocking?

Question

I'm seeing an intermittent deadlock in my app when using dispatch_sync on a custom concurrent dispatch_queue. I'm using something similar to the method described in Mike Ash's blog to support concurrent read access but threadsafe mutations on an NSMutableDictionary that acts as a cache of currently active network RPC requests. My project uses ARC.

I create the queue with:

dispatch_queue_t activeRequestsQueue = dispatch_queue_create("my.queue.name",
                                                DISPATCH_QUEUE_CONCURRENT);

and the mutable dictionary with:

NSMutableDictionary *activeRequests = [[NSMutableDictionary alloc] init];

I read elements from the queue like this:

- (id)activeRequestForRpc: (RpcRequest *)rpc
{
    assert(![NSThread isMainThread]);
    NSString * key = [rpc getKey];
    __block id obj = nil;
    dispatch_sync(activeRequestsQueue, ^{
        obj = [activeRequests objectForKey: key];
    });
    return obj;
}

I add and remove RPCs from the cache like this:

- (void)addActiveRequest: (RpcRequest *)rpc
{
    NSString * key = [rpc getKey];
    dispatch_barrier_async(activeRequestsQueue, ^{
        [activeRequests setObject: rpc forKey: key];
    });
}

- (void)removeActiveRequest: (RpcRequest *)rpc
{
    NSString * key = [rpc getKey];
    dispatch_barrier_async(activeRequestsQueue, ^{
        [activeRequests removeObjectForKey:key];
    });
}

I'm seeing the deadlock in the call to activeRequestForRpc when I make a lot of network requests at once, which leads me to believe that one of the barrier blocks (add or remove) is not completing execution. I always call activeRequestForRpc from a background thread, and the app UI doesn't freeze, so I don't think it has to do with blocking the main thread, but I added the assert statement just in case. Any ideas on how this deadlock could be happening?

Update: adding the code that calls these methods

I'm using AFNetworking to make the network requests, and I have an NSOperationQueue on which I schedule the 'check cache and maybe fetch resource from network' logic. I'll call that op the CheckCacheAndFetchFromNetworkOp. Inside that op I call out to my custom subclass of AFHTTPClient to make an RPC request.

// this is called from inside an NSOperation executing on an NSOperationQueue.
- (void) enqueueOperation: (MY_AFHTTPRequestOperation *) op {
    NSError *error = nil;
    if ([self activeRequestForRpc:op.netRequest.rpcRequest]) {
        error = [NSError errorWithDomain:kHttpRpcErrorDomain code:HttpRpcErrorDuplicate userInfo:nil];
    }
    // set the error on the op and cancels it so dependent ops can continue.
    [op setHttpRpcError:error];

    // Maybe enqueue the op
    if (!error) {
        [self addActiveRequest:op.netRequest.rpcRequest];
        [self enqueueHTTPRequestOperation:op];
    }
}

The MY_AFHTTPRequestOperation is built by the AFHTTPClient instance, and inside both the success and failure completion blocks I call [self removeActiveRequest:netRequest.rpcRequest]; as the first action. By default, AFNetworking executes these blocks on the main thread.

I've seen the deadlock happen both when the last barrier block (the one that must be holding the lock on the queue) is the add block and when it is the remove block.

Is it possible that as the system spawns more threads to support the CheckCacheAndFetchFromNetworkOp Ops in my NSOperationQueue, the activeRequestsQueue would be too low priority to get scheduled? That could cause deadlock if all threads were taken by CheckCacheAndFetchFromNetworkOps blocking to try and read from the activeRequests Dictionary, and the activeRequestsQueue was blocking on an add/remove barrier block that couldn't execute.

Update

Fixed the issue by setting the NSOperationQueue's maxConcurrentOperationCount to 1 (or really anything reasonable other than the default NSOperationQueueDefaultMaxConcurrentOperationCount).

Basically the lesson I took away is that you shouldn't have an NSOperationQueue with the default max operation count wait on any other dispatch_queue_t or NSOperationQueue since it could potentially hog all threads from those other queues.
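As a minimal sketch of that fix: the helper name MakeBoundedLookupQueue and the idea of passing the cap as a parameter are assumptions for illustration, not part of the original project. Only the maxConcurrentOperationCount property itself comes from NSOperationQueue.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original project): builds the operation
// queue that runs the CheckCacheAndFetchFromNetworkOp operations with an
// explicit cap, so those operations cannot occupy every worker thread while
// they are parked inside dispatch_sync.
static NSOperationQueue *MakeBoundedLookupQueue(NSInteger maxConcurrent)
{
    NSOperationQueue *queue = [[NSOperationQueue alloc] init];
    // The default value is NSOperationQueueDefaultMaxConcurrentOperationCount,
    // which lets the system decide how many operations run at once.
    queue.maxConcurrentOperationCount = maxConcurrent;
    return queue;
}
```

Any cap chosen this way is a guess about the device, which is why the answer below treats it as a workaround rather than a real fix.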

Here's what was happening.

queue - an NSOperationQueue left at the default NSOperationQueueDefaultMaxConcurrentOperationCount, which lets the system determine how many concurrent ops to run.

op - runs on queue and schedules a network request on the AFNetworking queue after reading to make sure the RPC isn't in the activeRequests set.

Here's the flow:

The system determines that it can support 10 concurrent threads (In reality it was more like 80).

10 ops get scheduled at once. The system lets 10 ops run concurrently on its 10 threads. All 10 ops call hasActiveRequestForRPC (the activeRequestForRpc: read shown above), which schedules a sync block on the activeRequestQueue and blocks the 10 threads. The activeRequestQueue wants to run its read blocks but doesn't have any available threads. At this point we already have a deadlock.

More commonly I would see something like 9 ops (1-9) get scheduled. One of them, op1, quickly runs hasActiveRequestForRPC on the 10th thread and schedules an addActiveRequest barrier block. Then another op gets scheduled on the 10th thread, and op2-10 schedule and wait on hasActiveRequestForRPC. op1's scheduled addRpc block then can't run because op10 took the last available thread, and all the other hasActiveRequestForRPC blocks wait for the barrier block to execute. op1 would end up blocking later, when it tried to schedule a cache operation on a different operation queue that also couldn't get access to any threads.

I was assuming that the blocking hasActiveRequestForRPC calls were waiting on a barrier block to execute, but the key was that the activeRequestQueue was waiting for any thread to become available.

Recommended answer

EDIT: Turns out the problem was that the NSOperationQueue calling enqueueOperation: was using all available threads, and they were all waiting (via dispatch_sync) for something to happen on the activeRequestsQueue. Reducing maxConcurrentOperationCount on this queue solved the problem (see comments), though this is not really a great solution because it makes assumptions about the number of cores, etc. A better solution would be to use dispatch_async rather than dispatch_sync, though this will make the code more complex.
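One way the dispatch_async approach could look is sketched below. The ActiveRequestCache class, its method names, and the completion-handler shape are all hypothetical; the question's code keeps the queue and dictionary as plain variables and returns the result synchronously.

```objc
#import <Foundation/Foundation.h>

// Hypothetical wrapper around the cache from the question. The class name and
// the completion-handler method are illustrative, not the original API.
@interface ActiveRequestCache : NSObject
- (void)addActiveRequest:(id)request forKey:(NSString *)key;
- (void)activeRequestForKey:(NSString *)key
                 completion:(void (^)(id request))completion;
@end

@implementation ActiveRequestCache {
    dispatch_queue_t _queue;
    NSMutableDictionary *_activeRequests;
}

- (instancetype)init
{
    if ((self = [super init])) {
        _queue = dispatch_queue_create("my.queue.name", DISPATCH_QUEUE_CONCURRENT);
        _activeRequests = [[NSMutableDictionary alloc] init];
    }
    return self;
}

- (void)addActiveRequest:(id)request forKey:(NSString *)key
{
    // Writes still go through a barrier so they run exclusively.
    dispatch_barrier_async(_queue, ^{
        [self->_activeRequests setObject:request forKey:key];
    });
}

- (void)activeRequestForKey:(NSString *)key
                 completion:(void (^)(id request))completion
{
    // The caller returns immediately instead of parking its thread in
    // dispatch_sync; the result (or nil) is delivered on the cache queue.
    dispatch_async(_queue, ^{
        completion([self->_activeRequests objectForKey:key]);
    });
}
@end
```

With this shape, the duplicate check in enqueueOperation: would have to move into the completion block, which is the extra complexity the answer mentions. The completion runs on the cache queue, so it should stay short and should not call dispatch_sync on that same queue.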

My earlier suggestions:

  • You're calling dispatch_sync(activeRequestsQueue, ...) when you're already on the activeRequestsQueue (and your assert isn't firing for some reason, for example because you're running a release build).

  • [activeRequests removeObjectForKey:key]; is causing a request to be deallocated, and the dealloc is waiting for something that calls activeRequestForRpc:, which would cause a deadlock.
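To rule out the first suggestion (a dispatch_sync issued from a block already running on the same queue), the queue can be tagged with dispatch_queue_set_specific and checked with dispatch_get_specific. This is only a sketch: the helper names MakeTaggedActiveRequestsQueue and IsOnActiveRequestsQueue and the key variable are hypothetical and do not appear in the original code.

```objc
#import <Foundation/Foundation.h>

// The address of this variable serves as a unique key; its value is never read.
static char kActiveRequestsQueueKey;

// Hypothetical helper: creates the concurrent queue and tags it so code can
// later ask "am I already running on this queue?".
static dispatch_queue_t MakeTaggedActiveRequestsQueue(void)
{
    dispatch_queue_t queue =
        dispatch_queue_create("my.queue.name", DISPATCH_QUEUE_CONCURRENT);
    // Any non-NULL context works as the tag; the destructor is NULL because
    // nothing was allocated for it.
    dispatch_queue_set_specific(queue, &kActiveRequestsQueueKey,
                                &kActiveRequestsQueueKey, NULL);
    return queue;
}

// Returns YES when the current block is executing on a queue tagged above
// (or on a queue that targets one).
static BOOL IsOnActiveRequestsQueue(void)
{
    return dispatch_get_specific(&kActiveRequestsQueueKey) != NULL;
}
```

A check such as NSAssert(!IsOnActiveRequestsQueue(), ...) placed before the dispatch_sync in activeRequestForRpc: would then flag re-entrant reads in debug builds. Like the assert in the question, it is compiled out when assertions are disabled.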
