Why did CRITICAL_SECTION performance become worse on Win8?

Views: 31 | Published: 2021/9/25 19:00:11 | Tags: c++ c++11 winapi critical-section stdmutex

Problem description

It seems that CRITICAL_SECTION performance became worse on Windows 8 and later (see the graphs below).

The test is simple: several concurrent threads each take a lock 3 million times to get exclusive access to a variable. The C++ program is at the bottom of the question. I ran the test on Windows Vista, Windows 7, Windows 8, and Windows 10 (x64, VMWare, Intel Core i7-2600 3.40GHz).

The results are shown in the image below. The X-axis is the number of concurrent threads. The Y-axis is the elapsed time in seconds (lower is better).

What we can see:

SRWLock performance is approximately the same on all platforms.
CriticalSection performance became worse relative to SRWL on Windows 8 and later.

The question is: can anybody explain why CRITICAL_SECTION performance became worse on Win8 and later?

Some notes:

The results on real machines are much the same: CS is much worse than std::mutex, std::recursive_mutex, and SRWL on Win8 and later. However, I have had no chance to run the test on different OSes with the same CPU.
The std::mutex implementation for Windows Vista is based on CRITICAL_SECTION, but for Win7 and later std::mutex is based on SRWL. This holds for both MSVS17 and 15 (to verify, search for the primitives.h file in the MSVC++ installation and look for the stl_critical_section_vista and stl_critical_section_win7 classes). This explains the difference between std::mutex performance on Win Vista and on the other platforms.
As noted in the comments, std::mutex is a wrapper, so a possible explanation for its overhead relative to SRWL is the overhead introduced by the wrapper code.
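The wrapper-overhead hypothesis in the last note could be probed in isolation by timing the same primitive directly and behind one extra layer. The following is only a rough sketch under stated assumptions, not the MSVC implementation: `LockInterface`, `VirtualWrapper`, `Timing`, and `timeUncontended` are hypothetical names, and the virtual call merely stands in for whatever indirection a real wrapper might add. It uses std::mutex so that it builds without Windows.h.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <mutex>

// Hypothetical interface: models a wrapper that reaches the real
// primitive through one virtual call. Not taken from any real library.
struct LockInterface
{
    virtual ~LockInterface() = default;
    virtual void lock() = 0;
    virtual void unlock() = 0;
};

// Forwards to an underlying lock M through the virtual interface.
template <class M>
class VirtualWrapper final : public LockInterface
{
    M inner;
public:
    void lock() override { inner.lock(); }
    void unlock() override { inner.unlock(); }
};

struct Timing
{
    std::uint64_t acquisitions; // completed lock/unlock pairs
    double seconds;             // elapsed time measured with steady_clock
};

// Times `iterations` uncontended lock/unlock pairs on a single thread,
// so a difference between two lock types reflects per-call overhead
// rather than contention.
template <class L>
Timing timeUncontended(L &lock, std::size_t iterations)
{
    volatile std::uint64_t counter = 0; // volatile keeps the loop body from being optimized away
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t n = 0; n < iterations; ++n)
    {
        lock.lock();
        counter = counter + 1;
        lock.unlock();
    }
    const std::chrono::duration<double> diff = std::chrono::steady_clock::now() - start;
    return Timing{ counter, diff.count() };
}
```

Calling `timeUncontended` once on a plain `std::mutex` and once on a `VirtualWrapper<std::mutex>` passed as a `LockInterface&` gives a rough idea of what that one layer costs. It says nothing about contended behaviour, which is what the graphs measure, and an optimizer may remove the indirection entirely.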

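#include <chrono>
#include <cstdint>   // uint64_t
#include <cstdlib>   // std::rand, std::srand
#include <ctime>     // time
#include <iostream>
#include <mutex>
#include <string>
#include <thread>
#include <typeinfo>  // typeid
#include <vector>

#include <Windows.h>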

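const size_t T = 10;       // maximum number of concurrent threads
const size_t N = 3000000;  // lock/unlock iterations per thread
volatile uint64_t var = 0; // shared variable protected by the lock under test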

const std::string sep = ";";

namespace WinApi
{
    class CriticalSection
    {
        CRITICAL_SECTION cs;
    public:
        CriticalSection() { InitializeCriticalSection(&cs); }
        ~CriticalSection() { DeleteCriticalSection(&cs); }
        void lock() { EnterCriticalSection(&cs); }
        void unlock() { LeaveCriticalSection(&cs); }
    };

    class SRWLock
    {
        SRWLOCK srw;
    public:
        SRWLock() { InitializeSRWLock(&srw); }
        void lock() { AcquireSRWLockExclusive(&srw); }
        void unlock() { ReleaseSRWLockExclusive(&srw); }
    };
}

template <class M>
void doLock(void *param)
{
    M &m = *static_cast<M*>(param);
    for (size_t n = 0; n < N; ++n)
    {
        m.lock();
        var += std::rand();
        m.unlock();
    }
}

template <class M>
void runTest(size_t threadCount)
{
    M m;
    std::vector<std::thread> thrs(threadCount);

    const auto start = std::chrono::system_clock::now();

    for (auto &t : thrs) t = std::thread(doLock<M>, &m);
    for (auto &t : thrs) t.join();

    const auto end = std::chrono::system_clock::now();

    const std::chrono::duration<double> diff = end - start;
    std::cout << diff.count() << sep;
}

template <class ...Args>
void runTests(size_t threadMax)
{
    {
        int dummy[] = { (std::cout << typeid(Args).name() << sep, 0)... };
        (void)dummy;
    }
    std::cout << std::endl;

    for (size_t n = 1; n <= threadMax; ++n)
    {
        {
            int dummy[] = { (runTest<Args>(n), 0)... };
            (void)dummy;
        }
        std::cout << std::endl;
    }
}

int main()
{
    std::srand(time(NULL));
    runTests<std::mutex, WinApi::CriticalSection, WinApi::SRWLock>(T);
    return 0;
}
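
Because the wrapper classes in the program above expose lock() and unlock(), they meet the BasicLockable requirements and can be used with std::lock_guard, which releases the lock even on an early exit. The following is a minimal portable sketch, not the code used for the measurements: `SpinLock` is a hypothetical stand-in for the WinApi wrappers so that the example builds without Windows.h, and `countWithGuard` is an illustrative helper name. It times the run with std::chrono::steady_clock, which is monotonic, whereas system_clock can be adjusted while a test is running.

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for WinApi::CriticalSection / WinApi::SRWLock:
// any class with lock() and unlock() satisfies BasicLockable.
class SpinLock
{
    std::atomic_flag flag = ATOMIC_FLAG_INIT;
public:
    void lock()
    {
        // Spin until the flag was previously clear, yielding between attempts.
        while (flag.test_and_set(std::memory_order_acquire))
            std::this_thread::yield();
    }
    void unlock() { flag.clear(std::memory_order_release); }
};

// Runs threadCount threads, each incrementing a shared counter
// `iterations` times under a lock of type M.
// Returns {final counter value, elapsed seconds}.
template <class M>
std::pair<std::uint64_t, double> countWithGuard(std::size_t threadCount, std::size_t iterations)
{
    M m;
    std::uint64_t counter = 0;
    std::vector<std::thread> thrs;
    thrs.reserve(threadCount);

    const auto start = std::chrono::steady_clock::now();
    for (std::size_t t = 0; t < threadCount; ++t)
    {
        thrs.emplace_back([&m, &counter, iterations] {
            for (std::size_t n = 0; n < iterations; ++n)
            {
                std::lock_guard<M> guard(m); // released at the end of each iteration
                ++counter;
            }
        });
    }
    for (auto &t : thrs) t.join();
    const std::chrono::duration<double> diff = std::chrono::steady_clock::now() - start;

    return { counter, diff.count() };
}
```

For example, `countWithGuard<SpinLock>(4, 10000)` and `countWithGuard<std::mutex>(4, 10000)` should both report a final counter of 40000; the elapsed times depend on the machine.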

The test project was built as a Windows Console Application in Microsoft Visual Studio 2017 (15.8.2) with the following settings:

Use of MFC: Use MFC in a Static Library
Windows SDK Version: 10.0.17134.0
Platform Toolset: Visual Studio 2017 (v141)
Optimization: O2, Oi, Oy-, GL
