C++ Warming std vector

Question

Why is filling std::vector the second time FASTER, even though the space was reserved from the beginning?

#include <chrono>
#include <iostream>
#include <vector>

int total = 1000000;

struct BaseClass {
  float m[16];
  int id;

  BaseClass(int _id) { id = _id; }
};

int main() {

  std::vector<BaseClass> ar;
  ar.reserve(total);

  {
    auto t_start = std::chrono::high_resolution_clock::now();
    for (int var = 0; var < total; ++var) {
      ar.emplace_back(var);
    }
    auto t_end = std::chrono::high_resolution_clock::now();
    std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(
                     t_end - t_start).count() << "\n";
    ar.clear();
  }

  {
    auto t_start = std::chrono::high_resolution_clock::now();
    for (int var = 0; var < total; ++var) {
      ar.emplace_back(var);
    }
    auto t_end = std::chrono::high_resolution_clock::now();
    std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(
                     t_end - t_start).count() << "\n";
    ar.clear();
  }

  {
    auto t_start = std::chrono::high_resolution_clock::now();
    for (int var = 0; var < total; ++var) {
      ar.emplace_back(var);
    }
    auto t_end = std::chrono::high_resolution_clock::now();
    std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(
                     t_end - t_start).count() << "\n";
    ar.clear();
  }

  return 0;
}

Live demo: http://coliru.stacked-crooked.com/a/229e4ba47adddb1a

RESULTS:
118
23
21

P.S. I'm asking why it becomes faster, given that the only cause of slowdown for a vector is supposed to be allocation/reallocation, and we allocated the array BEFORE the start.

Recommended answer

The reason that the first run is slower than the other two is that the runtime has not yet gotten the memory pages from the OS.

I instrumented your program to output the number of major and minor page faults the task had taken at the beginning, and after each of the three stages above. (Note: This works on Linux. No idea if it'll work on whatever OS you're on.) Code:

Note: updated to the latest version, with reserve() moved to the top and bracketed by its own getrusage calls.

#include <ctime>
#include <chrono>
#include <iostream>
#include <vector>

#include <sys/time.h>
#include <sys/resource.h>

using namespace std;

int total = 1000000;

struct BaseClass {
  float m[16];
  int id;

  BaseClass(int _id) { id = _id; }
};

int main() {

  std::vector<BaseClass> ar;
  struct rusage r;  // filled in by each getrusage() call below

  getrusage(RUSAGE_SELF, &r);
  cout << "minflt: " << r.ru_minflt << " majflt: " << r.ru_majflt << endl;

  ar.reserve(total);  // obtains address space; pages are faulted in on first write

  getrusage(RUSAGE_SELF, &r);
  cout << "minflt: " << r.ru_minflt << " majflt: " << r.ru_majflt << endl;

  {
    auto t_start = std::chrono::high_resolution_clock::now();
    for (int var = 0; var < total; ++var) {
      ar.emplace_back(var);
    }
    auto t_end = std::chrono::high_resolution_clock::now();
    std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(
                     t_end - t_start).count() << "\n";
    ar.clear();
  }

  getrusage(RUSAGE_SELF, &r);
  cout << "minflt: " << r.ru_minflt << " majflt: " << r.ru_majflt << endl;

  {
    auto t_start = std::chrono::high_resolution_clock::now();
    for (int var = 0; var < total; ++var) {
      ar.emplace_back(var);
    }
    auto t_end = std::chrono::high_resolution_clock::now();
    std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(
                     t_end - t_start).count() << "\n";
    ar.clear();
  }

  getrusage(RUSAGE_SELF, &r);
  cout << "minflt: " << r.ru_minflt << " majflt: " << r.ru_majflt << endl;

  {
    auto t_start = std::chrono::high_resolution_clock::now();
    for (int var = 0; var < total; ++var) {
      ar.emplace_back(var);
    }
    auto t_end = std::chrono::high_resolution_clock::now();
    std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(
                     t_end - t_start).count() << "\n";
    ar.clear();
  }

  getrusage(RUSAGE_SELF, &r);
  cout << "minflt: " << r.ru_minflt << " majflt: " << r.ru_majflt << endl;

  return 0;
}

I then ran it on my box. The result is enlightening:

minflt: 343 majflt: 0
minflt: 367 majflt: 0
48
minflt: 16968 majflt: 0
16
minflt: 16968 majflt: 0
15
minflt: 16968 majflt: 0

Notice that the first measured for-loop incurred over 16,000 minor faults. Those faults are what make the memory available to the application and account for the slower running time. No additional faults happen thereafter. In contrast, the reserve() call itself only incurred 24 minor faults.
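A condensed sketch of the same measurement may make the pattern easier to reproduce: it fills a reserve()d vector twice and returns the minor-fault delta for each pass. It assumes Linux (or another system where getrusage() fills ru_minflt), and the names minor_faults and fault_deltas_for_two_fills are hypothetical. Exact counts depend on the kernel, the allocator, and whether transparent huge pages back the buffer, so only the relationship (the first pass faults, the second pass mostly does not) should be expected to hold.

```cpp
#include <sys/resource.h>  // getrusage, struct rusage, RUSAGE_SELF
#include <utility>
#include <vector>

struct BaseClass {
  float m[16];
  int id;
  BaseClass(int _id) { id = _id; }
};

// Cumulative minor page faults taken by this process so far.
static long minor_faults() {
  struct rusage r {};
  getrusage(RUSAGE_SELF, &r);
  return r.ru_minflt;
}

// Hypothetical helper: fills a reserve()d vector twice and returns the
// number of minor faults incurred by the first and by the second fill.
std::pair<long, long> fault_deltas_for_two_fills(int total) {
  std::vector<BaseClass> ar;
  ar.reserve(total);  // reserves capacity; the pages have not been written yet

  long deltas[2];
  for (long& delta : deltas) {
    long before = minor_faults();
    for (int var = 0; var < total; ++var) {
      ar.emplace_back(var);  // the first write to each page may fault it in
    }
    delta = minor_faults() - before;
    ar.clear();  // keeps the capacity, so the same pages are reused next pass
  }
  return {deltas[0], deltas[1]};
}
```

Calling fault_deltas_for_two_fills(1000000) should report thousands of faults for the first pass on a 4 KiB-page system (far fewer if huge pages are used) and close to zero for the second.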

In most modern virtual-memory OSes, the OS implements lazy memory allocation, even if the software running on it does not. When the runtime requests additional memory from the OS, the OS makes a note of the request. If the request succeeds, the runtime now has a new range of virtual addresses available to it. (Details vary depending on the API called and the OS, but the essence is the same.) The OS may point the virtual address range to a single zero-filled page marked read-only.

The OS does not necessarily make those pages immediately available to the task. Rather, the OS waits until the task actually tries to write to the memory it's allocated. At that point, the OS allocates a physical page to back the virtual page allocated to the task. That registers as a "minor fault" in UNIX parlance. Handling each such fault means trapping into the kernel, finding a free physical page, and zero-filling it, so the process can be expensive when it is repeated for every page of a large buffer.
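The same lazy behaviour can be observed below the C++ runtime. The following Linux-specific sketch maps an anonymous region with mmap(), records how many minor faults the mapping alone costs, then writes one byte per page and records again. The struct and helper names are hypothetical, and writing one byte per page is an illustrative assumption, not something taken from the original answer.

```cpp
#include <sys/mman.h>      // mmap, munmap, MAP_ANONYMOUS
#include <sys/resource.h>  // getrusage
#include <unistd.h>        // sysconf
#include <cstddef>

struct LazyAllocResult {
  long faults_from_mmap;   // minor faults taken by the mmap() call itself
  long faults_from_touch;  // minor faults taken by writing to every page
};

static long minor_faults() {
  struct rusage r {};
  getrusage(RUSAGE_SELF, &r);
  return r.ru_minflt;
}

// Hypothetical helper: maps `bytes` of anonymous memory, then writes one
// byte in each page. Returns {-1, -1} if the mapping fails.
LazyAllocResult measure_lazy_alloc(std::size_t bytes) {
  const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));

  long t0 = minor_faults();
  void* mem = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  long t1 = minor_faults();
  if (mem == MAP_FAILED) return {-1, -1};

  // volatile keeps the compiler from discarding the stores.
  volatile char* p = static_cast<volatile char*>(mem);
  for (std::size_t off = 0; off < bytes; off += page) {
    p[off] = 1;  // first write to this page: the kernel supplies a physical page
  }
  long t2 = minor_faults();

  munmap(mem, bytes);
  return {t1 - t0, t2 - t1};
}
```

Without huge pages, the touch count should come out close to bytes / page size while the mmap() count stays near zero, which mirrors the 24-fault reserve() versus the 16,000-fault first loop reported above.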

It's that lazy allocation that your task is measuring.

To prove that, I did an strace of the application as well. The meaningful portion is below.

getrusage(RUSAGE_SELF, {ru_utime={0, 0}, ru_stime={0, 0}, ...}) = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fe3aa339000
write(1, "minflt: 328 majflt: 0\n", 22) = 22
mmap(NULL, 68001792, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fe3a551c000
getrusage(RUSAGE_SELF, {ru_utime={0, 0}, ru_stime={0, 0}, ...}) = 0
write(1, "minflt: 352 majflt: 0\n", 22) = 22
write(1, "52\n", 3)                     = 3
getrusage(RUSAGE_SELF, {ru_utime={0, 30000}, ru_stime={0, 20000}, ...}) = 0
write(1, "minflt: 16953 majflt: 0\n", 24) = 24
write(1, "20\n", 3)                     = 3
getrusage(RUSAGE_SELF, {ru_utime={0, 50000}, ru_stime={0, 20000}, ...}) = 0
write(1, "minflt: 16953 majflt: 0\n", 24) = 24
write(1, "15\n", 3)                     = 3
getrusage(RUSAGE_SELF, {ru_utime={0, 70000}, ru_stime={0, 20000}, ...}) = 0
write(1, "minflt: 16953 majflt: 0\n", 24) = 24
munmap(0x7fe3a551c000, 68001792)        = 0
exit_group(0)                           = ?

As you can see, the task allocated memory with an mmap call between the first two getrusage system calls. And yet, that step only incurred 24 minor faults. So, even though C++ was not being lazy, Linux was being lazy about giving the memory to the task.

Specifically, the first mmap call appears to allocate an I/O buffer for the first write message. The second mmap call (allocating 68001792 bytes) happens before the second getrusage call. And yet, you can see that only 24 additional faults occurred between the two on this run.
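The size of that mapping and the roughly 16,600-fault jump both line up with one minor fault per page. The sketch below does the arithmetic at compile time, assuming sizeof(BaseClass) is 68 bytes (sixteen 4-byte floats plus a 4-byte int with no padding, as on common x86-64 compilers) and a 4096-byte page; the helper and constant names are hypothetical.

```cpp
#include <cstddef>

struct BaseClass {
  float m[16];
  int id;
  BaseClass(int _id) { id = _id; }
};

// Number of whole pages needed to cover `bytes`, rounding up.
constexpr std::size_t pages_for(std::size_t bytes, std::size_t page_size) {
  return (bytes + page_size - 1) / page_size;
}

constexpr std::size_t kTotal = 1000000;  // elements reserved in the question
constexpr std::size_t kPage = 4096;      // assumed page size
constexpr std::size_t kBytes = kTotal * sizeof(BaseClass);
constexpr std::size_t kPages = pages_for(kBytes, kPage);

// With sizeof(BaseClass) == 68: 68,000,000 bytes round up to 16,602 pages,
// and 16,602 * 4096 = 68,001,792 bytes, the length passed to mmap above.
```

The measured fault deltas, 16968 - 367 = 16601 in the first run and 16953 - 352 = 16601 in the strace run, are both within one page of that count.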

The hawk-eyed among you will notice the numbers are slightly different for this run than the numbers I showed above. I've run this executable many times, and the numbers shift slightly each time. But they're always in the same general ballpark.
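If you want a benchmark that leaves this first-touch cost out, one Linux-specific option is to ask the kernel to prefault a mapping at mmap() time with the MAP_POPULATE flag (it does not exist on other OSes). The sketch below is an illustration under that assumption, not a drop-in fix for std::vector, which obtains memory through its allocator rather than calling mmap directly. It compares the minor faults taken while writing to a lazily mapped region and to a populated one; the helper name is hypothetical. For a plain std::vector, the simpler equivalent is what the question's second and third runs already do: write the reserved storage once before starting the clock.

```cpp
#include <sys/mman.h>      // mmap, munmap, MAP_POPULATE (Linux-specific)
#include <sys/resource.h>  // getrusage
#include <unistd.h>        // sysconf
#include <cstddef>

static long minor_faults() {
  struct rusage r {};
  getrusage(RUSAGE_SELF, &r);
  return r.ru_minflt;
}

// Hypothetical helper: maps `bytes` of anonymous memory (optionally with
// MAP_POPULATE), then counts the minor faults taken while writing each page.
// Returns -1 if the mapping fails.
long faults_while_writing(std::size_t bytes, bool populate) {
  const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
  int flags = MAP_PRIVATE | MAP_ANONYMOUS;
  if (populate) flags |= MAP_POPULATE;  // prefault the pages during mmap()

  void* mem = mmap(nullptr, bytes, PROT_READ | PROT_WRITE, flags, -1, 0);
  if (mem == MAP_FAILED) return -1;

  volatile char* p = static_cast<volatile char*>(mem);  // keep the stores
  long before = minor_faults();
  for (std::size_t off = 0; off < bytes; off += page) {
    p[off] = 1;
  }
  long delta = minor_faults() - before;

  munmap(mem, bytes);
  return delta;
}
```

The trade-off is that the faulting work does not disappear; MAP_POPULATE only moves it into the mmap() call, outside the timed region.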