linux high kernel cpu usage on memory initialization


Problem description

I have a problem with high CPU consumption by the Linux kernel while bootstrapping my Java applications on a server. This problem occurs only in production; on the dev servers everything is light-speed.

upd9: There were two questions about this issue:


  1. How to fix it? - Nominal Animal suggested to sync and drop everything, and it really helps: sudo sh -c 'sync ; echo 3 > /proc/sys/vm/drop_caches' works. upd12: But indeed, sync alone is enough.

  2. Why is this happening? - It is still open for me. I do understand that flushing dirty pages to disk consumes kernel CPU and IO time, and that is normal. But what is strange: why does even a single-threaded application written in C load ALL cores to 100% in kernel space?

Due to ref-upd10 and ref-upd11 I have an idea that echo 3 > /proc/sys/vm/drop_caches is not required to fix my problem with slow memory allocation. It should be enough to run `sync' before starting the memory-consuming application. I will probably try this tomorrow in production and post the results here.
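
For example, here is a minimal sketch of that idea (my own illustration, not the actual production startup code): call sync() from the test program itself right before the allocation loop, and time the first touch of every page.

#include <stdlib.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h> /* sync() */

int main(void){
   sync(); /* assumption: flush dirty pages before touching new memory */
   int size = 256 * 1024 * 1024;
   int* buffer = malloc(size); /* allocate 256MB, same as the tests below */
   if (buffer == NULL) return 1;
   int last = clock();
   for(int j=0;j<size/4;j++){
       buffer[j] = 1; /* first touch of every page */
   }
   printf("first-touch time %.2fs\n", (clock()-last)/(double)CLOCKS_PER_SEC);
   free(buffer);
   return 0;
}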

upd10: Lots of FS cache pages case:


  1. I executed cat 10GB.file > /dev/null, then
  2. ran sync to be sure there are no dirty pages (cat /proc/meminfo | grep ^Dirty displayed 184 kB; see the small /proc/meminfo helper sketched after this list),
  3. checking cat /proc/meminfo | grep ^Cached I got: 4GB cached,
  4. running int main(char**) I got normal performance (like 50ms to initialize 32MB of allocated data),
  5. cached memory was reduced to 900MB,
  6. test summary: I think it is no problem for Linux to reclaim pages used as FS cache into allocated memory.
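
A small helper along these lines (a sketch of my own, not part of the original test) can read the same Dirty and Cached counters from /proc/meminfo that the grep commands above inspect by hand:

#include <stdio.h>
#include <string.h>

int main(void){
   FILE* f = fopen("/proc/meminfo", "r"); /* same file the grep commands read */
   if (f == NULL) { perror("fopen"); return 1; }
   char line[256];
   while (fgets(line, sizeof line, f)) {
       if (strncmp(line, "Dirty:", 6) == 0 || strncmp(line, "Cached:", 7) == 0)
           fputs(line, stdout); /* e.g. "Dirty:       184 kB" */
   }
   fclose(f);
   return 0;
}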

upd11: Lots of dirty pages case


I ran my HowMongoDdWorks example with the read part commented out, and after some time

/proc/meminfo said that 2.8GB was Dirty and 3.6GB was Cached.

I stopped HowMongoDdWorks and ran my int main(char**).

Here is part of the result:

init 15, time 0.00s
x 0 [try 1/part 0] time 1.11s
x 1 [try 2/part 0] time 0.04s
x 0 [try 1/part 1] time 1.04s
x 1 [try 2/part 1] time 0.05s
x 0 [try 1/part 2] time 0.42s
x 1 [try 2/part 2] time 0.04s

Test summary: losing dirty pages in order to allocate memory slows down the first access significantly (to be fair, this starts to happen only when total application memory becomes comparable to the whole OS memory; that is, if you have 8 GB free out of 16, allocating 1GB is no problem, and the slowdown starts from around 3GB).

Now I have managed to reproduce this situation in my dev environment, so here are the new details.

Dev machine configuration:


  1. Linux 2.6.32-220.13.1.el6.x86_64 - Scientific Linux release 6.1 (Carbon)

  2. RAM: 15.55 GB

  3. CPU: 1 x Intel(R) Core(TM) i5-2300 CPU @ 2.80GHz (4 threads) (physical)

It's 99.9% certain that the problem is caused by a large amount of dirty pages in the FS cache. Here is the application which creates lots of dirty pages:

import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Random;

/**
 * @author dmitry.mamonov
 *         Created: 10/2/12 2:53 PM
 */
public class HowMongoDdWorks{
    public static void main(String[] args) throws IOException {
        final long length = 10L*1024L*1024L*1024L;
        final int pageSize = 4*1024;
        final int lengthPages = (int) (length/pageSize);
        final byte[] buffer = new byte[pageSize];
        final Random random = new Random();
        System.out.println("Init file");
        final RandomAccessFile raf = new RandomAccessFile("random.file","rw");
        raf.setLength(length);
        int written = 0;
        int readed = 0;
        System.out.println("Test started");
        while(true){
            { //write.
                random.nextBytes(buffer);
                final long randomPageLocation = (long)random.nextInt(lengthPages)*(long)pageSize;
                raf.seek(randomPageLocation);
                raf.write(buffer);
                written++;
            }
            { //read.
                random.nextBytes(buffer);
                final long randomPageLocation = (long)random.nextInt(lengthPages)*(long)pageSize;
                raf.seek(randomPageLocation);
                raf.read(buffer);
                readed++;
            }
            if (written % 1024==0 || readed%1024==0){
                System.out.printf("W %10d R %10d pages\n", written, readed);
            }

        }
    }
}

And here is the test application which causes HIGH (up to 100% on all cores) CPU load in kernel space (same code as below, but I will copy it once again).

#include<stdlib.h>
#include<stdio.h>
#include<time.h>

int main(char** argv){
   int last = clock(); //remember the time
   for(int i=0;i<16;i++){ //repeat test several times
      int size = 256 * 1024 * 1024;
      int size4=size/4;
      int* buffer = malloc(size); //allocate 256MB of memory
      for(int k=0;k<2;k++){ //initialize allocated memory twice
          for(int j=0;j<size4;j++){ 
              //memory initialization (if I skip this step my test ends in 0.000s)
              buffer[j]=k;
          }
          //printing stat
          printf("x [%d] %.2f\n",k+1, (clock()-last)/(double)CLOCKS_PER_SEC);
          last = clock();
      }
   }
   return 0;
}

While the previous HowMongoDdWorks program is running, int main(char** argv) will show results like this:

x [1] 0.23
x [2] 0.19
x [1] 0.24
x [2] 0.19
x [1] 1.30 -- first initialization takes significantly longer
x [2] 0.19 -- than the second one (6x slower)
x [1] 10.94 -- and sometimes it is 50x slower!!!
x [2] 0.19
x [1] 1.10
x [2] 0.21
x [1] 1.52
x [2] 0.19
x [1] 0.94
x [2] 0.21
x [1] 2.36
x [2] 0.20
x [1] 3.20
x [2] 0.20 -- and the results are totally unstable
...

I keep everything below this line just for historical purposes.

upd1: both dev and production systems are big enough for this test.
upd7: it is not paging; at least I have not seen any storage IO activity during the problem time.


  1. dev: ~4 cores, 16 GB RAM, ~8 GB free

  2. production: ~12 cores, 24 GB RAM, ~16 GB free (8 to 10 GB is taken by the FS cache, but it makes no difference even when all 16GB is completely free, same results); this machine is also loaded by CPU, but not too much, ~10%.

upd8 (ref): New test case and potential explanation, see at the tail.

Here is my test case (I also tested Java and Python, but "C" should be the clearest):

#include<stdlib.h>
#include<stdio.h>
#include<time.h>

int main(char** argv){
   int last = clock(); //remember the time
   for(int i=0;i<16;i++){ //repeat test several times
      int size = 256 * 1024 * 1024;
      int size4=size/4;
      int* buffer = malloc(size); //allocate 256MB of memory
      for(int k=0;k<2;k++){ //initialize allocated memory twice
          for(int j=0;j<size4;j++){ 
              //memory initialization (if I skip this step my test ends in 0.000s)
              buffer[j]=k;
          }
          //printing stat
          printf("x [%d] %.2f\n",k+1, (clock()-last)/(double)CLOCKS_PER_SEC);
          last = clock();
      }
   }
   return 0;
}

The output on the dev machine (partial):

x [1] 0.13 -- first initialization takes a bit longer
x [2] 0.12 -- than the second one, but the difference is not significant.
x [1] 0.13
x [2] 0.12
x [1] 0.15
x [2] 0.11
x [1] 0.14
x [2] 0.12
x [1] 0.14
x [2] 0.12
x [1] 0.13
x [2] 0.12
x [1] 0.14
x [2] 0.11
x [1] 0.14
x [2] 0.12 -- and the results are quite stable
...

The output on the production machine (partial):

x [1] 0.23
x [2] 0.19
x [1] 0.24
x [2] 0.19
x [1] 1.30 -- first initialization takes significantly longer
x [2] 0.19 -- than the second one (6x slower)
x [1] 10.94 -- and sometimes it is 50x slower!!!
x [2] 0.19
x [1] 1.10
x [2] 0.21
x [1] 1.52
x [2] 0.19
x [1] 0.94
x [2] 0.21
x [1] 2.36
x [2] 0.20
x [1] 3.20
x [2] 0.20 -- and the results are totally unstable
...

While running this test on the development machine the CPU usage does not even rise from the ground; all cores show less than 5% usage in htop.

But when running this test on the production machine I see up to 100% CPU usage by ALL cores (on average the load rises to 50% on the 12-core machine), and it is all kernel time.

upd2: all machines have the same CentOS Linux 2.6 installed; I work with them over ssh.

upd3: A: It's unlikely to be swapping; I haven't seen any disk activity during my test, and plenty of RAM is also free. (Also, the description has been updated.) – Dmitry 9 mins ago

upd4: htop says HIGH CPU utilisation by the kernel, up to 100% utilisation by all cores (on prod).

upd5: does CPU utilization settle down after initialization completes? In my simple test - yes. For the real application, it only helps to stop everything else in order to start a new program (which is nonsense).

I have two questions:


  1. Why is this happening?

  2. How to fix it?

upd8: improved test and explanation

#include<stdlib.h>
#include<stdio.h>
#include<time.h>

int main(char** argv){
    const int partition = 8;
   int last = clock();
   for(int i=0;i<16;i++){
       int size = 256 * 1024 * 1024;
       int size4=size/4;
       int* buffer = malloc(size);
       buffer[0]=123;
       printf("init %d, time %.2fs\n",i, (clock()-last)/(double)CLOCKS_PER_SEC);
       last = clock();
       for(int p=0;p<partition;p++){
            for(int k=0;k<2;k++){
                for(int j=p*size4/partition;j<(p+1)*size4/partition;j++){
                    buffer[j]=k;
                }
                printf("x [try %d/part %d] time %.2fs\n",k+1, p, (clock()-last)/(double)CLOCKS_PER_SEC);
                last = clock();
            }
      }
   }
   return 0;
}

And the results are as follows:

init 15, time 0.00s -- malloc call takes nothing.
x [try 1/part 0] time 0.07s -- usually first try to fill buffer part with values is fast enough.
x [try 2/part 0] time 0.04s -- second try to fill buffer part with values is always fast.
x [try 1/part 1] time 0.17s
x [try 2/part 1] time 0.05s -- second try...
x [try 1/part 2] time 0.07s
x [try 2/part 2] time 0.05s -- second try...
x [try 1/part 3] time 0.07s
x [try 2/part 3] time 0.04s -- second try...
x [try 1/part 4] time 0.08s
x [try 2/part 4] time 0.04s -- second try...
x [try 1/part 5] time 0.39s -- BUT sometimes it takes significantly longer than average to fill a part of the allocated buffer with values.
x [try 2/part 5] time 0.05s -- second try...
x [try 1/part 6] time 0.35s
x [try 2/part 6] time 0.05s -- second try...
x [try 1/part 7] time 0.16s
x [try 2/part 7] time 0.04s -- second try...

Facts I learned from this test:


  1. Memory allocation itself is fast.

  2. First access to the allocated memory is fast too (so it is not a lazy buffer allocation problem).

  3. I split the allocated buffer into several parts (8 in the test).

  4. And filled each buffer part with the value 0, then with the value 1, printing the consumed time.

  5. The second fill of a buffer part is always fast.

  6. But the first fill of a buffer part is always a bit slower than the second fill (I believe some extra work is done by the kernel on the first page access).

  7. Sometimes it takes significantly longer to fill a buffer part with values the first time.

I tried the suggested answer and it seems it helped. I will recheck and post the results again later.

It looks like Linux maps allocated pages onto dirty file system cache pages, and it takes a lot of time to flush the pages to disk one by one. But a total sync works fast and eliminates the problem.

Answer

Run

sudo sh -c 'sync ; echo 3 > /proc/sys/vm/drop_caches ; sync'

on your dev machine. It is a safe, nondestructive way to make sure your caches are empty. (You will not lose any data by running the above command, even if you happen to save or write to disk at the exact same time. It really is safe.)

Then, make sure you don't have any Java stuff running, and re-run the above command just to be sure. You can check whether you have any Java running with, for example:

ps axu | sed -ne '/ sed -ne /d; /java/p'

It should output nothing. If it does, close your Java stuff first.

Now, re-run your application test. Does the same slowdown now occur on your dev machine, too?

If you care to leave a comment either way, Dmitry, I'd be happy to explore the issue further.

Edited to add: I suspect that the slowdown does occur, and is due to the large startup latency incurred by Java itself. It is a very common issue, basically built into Java and a result of its architecture. For larger applications, startup latency is often a significant fraction of a second, no matter how fast the machine, simply because Java has to load and prepare the classes (mostly serially, too, so adding cores will not help).

In other words, I believe the blame should fall on Java, not Linux; quite the opposite, since Linux manages to mitigate the latency on your development machine through kernel-level caching -- and that only because you keep running those Java components practically all the time, so the kernel knows to cache them.

Edit 2: It would be very useful to see which files your Java environment accesses when your application is started. You can do this with strace:

strace -f -o trace.log -q -tt -T -e trace=open COMMAND...

which creates the file trace.log containing the open() syscalls done by any of the processes started by COMMAND.... To save the output to trace.PID for each process the COMMAND... starts, use

strace -f -o trace -ff -q -tt -T -e trace=open COMMAND...

Comparing the outputs from your dev and prod installations will tell you whether they are truly equivalent. One of them may have extra or missing libraries, affecting the startup time.

In case an installation is old and the system partition is reasonably full, it is possible that those files have become fragmented, causing the kernel to spend more time waiting for I/O to complete. (Note that the amount of I/O stays the same; only the time it takes to complete will increase if the files are fragmented.) You can use the command

LANG=C LC_ALL=C sed -ne 's|^[^"]* open("\(.*\)", O[^"]*$|\1|p' trace.* \
| LANG=C LC_ALL=C xargs -r -d '\n' filefrag \
| LANG=C LC_ALL=C awk '(NF > 3 && $NF == "found") { n[$(NF-2)]++ }
  END { for (i in n) printf "%d extents %d files\n", i, n[i] }' \
| sort -g

to check how fragmented the files used by your application are; it reports how many files use only one, or more than one, extent. Note that it does not include the original executable (COMMAND...), only the files it accesses.

If you just want to get the fragmentation statistics for the files accessed by a single command, you can use

LANG=C LC_ALL=C strace -f -q -tt -T -e trace=open COMMAND... 2>&1 \
| LANG=C LC_ALL=C sed -ne 's|^[0-9:.]* open("\(.*\)", O[^"]*$|\1|p' \
| LANG=C LC_ALL=C xargs -r filefrag \
| LANG=C LC_ALL=C awk '(NF > 3 && $NF == "found") { n[$(NF-2)]++ }
  END { for (i in n) printf "%d extents %d files\n", i, n[i] }' \
| sort -g

If the problem is not due to caching, then I think it is most likely that the two installations are not truly equivalent. If they are, then I'd check the fragmentation. After that, I'd do a full trace (omitting the -e trace=open) on both environments to see where exactly the differences are.

I do believe I now understand your problem/situation.

On your prod environment, the kernel page cache is mostly dirty, i.e. most cached stuff is stuff that is going to be written to disk.

When your application allocates new pages, the kernel only sets up the page mappings; it does not actually give it physical RAM immediately. That only happens on the first access to each page.

On the first access, the kernel first locates a free page -- typically, a page that contains "clean" cached data, i.e. something read from the disk but not modified. Then, it clears it to zeros, to avoid information leaks between processes. (When using the C library allocation facilities like malloc() etc., instead of the direct mmap() family of functions, the library may use/reuse parts of the mapping. Although the kernel does clear the pages to zeros, the library may "dirty" them. Using mmap() to get anonymous pages, you do get them zeroed out.)
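
As an illustration of that last point (a minimal sketch of my own, not code from the answer), an anonymous private mmap() hands out demand-zero pages, so every page is guaranteed to read as zero on its first access:

#define _GNU_SOURCE
#include <sys/mman.h>
#include <stdio.h>

int main(void){
   size_t size = 256UL * 1024 * 1024;
   /* Anonymous private mapping: no file behind it, each page is zero-filled
      by the kernel when it is first touched. */
   int* buffer = mmap(NULL, size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
   if (buffer == MAP_FAILED) { perror("mmap"); return 1; }
   printf("first int is %d (always 0 for fresh anonymous pages)\n", buffer[0]);
   munmap(buffer, size);
   return 0;
}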

If the kernel does not have suitable clean pages at hand, it must first flush some of the oldest dirty pages to disk. (There are processes inside the kernel that flush pages to disk and mark them clean, but if the server load is such that pages are continuously dirtied, it is usually desirable to have mostly dirty pages instead of mostly clean pages -- the server gets more work done that way. Unfortunately, it also means an increase in the first page access latency, which is what you are now encountering.)

Each page is sysconf(_SC_PAGESIZE) bytes long, and aligned. In other words, a pointer p points to the start of a page if and only if ((long)p % sysconf(_SC_PAGESIZE)) == 0. Most kernels, I believe, do actually populate groups of pages in most cases instead of individual pages, thus increasing the latency of the first access (to each group of pages).
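
For instance (a small sketch of my own, just to make the arithmetic concrete), the page size and the page alignment of a pointer can be checked like this:

#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>

int main(void){
   long page = sysconf(_SC_PAGESIZE); /* typically 4096 bytes on x86_64 */
   char* p = malloc(1024 * 1024);
   printf("page size: %ld bytes\n", page);
   printf("p %s at a page boundary\n",
          ((long)p % page == 0) ? "starts" : "does not start");
   free(p);
   return 0;
}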

Finally, there may be some compiler optimization that plays havoc with your benchmarking. I recommend you write a separate source file for a benchmarking main(), with the actual work done on each iteration in a separate file. Compile them separately, and just link them together, to make sure the compiler does not rearrange the time functions wrt. the actual work done. Basically, in benchmark.c:

#define _POSIX_C_SOURCE 200809L
#include <time.h>
#include <stdio.h>

/* in work.c, adjust as needed */
void work_init(void);      /* Optional, allocations etc. */
void work(long iteration); /* Completely up to you, including parameters */
void work_done(void);      /* Optional, deallocations etc. */

#define PRIMING    0
#define REPEATS  100

int main(void)
{
    double          wall_seconds[REPEATS];
    struct timespec wall_start, wall_stop;
    long            iteration;

    work_init();

    /* Priming: do you want caches hot? */
    for (iteration = 0L; iteration < PRIMING; iteration++)
        work(iteration);

    /* Timed iterations */
    for (iteration = 0L; iteration < REPEATS; iteration++) {
        clock_gettime(CLOCK_REALTIME, &wall_start);
        work(iteration);
        clock_gettime(CLOCK_REALTIME, &wall_stop);
        wall_seconds[iteration] = (double)(wall_stop.tv_sec - wall_start.tv_sec)
                                + (double)(wall_stop.tv_nsec - wall_start.tv_nsec) / 1000000000.0;
    }

    work_done();

    /* TODO: wall_seconds[0] is the first iteration.
     *       Comparing to successive iterations (assuming REPEATS > 0)
     *       tells you about the initial latency.
    */

    /* TODO: Sort wall_seconds, for easier statistics.
     *       Most reliable value is the median, with half of the
     *       values larger and half smaller.
     *       Personally, I like to discard first and last 15.85%
     *       of the results, to get "one-sigma confidence" interval.
    */

    return 0;
}

with the actual array allocation, deallocation, and filling (per repeat loop) done in the work() functions defined in work.c.
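
A possible work.c to go with the benchmark above might look like the following. This is only my sketch of what the answer describes (allocate a buffer, fill it, and release it on every repeat), not code supplied in the original answer:

/* work.c -- hypothetical companion to benchmark.c above. */
#include <stdlib.h>

#define WORK_BYTES (256UL * 1024UL * 1024UL) /* 256MB, as in the question's test */

void work_init(void)
{
    /* Nothing to prepare in this sketch. */
}

void work(long iteration)
{
    (void)iteration;                     /* iteration number is unused here */
    size_t count = WORK_BYTES / sizeof (int);
    int *buffer = malloc(WORK_BYTES);
    if (!buffer)
        return;
    for (int k = 0; k < 2; k++)          /* fill twice, like the original test */
        for (size_t j = 0; j < count; j++)
            buffer[j] = k;
    free(buffer);
}

void work_done(void)
{
    /* Nothing to clean up in this sketch. */
}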
