Difference Between mem_load_uops_retired.l3_miss and offcore_response.demand_data_rd.l3_miss.local_dram Events

Problem description

I have an Intel(R) Core(TM) i7-4720HQ CPU @ 2.60GHz (Haswell) processor. AFAIK, mem_load_uops_retired.l3_miss counts the number of demand (i.e., non-prefetch) data read accesses that go to DRAM. offcore_response.demand_data_rd.l3_miss.local_dram, as its name suggests, counts the number of demand data reads targeted to DRAM. Therefore, these two events seem to be equivalent (or at least almost the same). But in the following benchmarks the former event is much less frequent than the latter:

1) A C program (loop):

Performance counter stats for '/home/ahmad/Simple Progs/loop':

         1,363      mem_load_uops_retired.l3_miss                                   
         1,543      offcore_response.demand_data_rd.l3_miss.local_dram                                   

   0.000749574 seconds time elapsed

   0.000778000 seconds user
   0.000000000 seconds sys

2) Opening a PDF document in Evince:

Performance counter stats for '/opt/evince-3.28.4/bin/evince':

       936,152      mem_load_uops_retired.l3_miss                                   
     1,853,998      offcore_response.demand_data_rd.l3_miss.local_dram                                   

   4.346408203 seconds time elapsed

   1.644826000 seconds user
   0.103411000 seconds sys

3) Running Wireshark for 5 seconds:

Performance counter stats for 'wireshark':

     5,161,671      mem_load_uops_retired.l3_miss                                   
     8,126,526      offcore_response.demand_data_rd.l3_miss.local_dram                                   

  15.713828395 seconds time elapsed

   0.904280000 seconds user
   0.693906000 seconds sys

4) Running a blur filter on an image in Inkscape:

Performance counter stats for 'inkscape':

    13,852,121      mem_load_uops_retired.l3_miss                                   
    23,475,970      offcore_response.demand_data_rd.l3_miss.local_dram                                   

  25.355643897 seconds time elapsed

   7.244404000 seconds user
   1.019895000 seconds sys

In all four benchmarks, offcore_response.demand_data_rd.l3_miss.local_dram is nearly twice as frequent as mem_load_uops_retired.l3_miss. Is this reasonable? Why? Please, tell me if the benchmarks are too complicated and coarse-grained!

Recommended answer

The following table shows the differences between these two events on Haswell to the best of my (current) knowledge:

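Request type                                    mem_load_uops_retired.l3_miss  offcore_response.demand_data_rd.l3_miss.local_dram
----------------------------------------------  -----------------------------  --------------------------------------------------
Cacheable retired load uops                     Per line                       Y
Cacheable non-retired load uops                 N                              Y
Uncacheable WC retired load uops                One event per line             N
Uncacheable UC retired load uops                May occur                      N
Uncacheable WC or UC non-retired load uops      N                              N
Locked loads of any type to any memory type     May occur                      I don't know
Legacy IO requests                              May occur                      N
L1D prefetches                                  N                              Y
L2 prefetches into L2 or L3                     N                              N
Software prefetches with no intention to write  N                              Y
Page walk loads                                 N                              Y
Servicing unit                                  Any                            Local DRAM
Reliability                                     May not be reliable            Reliable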

It should be clear to you now that these events, in general, are not equivalent at all. Also comparing the counts of these two events to deduce something meaningful is not an easy task.
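One way to read the table above is to encode its rows as data and pick out the request types that only one of the two events counts. The sketch below does that in Python. The labels "yes", "no", "maybe" and "unknown" are my own shorthand for the table entries (for example, "One event per line" is treated as "yes" and "May occur" as "maybe"), and the helper name only_counted_by is hypothetical.

```python
# Each row: request type -> (mem_load_uops_retired.l3_miss,
#                            offcore_response.demand_data_rd.l3_miss.local_dram)
# "yes"/"no"/"maybe"/"unknown" are shorthand labels for the table entries (an assumption).
TABLE = {
    "cacheable retired load uops": ("yes", "yes"),
    "cacheable non-retired load uops": ("no", "yes"),
    "uncacheable WC retired load uops": ("yes", "no"),
    "uncacheable UC retired load uops": ("maybe", "no"),
    "uncacheable WC or UC non-retired load uops": ("no", "no"),
    "locked loads": ("maybe", "unknown"),
    "legacy IO requests": ("maybe", "no"),
    "L1D prefetches": ("no", "yes"),
    "L2 prefetches into L2 or L3": ("no", "no"),
    "software prefetches with no intention to write": ("no", "yes"),
    "page walk loads": ("no", "yes"),
}


def only_counted_by(column):
    """Return request types that `column` counts (or may count) and the other event does not."""
    other = 1 - column
    return [
        source
        for source, row in TABLE.items()
        if row[column] in ("yes", "maybe") and row[other] == "no"
    ]


retired_only = only_counted_by(0)   # can inflate mem_load_uops_retired.l3_miss only
offcore_only = only_counted_by(1)   # can inflate the offcore_response event only

print("Only mem_load_uops_retired.l3_miss:", retired_only)
print("Only offcore_response...local_dram:", offcore_only)
```

Under this reading, non-retired loads, L1D prefetches, software prefetches and page walk loads can push the offcore_response count above the mem_load_uops_retired count, while uncacheable retired loads and legacy IO requests can push it the other way.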

In all of the examples you presented, the offcore_response.demand_data_rd.l3_miss.local_dram event count is larger than the mem_load_uops_retired.l3_miss event count. However, it's not hard to come up with real examples where the latter is larger than the former.

In all four benchmarks, offcore_response.demand_data_rd.l3_miss.local_dram is nearly twice as frequent as mem_load_uops_retired.l3_miss. Is this reasonable?

I think the description "nearly twice" really only applies to the second example, but not the others. I can't comment on the numbers you've shown without seeing the exact code and execution environment information.
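As a quick check of the "nearly twice" claim, the ratios can be computed directly from the four perf stat outputs in the question. This is only arithmetic on the posted counts; the dictionary keys are informal labels for the four benchmarks.

```python
# (mem_load_uops_retired.l3_miss, offcore_response.demand_data_rd.l3_miss.local_dram)
# Counts copied from the perf stat outputs in the question.
counts = {
    "loop": (1_363, 1_543),
    "evince": (936_152, 1_853_998),
    "wireshark": (5_161_671, 8_126_526),
    "inkscape": (13_852_121, 23_475_970),
}

# Ratio of the offcore_response count to the mem_load_uops_retired count.
ratios = {name: offcore / retired for name, (retired, offcore) in counts.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}")
```

The ratios come out at roughly 1.13, 1.98, 1.57 and 1.69, so only the Evince run is close to a factor of two.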