Haswell microarchitecture doesn't have stalled-cycles-backend in perf


Question

I installed perf on a Haswell CPU (Intel Core i7-4790), but "perf list" includes neither "stalled-cycles-frontend" nor "stalled-cycles-backend". I checked the manuals at http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html and did not find any performance event related to stalled-cycles-backend in Table 19-7 (Non-Architectural Performance Events in the Processor Core of 4th Generation Intel Core Processors).

So my question is: how can I measure stalled-cycles-backend on Haswell CPU cores using perf or other tools? The kernel is 3.19 and the perf version is also 3.19.

Thanks

Recommended answer

Yes, the kernel's perf_events subsystem does not map both the "stalled-cycles-frontend" and "stalled-cycles-backend" synthetic events on newer processors: Ivy Bridge maps only the frontend event, and Haswell maps neither. There is no mapping on the older Core 2 either. Probably this name/concept is a poor fit for the changed and complex microarchitectures of modern out-of-order CPUs, which have no simple scalar measure of a global "stall".

The code is in arch/x86/events/intel/core.c; the synthetic events are named PERF_COUNT_HW_STALLED_CYCLES_FRONTEND and PERF_COUNT_HW_STALLED_CYCLES_BACKEND:

__init int intel_pmu_init(void)
{...

Both are defined for Nehalem, Westmere, and Sandy Bridge:

    case INTEL_FAM6_NEHALEM:
    case INTEL_FAM6_NEHALEM_EP:
    case INTEL_FAM6_NEHALEM_EX:

        /* UOPS_ISSUED.STALLED_CYCLES */
        intel_perfmon_event_map[PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] =
            X86_CONFIG(.event=0x0e, .umask=0x01, .inv=1, .cmask=1);
        /* UOPS_EXECUTED.CORE_ACTIVE_CYCLES,c=1,i=1 */
        intel_perfmon_event_map[PERF_COUNT_HW_STALLED_CYCLES_BACKEND] =
            X86_CONFIG(.event=0xb1, .umask=0x3f, .inv=1, .cmask=1);

    case INTEL_FAM6_WESTMERE:
    case INTEL_FAM6_WESTMERE_EP:
    case INTEL_FAM6_WESTMERE_EX:

        /* UOPS_ISSUED.STALLED_CYCLES */
        intel_perfmon_event_map[PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] =
            X86_CONFIG(.event=0x0e, .umask=0x01, .inv=1, .cmask=1);
        /* UOPS_EXECUTED.CORE_ACTIVE_CYCLES,c=1,i=1 */
        intel_perfmon_event_map[PERF_COUNT_HW_STALLED_CYCLES_BACKEND] =
            X86_CONFIG(.event=0xb1, .umask=0x3f, .inv=1, .cmask=1);


    case INTEL_FAM6_SANDYBRIDGE:
    case INTEL_FAM6_SANDYBRIDGE_X:


        /* UOPS_ISSUED.ANY,c=1,i=1 to count stall cycles */
        intel_perfmon_event_map[PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] =
            X86_CONFIG(.event=0x0e, .umask=0x01, .inv=1, .cmask=1);
        /* UOPS_DISPATCHED.THREAD,c=1,i=1 to count stall cycles*/
        intel_perfmon_event_map[PERF_COUNT_HW_STALLED_CYCLES_BACKEND] =
            X86_CONFIG(.event=0xb1, .umask=0x01, .inv=1, .cmask=1);
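The X86_CONFIG fields above can be turned into a raw event code by hand. The following is a minimal sketch, assuming the usual Intel raw-event layout that perf exposes under /sys/bus/event_source/devices/cpu/format/ (event in bits 0-7, umask in bits 8-15, inv in bit 23, cmask in bits 24-31). The helper name raw_event_config is made up for illustration and is not a kernel or perf API.

```c
#include <stdint.h>

/* Bit positions of the raw event fields (assumed Intel layout, see lead-in). */
#define RAW_EVENT_SHIFT  0   /* event select, bits 0-7   */
#define RAW_UMASK_SHIFT  8   /* unit mask,    bits 8-15  */
#define RAW_INV_SHIFT    23  /* invert flag,  bit 23     */
#define RAW_CMASK_SHIFT  24  /* counter mask, bits 24-31 */

/* Hypothetical helper: pack the four X86_CONFIG fields used in the
 * kernel snippets into a single raw event code. */
static uint64_t raw_event_config(uint8_t event, uint8_t umask,
                                 unsigned inv, uint8_t cmask)
{
    return ((uint64_t)event << RAW_EVENT_SHIFT) |
           ((uint64_t)umask << RAW_UMASK_SHIFT) |
           ((uint64_t)(inv ? 1 : 0) << RAW_INV_SHIFT) |
           ((uint64_t)cmask << RAW_CMASK_SHIFT);
}
```

For the Sandy Bridge backend entry, raw_event_config(0xb1, 0x01, 1, 1) gives 0x18001b1, which perf accepts as a raw event (perf stat -e r18001b1) or in the equivalent form cpu/event=0xb1,umask=0x01,inv,cmask=1/. Whether that encoding counts anything meaningful on Haswell is not established here and would have to be checked against the Haswell event tables.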

Only the frontend stall is defined for Ivy Bridge:

    case INTEL_FAM6_IVYBRIDGE:
    case INTEL_FAM6_IVYBRIDGE_X:

        /* UOPS_ISSUED.ANY,c=1,i=1 to count stall cycles */
        intel_perfmon_event_map[PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] =
            X86_CONFIG(.event=0x0e, .umask=0x01, .inv=1, .cmask=1);

There is no mapping for either frontend or backend stalls on more recent desktop CPUs (Haswell, Broadwell, Skylake, Kaby Lake) or on Xeon Phi (KNL, KNM):

    case INTEL_FAM6_HASWELL_CORE:
    case INTEL_FAM6_HASWELL_X:
    case INTEL_FAM6_HASWELL_ULT:
    case INTEL_FAM6_HASWELL_GT3E:

    case INTEL_FAM6_BROADWELL_CORE:
    case INTEL_FAM6_BROADWELL_XEON_D:
    case INTEL_FAM6_BROADWELL_GT3E:
    case INTEL_FAM6_BROADWELL_X:


    case INTEL_FAM6_XEON_PHI_KNL:
    case INTEL_FAM6_XEON_PHI_KNM:


    case INTEL_FAM6_SKYLAKE_MOBILE:
    case INTEL_FAM6_SKYLAKE_DESKTOP:
    case INTEL_FAM6_SKYLAKE_X:
    case INTEL_FAM6_KABYLAKE_MOBILE:
    case INTEL_FAM6_KABYLAKE_DESKTOP:

They are not defined for the old Core 2 either (I did not check Atom):

http://elixir.free-electrons.com/linux/v4.11/source/arch/x86/events/intel/core.c#L27

static u64 intel_perfmon_event_map[PERF_COUNT_HW_MAX] __read_mostly =
{
    [PERF_COUNT_HW_CPU_CYCLES]      = 0x003c,
    [PERF_COUNT_HW_INSTRUCTIONS]        = 0x00c0,
    [PERF_COUNT_HW_CACHE_REFERENCES]    = 0x4f2e,
    [PERF_COUNT_HW_CACHE_MISSES]        = 0x412e,
    [PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = 0x00c4,
    [PERF_COUNT_HW_BRANCH_MISSES]       = 0x00c5,
    [PERF_COUNT_HW_BUS_CYCLES]      = 0x013c,
    [PERF_COUNT_HW_REF_CPU_CYCLES]      = 0x0300, /* pseudo-encoding */
};
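The default table lists no stalled-cycles indices. Because it uses designated initializers, C zero-fills every index that is not listed. As far as I can tell from the kernel source, a zero entry is what makes perf report the event as unsupported. The sketch below reproduces that mechanism with a cut-down stand-in table. All demo_* and DEMO_* names are made up for illustration and are not kernel identifiers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins for a few PERF_COUNT_HW_* indices. */
enum demo_hw_id {
    DEMO_CPU_CYCLES,
    DEMO_INSTRUCTIONS,
    DEMO_STALLED_CYCLES_FRONTEND,
    DEMO_STALLED_CYCLES_BACKEND,
    DEMO_HW_MAX
};

/* Only two entries are listed; the stalled-cycles slots stay zero. */
static const uint64_t demo_event_map[DEMO_HW_MAX] = {
    [DEMO_CPU_CYCLES]   = 0x003c,
    [DEMO_INSTRUCTIONS] = 0x00c0,
};

/* A zero entry means no hardware encoding was assigned to this index. */
static bool demo_is_mapped(enum demo_hw_id id)
{
    return demo_event_map[id] != 0;
}

/* Low byte of an entry: event select. */
static uint8_t demo_event_select(enum demo_hw_id id)
{
    return (uint8_t)(demo_event_map[id] & 0xff);
}

/* Second byte of an entry: unit mask. */
static uint8_t demo_umask(enum demo_hw_id id)
{
    return (uint8_t)((demo_event_map[id] >> 8) & 0xff);
}
```

Read the same way, the real entry 0x412e for PERF_COUNT_HW_CACHE_MISSES splits into event select 0x2e and unit mask 0x41.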
