Disable CPU caches (L1/L2) on ARMv8-A Linux

Question

I want to disable the low-level caches on an ARMv8-A platform running Linux, in order to measure the performance of optimized code independent of cache access.

For Intel systems I found the following resource (Is there a way to disable CPU cache (L1/L2) on a Linux system?), but it cannot be applied directly due to the different instruction set.

So far I have a kernel module which alters the corresponding system register to disable the instruction and data caches:

#include <linux/module.h>

int init_module(void)
{
  int64_t value;

  asm volatile("\
    MRS %0, SCTLR_EL1     // Read SCTLR_EL1 into Xt\n\
    BIC %0, %0, (1<<2)    // clear bit 2, SCTLR_EL1.C\n\
    BIC %0, %0, (1<<12)   // clear bit 12, SCTLR_EL1.I\n\
    MSR SCTLR_EL1, %0     // Write Xt to SCTLR_EL1\n\
  " : "+r" (value));

  return 0;
}

void cleanup_module(void)
{
  int64_t value;

  asm volatile("\
    MRS %0, SCTLR_EL1    // Read SCTLR_EL1 into Xt\n\
    ORR %0, %0, (1<<2)   // set bit 2, SCTLR_EL1.C\n\
    ORR %0, %0, (1<<12)  // set bit 12, SCTLR_EL1.I\n\
    MSR SCTLR_EL1, %0    // Write Xt to SCTLR_EL1\n\
  ": "+r" (value));
}

MODULE_LICENSE("GPL");

However, it results in a complete system freeze when loaded (as soon as the corresponding bits in SCTLR_EL1 are cleared). My guess is that I still need some kind of cache flush, but I didn't find anything useful in the ARM manuals.

Does anyone have any helpful hints on how I could succeed in disabling the caches on ARM, or what I am missing here? Thanks.

Answer

In general, this is unworkable, for several reasons.

Firstly, clearing the SCTLR_EL1.C bit only makes all data accesses non-cacheable and prevents allocation into any caches. Any data already in the caches is still there, in particular dirty lines from anything recently written; consider what happens when your function returns and the caller tries to restore a stack frame which doesn't even exist in the memory it is now accessing.
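To address that particular point you would at least have to clean and invalidate the data caches when turning them off, and add the context-synchronization barrier that the write to SCTLR_EL1 needs (the module above performs the MSR with no ISB afterwards). A minimal, hypothetical sketch of a by-VA clean+invalidate helper is shown below; the 64-byte line size is an assumption (real code would read it from CTR_EL0), and cleaning the whole hierarchy for every cache level would need the much longer clean-by-set/way loop given in the ARMv8-A Architecture Reference Manual:

/*
 * Sketch only: clean+invalidate a virtual address range to the Point of
 * Coherency, then invalidate the I-cache. Assumes a 64-byte cache line.
 */
static void clean_inval_range(void *start, unsigned long size)
{
  unsigned long line = 64;   /* assumption; read CTR_EL0 in real code */
  unsigned long addr = (unsigned long)start & ~(line - 1);
  unsigned long end  = (unsigned long)start + size;

  for (; addr < end; addr += line)
    asm volatile("dc civac, %0" :: "r" (addr) : "memory");  /* clean+inval by VA */

  asm volatile("dsb sy" ::: "memory");                      /* wait for completion */
  asm volatile("ic iallu\n\tdsb sy\n\tisb" ::: "memory");   /* I-cache + resync */
}

Even with all of that in place, the points below still apply.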

Secondly, there are very few uniprocessor ARMv8 systems; assuming you're running SMP Linux and suddenly disable the caches on just whichever CPU the module loader happened to be scheduled on, then even disregarding the first point, things are going to go downhill very fast. Linux expects all CPUs to be coherent with each other, and will typically become very broken very rapidly if that assumption is violated. Note that it's not even worth venturing into SMP cross-calling for this; suffice to say, the only safe way to even attempt to run Linux with caches disabled is to make sure they are never enabled to begin with, except...
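For reference only, "SMP cross-calling" refers to the kernel's on_each_cpu() interface; a purely hypothetical sketch of how the change would be broadcast to every online core is shown below. As argued above, doing this would still not keep the system alive, so it is included only to clarify the term:

#include <linux/smp.h>

/* Hypothetical: called on every online CPU (remote CPUs via IPI). */
static void disable_caches_local(void *info)
{
  unsigned long value;

  asm volatile(
    "mrs %0, SCTLR_EL1\n\t"
    "bic %0, %0, #(1 << 2)\n\t"    /* clear SCTLR_EL1.C */
    "bic %0, %0, #(1 << 12)\n\t"   /* clear SCTLR_EL1.I */
    "msr SCTLR_EL1, %0\n\t"
    "isb"
    : "=r" (value) :: "memory");
}

/* ... in the module init function: */
/* on_each_cpu(disable_caches_local, NULL, 1);   // 1 = wait for all CPUs */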

Thirdly, there is no guarantee Linux will even run with the caches disabled. On current hardware, all of the locking and atomic operations in the kernel (not to mention userspace) rely on the exclusive-access instructions. Whilst the CPU cluster(s) will implement the architecturally required local and global exclusive monitors for cacheable memory (usually as part of the cache machinery itself), it depends on the system whether a global exclusive monitor for non-cacheable accesses is implemented, as such a thing must be external to the CPU (usually in the interconnect or memory controller). Many systems don't implement such a global monitor, in which case exclusive accesses to external memory may fault, do nothing, or show various other implementation-defined behaviours, any of which will leave Linux crashing or deadlocking. It is effectively impossible to run Linux with the caches off on such a system - the amount of hacking just to get a UP arm64 kernel to work (SMP would be literally impossible) would be impractical alone; good luck with userspace.
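To make the dependency concrete, every arm64 lock and atomic (absent the ARMv8.1 LSE extensions) ultimately boils down to a load-exclusive/store-exclusive retry loop along the lines of the hypothetical helper below. With caches off, these accesses go to memory directly and need a global exclusive monitor outside the CPU; on a system that lacks one, the STXR may fault, never succeed, or otherwise misbehave:

/* Illustration of an exclusive-access retry loop (the pattern kernel and
 * userspace atomics use on cores without LSE atomics). */
static inline void atomic_add_exclusive(int *ptr, int delta)
{
  int old, ret;

  asm volatile(
    "1: ldxr  %w0, %2\n"        /* load-exclusive            */
    "   add   %w0, %w0, %w3\n"
    "   stxr  %w1, %w0, %2\n"   /* store-exclusive, 0 = ok   */
    "   cbnz  %w1, 1b\n"        /* retry if exclusivity lost */
    : "=&r" (old), "=&r" (ret), "+Q" (*ptr)
    : "r" (delta)
    : "memory");
}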

As it happens, though, the worst problem is none of that; it's this:

...in order to measure performance of optimized code, independent of cache access.

If the code is intended to run in deployment with caches disabled, then logically it can't be intended to run under Linux, so the effort spent on hacking up Linux would be better spent on benchmarking in a more realistic execution environment, so that the results are actually representative. On the other hand, if it is intended to run with caches enabled (under Linux or any other OS), then benchmarking with caches disabled will give meaningless results and be a waste of time. "Optimising" for, say, an instruction-fetch-bandwidth bottleneck which won't exist in practice is not going to lead you in the right direction.
