Disable CPU caches (L1/L2) on ARMv8-A Linux

Question

I want to disable the low level cache on an ARMv8-A platform running Linux, in order to measure performance of optimized code, independent of cache access.

For Intel systems I found the following resource (Is there a way to disable CPU cache (L1/L2) on a Linux system?), but it cannot be applied directly because of the different instruction set.

So far I have a kernel module which alters the corresponding system register to disable instruction and data cache.

#include <linux/module.h>
#include <linux/types.h>

int init_module(void)
{
  u64 value;

  asm volatile("\
    MRS %0, SCTLR_EL1     // Read SCTLR_EL1 into Xt\n\
    BIC %0, %0, (1<<2)    // clear bit 2, SCTLR_EL1.C\n\
    BIC %0, %0, (1<<12)   // clear bit 12, SCTLR_EL1.I\n\
    MSR SCTLR_EL1, %0     // Write Xt to SCTLR_EL1\n\
    ISB                   // synchronize the system register change\n\
  " : "=r" (value) : : "memory");

  return 0;
}

void cleanup_module(void)
{
  u64 value;

  asm volatile("\
    MRS %0, SCTLR_EL1    // Read SCTLR_EL1 into Xt\n\
    ORR %0, %0, (1<<2)   // set bit 2, SCTLR_EL1.C\n\
    ORR %0, %0, (1<<12)  // set bit 12, SCTLR_EL1.I\n\
    MSR SCTLR_EL1, %0    // Write Xt to SCTLR_EL1\n\
    ISB                  // synchronize the system register change\n\
  " : "=r" (value) : : "memory");
}

MODULE_LICENSE("GPL");

However, loading it results in a complete system freeze (i.e. as soon as I change the corresponding bits in the system register). My guess is that I still need some kind of cache clear, but I didn't find anything useful in the ARM manuals.

Does anyone have some helpful hints on how I could succeed in disabling the cache on ARM, or on what I am missing here? Thanks.

Answer

In general, this is unworkable, for several reasons.

Firstly, clearing the SCTLR.C bit only makes all data accesses non-cacheable and prevents allocating into any caches. Any data already in the caches is still there, especially dirty lines from anything recently written; consider what happens when your function returns and the caller tries to restore a stack frame which doesn't even exist in the memory it's now accessing.
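To make the first point concrete, here is a deliberately simplified userspace model. It is not real hardware and not ARM's actual cache organisation. It puts a tiny direct-mapped, write-back data cache in front of a word-addressed memory array, with a flag standing in for SCTLR_EL1.C. All names (`toy_system`, `toy_write`, `toy_read`, `toy_clean_all`) are hypothetical. Clearing the flag without first writing dirty lines back leaves reads looking at stale memory. Here `toy_clean_all` plays, very loosely, the role of the cache-maintenance instructions (e.g. `DC CIVAC`) that a real shutdown sequence would need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy model only: direct-mapped, write-back, write-allocate data cache
 * (one 64-bit word per line) in front of a small word-addressed "DRAM". */
#define MEM_WORDS   64
#define CACHE_LINES 8

struct toy_line {
    bool     valid, dirty;
    uint32_t tag;                   /* full word address, for simplicity */
    uint64_t data;
};

struct toy_system {
    uint64_t        dram[MEM_WORDS];
    struct toy_line cache[CACHE_LINES];
    bool            cache_enabled;  /* stands in for SCTLR_EL1.C */
};

static void toy_init(struct toy_system *s)
{
    memset(s, 0, sizeof *s);
    s->cache_enabled = true;
}

/* Write a line back to DRAM if it holds dirty data; the line stays valid. */
static void toy_writeback(struct toy_system *s, struct toy_line *l)
{
    if (l->valid && l->dirty) {
        s->dram[l->tag] = l->data;
        l->dirty = false;
    }
}

static void toy_write(struct toy_system *s, uint32_t addr, uint64_t val)
{
    assert(addr < MEM_WORDS);
    if (!s->cache_enabled) {        /* non-cacheable: no allocation, straight to DRAM */
        s->dram[addr] = val;
        return;
    }
    struct toy_line *l = &s->cache[addr % CACHE_LINES];
    if (l->tag != addr)
        toy_writeback(s, l);        /* evict whatever occupied this line */
    *l = (struct toy_line){ .valid = true, .dirty = true, .tag = addr, .data = val };
}

static uint64_t toy_read(struct toy_system *s, uint32_t addr)
{
    assert(addr < MEM_WORDS);
    if (!s->cache_enabled)
        return s->dram[addr];       /* bypasses the cache: dirty lines are invisible */
    struct toy_line *l = &s->cache[addr % CACHE_LINES];
    if (!(l->valid && l->tag == addr)) {   /* miss: evict, then fill from DRAM */
        toy_writeback(s, l);
        *l = (struct toy_line){ .valid = true, .dirty = false,
                                .tag = addr, .data = s->dram[addr] };
    }
    return l->data;
}

/* "Clean" every line: push dirty data out to DRAM. */
static void toy_clean_all(struct toy_system *s)
{
    for (int i = 0; i < CACHE_LINES; i++)
        toy_writeback(s, &s->cache[i]);
}
```

For example, after `toy_write(&s, 5, 42)` with the cache enabled, clearing `s.cache_enabled` makes `toy_read(&s, 5)` return the stale 0 from DRAM. Only after `toy_clean_all` does the 42 reach memory. The model also keeps cache contents intact while "disabled", mirroring the point that the data is still there in the caches.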

Secondly, there are very few uniprocessor ARMv8 systems; assuming you're running SMP Linux, and suddenly disable the caches on just whichever CPU the module loader happened to be scheduled on, then even disregarding the first point things are going to go downhill very fast. Linux expects all CPUs to be coherent with each other, and will typically become very broken very rapidly if that assumption is violated. Note that it's not even worth venturing into SMP cross-calling for this; suffice to say the only safe way to even attempt to run Linux with caches disabled is to make sure they are never enabled to begin with, except...

Thirdly, there is no guarantee Linux will even run with caches disabled. On current hardware, all of the locking and atomic operations in the kernel (not to mention userspace) rely on the exclusive access instructions. The CPU cluster(s) will implement the architecturally-required local and global exclusive monitors for cacheable memory, usually as part of the cache machinery itself. Whether a global exclusive monitor for non-cacheable accesses is implemented, however, depends on the system, because such a thing must be external to the CPU (usually in the interconnect or memory controller). Many systems don't implement such a global monitor. In that case, exclusive accesses to external memory may fault, do nothing, or exhibit various other implementation-defined behaviours, any of which will result in Linux crashing or deadlocking. It is effectively impossible to run Linux with the cache off on such a system: the amount of hacking needed just to get a UP arm64 kernel working (SMP would be literally impossible) would alone be impractical, and good luck with userspace.
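The dependency on exclusives can be illustrated from userspace with a minimal test-and-set spinlock built on C11 `<stdatomic.h>`. This is not the kernel's spinlock implementation, and the `toy_spin_*` names are hypothetical. On AArch64 builds without the ARMv8.1 LSE atomics (depending on compiler version and flags), the compare-and-swap below is typically lowered to a load-exclusive/store-exclusive retry loop (LDAXR/STXR-family instructions). Whether such a loop behaves correctly on non-cacheable memory is exactly what depends on the system's global exclusive monitor.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Minimal test-and-set spinlock (illustrative only). */
struct toy_spinlock { atomic_bool locked; };

static void toy_spin_init(struct toy_spinlock *l)
{
    atomic_init(&l->locked, false);
}

/* One compare-and-swap attempt; true if we took the lock. Without LSE
 * atomics this CAS is typically an exclusive load/store pair whose
 * store-exclusive may fail and need retrying. */
static bool toy_spin_trylock(struct toy_spinlock *l)
{
    bool expected = false;
    return atomic_compare_exchange_strong_explicit(&l->locked, &expected, true,
                                                   memory_order_acquire,
                                                   memory_order_relaxed);
}

static void toy_spin_lock(struct toy_spinlock *l)
{
    while (!toy_spin_trylock(l))
        ;  /* spin until the CAS succeeds */
}

static void toy_spin_unlock(struct toy_spinlock *l)
{
    /* Unlocking a lock that is not held is a caller bug. */
    assert(atomic_load_explicit(&l->locked, memory_order_relaxed));
    atomic_store_explicit(&l->locked, false, memory_order_release);
}
```

There are two ways this can go wrong on a system without a global monitor. If the store-exclusive inside that CAS can never succeed on the memory in question, `toy_spin_lock` spins forever. If it "succeeds" without real exclusivity, two CPUs can both hold the lock. These are the deadlock and corruption outcomes described above.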

As it happens, though, the worst problem is none of that, it's this:

...in order to measure performance of optimized code, independent of cache access.

If the code is intended to run in deployment with caches disabled, then logically it can't be intended to run under Linux, therefore the effort spent in hacking up Linux would be better spent on benchmarking in a more realistic execution environment so that results are actually representative. On the other hand, if it is intended to run with caches enabled (under Linux or any other OS), then benchmarking with caches disabled will give meaningless results and be a waste of time. "Optimising" for e.g. an instruction-fetch-bandwidth bottleneck which won't exist in practice is not going to lead you in the right direction.
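Following that advice, if the code is meant to run with caches enabled, the more informative experiment is to time it as it will actually run: do warm-up iterations, then report a robust statistic over repeated runs. Below is a minimal userspace sketch. It assumes a POSIX system providing `clock_gettime` with `CLOCK_MONOTONIC`. `bench_median_ns`, `median_u64`, and the `sum_job` workload are hypothetical placeholders for your own code.

```c
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime / CLOCK_MONOTONIC */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Monotonic timestamp in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Median of n samples; sorts v in place. */
static uint64_t median_u64(uint64_t *v, size_t n)
{
    assert(n > 0);
    qsort(v, n, sizeof v[0], cmp_u64);
    return (n % 2) ? v[n / 2] : (v[n / 2 - 1] + v[n / 2]) / 2;
}

/* Call fn(arg) `warmup` times untimed (warming caches and branch predictors
 * as in normal operation), then time `runs` calls and return the median in
 * ns. `scratch` must hold at least `runs` entries. */
static uint64_t bench_median_ns(void (*fn)(void *), void *arg,
                                unsigned warmup, size_t runs, uint64_t *scratch)
{
    for (unsigned i = 0; i < warmup; i++)
        fn(arg);
    for (size_t i = 0; i < runs; i++) {
        uint64_t t0 = now_ns();
        fn(arg);
        scratch[i] = now_ns() - t0;
    }
    return median_u64(scratch, runs);
}

/* Hypothetical example workload: sum an array. */
struct sum_job { const uint32_t *data; size_t n; uint64_t result; };

static void sum_job_run(void *arg)
{
    struct sum_job *job = arg;
    uint64_t s = 0;
    for (size_t i = 0; i < job->n; i++)
        s += job->data[i];
    job->result = s;
}
```

Typical use is to fill a `struct sum_job` (or your own job type) and call `bench_median_ns(sum_job_run, &job, 3, 15, scratch)` with a `scratch` array of at least 15 entries. You can then compare medians between code variants. On Linux, `perf stat` can additionally report hardware cache events for the same run where the PMU exposes them.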
