Occasional fault seen on XScale ARM: "Unhandled fault: imprecise external abort (0x416) at 0x40019004"


Question

We have production software running in the field on a device with an ixp23xx network processor (XScale ARM core) running Linux 2.6.24. We occasionally see a problem in the field, and can sometimes reproduce it in the lab, where the console prints the fault line "Unhandled fault: imprecise external abort (0x416) at 0x40019004". Digging further, we found a few page table entries whose virtual addresses are not mapped to valid physical addresses, so accesses to those virtual addresses can cause the imprecise aborts. The final solution would be to remove the wrong mappings, so that next time we get a segmentation fault, which is precise and easy to catch. But removing the wrong entries will take some time, and we have to create a build with debugging information, so that option is for later.

Coming back to the question: as per the XScale data sheet, this fault can be made almost precise (+3 instructions) with "stall until complete" by setting X bit = 0, C bit = 0, and B bit = 0. But I am not sure how exactly to do that in Linux, and is it going to help? Basically this looks like disabling the DCACHE. The code under arch/arm/mm/proc-xscale.S is all assembly, and I am not sure how exactly to disable it there. There is one option in the kernel config, CONFIG_CPU_DCACHE_DISABLE, which seems to disable the DCACHE, but is that the same as setting the X, C, and B bits to 0? Below is the excerpt from the data sheet:

*

Imprecise data aborts may create scenarios difficult for an abort handler to recover. Both external data aborts and data cache parity errors may result in corrupted targeted register data. Because these faults are imprecise, it is possible corrupted data will have been used before the Data Abort fault handler is invoked. Because of this, software should treat imprecise data aborts as unrecoverable. Even memory accesses marked as "stall until complete" (see Section 3.2.2.4) can result in imprecise data aborts. For these types of accesses, the fault is somewhat less imprecise than the general case: it is guaranteed to be raised within three instructions of the instruction that caused it. In other words, if a "stall until complete" LD or ST instruction triggers an imprecise fault, then that fault will be seen by the program within three instructions. If the MMU is disabled all data accesses will be non-cacheable and non-bufferable. This is the same behavior as when the MMU is enabled, and a data access uses a descriptor with X, C, and B all set to 0. The X, C, and B bits determine when the processor should place new data into the Data Cache. The cache places data into the cache in lines (also called blocks). Thus, the basis for making a decision about placing new data into the cache is called a "Line Allocation Policy". If the Line Allocation Policy is read-allocate, all load operations that miss the cache

*

Recommended answer

The StrongARM and XScale are custom CPUs by Intel. They seem to have some odd issues compared with other ARM processors.

$ git checkout v2.6.24.7  # Activate time machine.
$ grep -B1 -A 9 CPU_XSC3 Kconfig 
# XScale Core Version 3
config CPU_XSC3
        bool
        depends on ARCH_IXP23XX || ARCH_IOP13XX || PXA3xx
        default y
        select CPU_32v5
        select CPU_ABRT_EV5T
        select CPU_CACHE_VIVT
        select CPU_CP15_MMU
        select CPU_TLB_V4WBI if MMU
        select IO_36

The relevant Kconfig options are CPU_ABRT_EV5T and CPU_TLB_V4WBI; they select abort-ev5t.S and tlb-v4wbi.S, which contain the code you are interested in.

 * Purpose : obtain information about current aborted instruction.
 * Note: we read user space.  This means we might cause a data
 * abort here if the I-TLB and D-TLB aren't seeing the same
 * picture.  Unfortunately, this does happen.  We live with it.

I believe most CPUs don't have separate I-TLB and D-TLB. The code is trying to emulate a fault status by reading and decoding the instruction that faulted. The I-TLB (instruction MMU page cache) and the D-TLB (data MMU page cache) may not agree, and reading the instruction memory may do something odd.

Are you the person living with it? I.e., do you know whether the ixp23xx XScale3 (XSC3) has separate I/D translation lookaside buffers (TLBs)?

The other oddity is IO_36. The CPU has 36-bit addresses. See domain.h (https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/arch/arm/include/asm/domain.h) for the source. It appears that a domain becomes part of an address. This may be causing some weird effect, but I couldn't find anything with a cursory look.

Sorry, I haven't answered your question. This would be a long comment.

Coming back to question, As per the XSCALE data sheet, this fault can be made almost precise(+3 instr) with "stall until complete" by setting the Xbit = 0, C bit= 0 and B bit=0. ... There is one option in the Kernel Config i.e. CONFIG_CPU_DCACHE_DISABLE

CONFIG_CPU_DCACHE_DISABLE will not fix your issue. The I-cache and write buffering will still be active. As well, your system will be extremely slow. The kernel command line option cachepolicy can be used instead. It supports uncached, buffered, writethrough, writeback, and writealloc. Some values might not be applicable to the platform. I think cachepolicy=uncached might be equivalent to compiling with CONFIG_CPU_DCACHE_DISABLE.
