How to do a memory test on Arm architecture hardware? (something like Memtest86)

Question

Is there a way to do complete memory test on android device RAM?

I'm developing a driver, but at random times I read wrong values from certain physical addresses, which puts the driver into a bad state. I'm reading from RAM when I hit the problem. I think certain portions of RAM on my device are corrupted.

Solution

"Complete" is an ambiguous word. It may mean testing at different temperatures and voltages, and across a range of devices with different component tolerances. As you cite MemTest86, I think I understand what you mean. Most projects I have seen are C based and cannot test everything.

Here is one running under Linux - http://www.madsgroup.org/~quintela/memtest/

There are documented algorithms such as walking bits. A lot depends on your RAM type. I guess you have some type of SDRAM. SDRAM has many different cycle types: single-beat reads and writes, bank-to-bank transfers, terminated bursts, and so on.
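As an illustration of the walking-bits idea mentioned above, here is a minimal sketch in C. It is not taken from any of the projects referenced here; the function names `walking_ones_data` and `walking_ones_addr` and the 0xAAAAAAAA/0x55555555 patterns are assumptions chosen for the example. The first function walks a single 1 bit across the data bus at one address; the second writes to power-of-two word offsets to look for stuck or shorted address lines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Walk a single 1 bit across a 32-bit word at one address.
 * Returns 0 on success, or the first pattern that failed to read back. */
uint32_t walking_ones_data(volatile uint32_t *addr)
{
    assert(addr != NULL);
    for (uint32_t pattern = 1; pattern != 0; pattern <<= 1) {
        *addr = pattern;
        if (*addr != pattern)
            return pattern;
    }
    return 0;
}

/* Walk a single 1 bit across the address lines by touching only the
 * power-of-two word offsets. `words` must be a power of two (hypothetical
 * restriction for this sketch). Returns 0 on success, or the word offset
 * whose write disturbed another location. */
size_t walking_ones_addr(volatile uint32_t *base, size_t words)
{
    const uint32_t pattern = 0xAAAAAAAAu;
    const uint32_t anti    = 0x55555555u;

    assert(base != NULL && words >= 2 && (words & (words - 1)) == 0);

    /* Seed offset 0 and every power-of-two offset with the pattern. */
    base[0] = pattern;
    for (size_t off = 1; off < words; off <<= 1)
        base[off] = pattern;

    /* Flip one offset at a time and check that nothing else changed. */
    for (size_t test = 1; test < words; test <<= 1) {
        base[test] = anti;
        if (base[0] != pattern)
            return test;
        for (size_t off = 1; off < words; off <<= 1) {
            if (off != test && base[off] != pattern)
                return test;
        }
        base[test] = pattern;
    }
    return 0;
}
```

On a real target these would be called with the DCache off, as listed in the requirements below, so that every access actually reaches the SDRAM.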

Personally, we had a system where 5% of the boards would show problems when doing an SSH transfer over Ethernet (DMA). SSH involves encryption, which is CPU and memory intensive, and the DMA engine often performs different SDRAM cycles than the CPU (with cache) does.

Here are some requirements:

  1. Non-SDRAM memory for code to reside.
  2. Bare metal framework (no cache, interrupts, DMA, etc)
  3. Turn off the DCache.
  4. Turn on the ICache for the code.

Another limiting requirement is the time to run. A complete SDRAM test could take years to run on a single board. I have found that a pseudo-random address/data test works well. Just take numbers that are relatively prime to the size of the SDRAM and use them as increments. The simplest case is 1. You may also want increments chosen so that consecutive accesses keep changing the row, bank and device (bank size - 1, for example); however, prime numbers work better because a different number of bits changes on every step.

With the cache off, you can use char, short, int, and long long pointers to test some different burst lengths. These tests will be slow. Full SDRAM bursts are more common with the cache on, so with the cache off you will need ldm/stm pairs to simulate them. This is also one of the fastest tests.
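To see why the increment should be relatively prime to the memory size, here is a small sketch that counts how many steps an address sequence takes before it returns to its starting point. The helper names `gcd_size` and `stride_cycle_length` are made up for this example. When the increment and the size share no common factor, the sequence visits every address exactly once before repeating; otherwise it only covers size / gcd(size, stride) addresses.

```c
#include <assert.h>
#include <stddef.h>

/* Greatest common divisor by Euclid's algorithm. */
size_t gcd_size(size_t a, size_t b)
{
    while (b != 0) {
        size_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Step through addresses 0..size-1 by `stride` (modulo `size`) and count
 * the steps needed to get back to address 0. The result equals
 * size / gcd(size, stride), so it equals `size` only for a coprime stride. */
size_t stride_cycle_length(size_t size, size_t stride)
{
    size_t addr = 0;
    size_t steps = 0;
    size_t step = 0;

    assert(size > 0);
    step = stride % size;
    do {
        addr = (addr + step) % size;
        steps++;
    } while (addr != 0);
    return steps;
}
```

For a memory of 2^20 words, any odd increment such as 9999991 gives full coverage, while an increment of 1024 would only ever touch 1024 distinct words.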

typedef unsigned char      b8;
typedef unsigned short     b16;
typedef unsigned long      b32;
typedef unsigned long long b64;

/* Use a macro to speed code.  The compiler will use constants for
 * _incr and _wrap instead of registers which cause spilling.  A
 * macro centralizes the memory test logic.
 */
#define MEMTEST(name,type,_incr,_wrap) ...

/* Sequential tests. */
MEMTEST(do_mem_seq8,   b8, 97, 1)
MEMTEST(do_mem_seq16, b16, 50839, 1)
MEMTEST(do_mem_seq32, b32, 3999971, 1)
MEMTEST(do_mem_seq64, b64, 3999971, 1)

/* Random tests. These tests try to randomize both the data and the
 * address access.
 */

/* 97/0x61 prime for char and 9999991/0x989677 prime for 64MB. */
MEMTEST(do_mem_rnd8,b8,97,9999991)
/* 50839/0xC697 large prime for 64k and 9999991/0x989677 prime for 64MB. */
MEMTEST(do_mem_rnd16,b16,50839,9999991)
/* 3999971/0x3D08E3 prime and 9999991/0x989677 prime for 64MB. */
MEMTEST(do_mem_rnd32,b32,3999971,9999991)
/* 3999971/0x3D08E3 prime and 9999991/0x989677 prime for 64MB. */
MEMTEST(do_mem_rnd64,b64,3999971,9999991)
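The body of the MEMTEST macro is elided above, so the following is only a guess at the general shape of such a test, not the original code. It is a plain C sketch for 32-bit words with hypothetical function names (`mem_fill_stride32`, `mem_verify_stride32`): one pass writes a data sequence that grows by `incr` while the address advances by `wrap` words, and a second pass replays the same sequence and counts mismatches. It assumes `wrap` is relatively prime to the number of words, so every word is written exactly once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 when a and b share no common factor other than 1. */
static int is_coprime(size_t a, size_t b)
{
    while (b != 0) {
        size_t t = a % b;
        a = b;
        b = t;
    }
    return a == 1;
}

/* Write pass: data grows by `incr`, the word index advances by `wrap`. */
void mem_fill_stride32(volatile uint32_t *start, size_t words,
                       uint32_t incr, size_t wrap)
{
    size_t addr = 0;
    uint32_t data = 0;

    assert(start != NULL && words > 0 && is_coprime(words, wrap));
    for (size_t i = 0; i < words; i++) {
        start[addr] = data;
        data += incr;
        addr = (addr + wrap % words) % words;
    }
}

/* Verify pass: replay the same sequence and count words that differ. */
size_t mem_verify_stride32(volatile uint32_t *start, size_t words,
                           uint32_t incr, size_t wrap)
{
    size_t addr = 0;
    size_t errors = 0;
    uint32_t data = 0;

    assert(start != NULL && words > 0 && is_coprime(words, wrap));
    for (size_t i = 0; i < words; i++) {
        if (start[addr] != data)
            errors++;
        data += incr;
        addr = (addr + wrap % words) % words;
    }
    return errors;
}
```

Calling the pair with `incr = 3999971` and `wrap = 9999991` mirrors the parameters of do_mem_rnd32; with `wrap = 1` it becomes the sequential variant. The modulo arithmetic here is for clarity; a real test would likely avoid the division to keep the inner loop tight.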

incr is the data increment and wrap is the address increment. The algorithm for the burst test is the same. Here is some inline gcc assembler:

    register ulong t1 asm ("r0")  = 0;                              \
    register ulong t2 asm ("r4")  = t1 + incr;                      \
    register ulong t3 asm ("r6")  = t2 + incr;                      \
    register ulong t4 asm ("r8")  = t3 + incr;                      \
        /* Run an entire burst line. */                             \
        __asm__ (" stmia  %[ptr], {%0,%1,%2,%3}\r\n" : :            \
                 "r" (t1), "r" (t2), "r" (t3), "r" (t4),            \
                 [ptr]"r" (start + (addr<<2)) :                     \
                 "memory" );                                        \
        /* Read four 32 bits values. */                             \
        __asm__ (" ldmia   %[ptr], {%0, %1, %2, %3}\r\n" :          \
                 "=r" (t1), "=r" (t2), "=r" (t3), "=r" (t4) :       \
                 [ptr]"r" (start + (addr<<2)) );                    \

These tests are simple and should fit in the code cache, which maximizes stress on the RAM. Our main issue was the DQS delay, which is critical for DDR-SDRAM; it can be temperature and voltage dependent and varies with PCB layout and materials.

Cachbench can be used if you are tuning the memory controller registers for the SDRAM chips. It may also be useful for testing.

See also: Unix Stack Exchange (same question). I used these C based test suites under Linux, but they didn't expose any issues in our case. The memtest86 algorithms may not be as stressful (for PCB glitches) as what I describe above, although test 7 or the burnBX test is close. I think memtest86 is geared toward finding DRAM chip issues rather than board design issues.

Edit: Another issue is transients and crosstalk at the SDRAM chips. If your device driver controls a high-current or high-frequency device, the SDRAM interface can possibly pick up crosstalk, or see a double clock due to supply variations. So a RAM test may show no issues, and the SDRAM error only happens when a particular portion of the hardware is in use. Also check whether the Android device uses dynamic clocking and changes the SDRAM frequency; signals may cross a resonance as the clock changes.
