Executing programs stored in external SPI flash memory on an ARM processor

Published: 2021/11/17 22:06:51. Tags: c, arm, execution, spi, flash-memory


Problem description



I have an ARM processor that is able to interface with an external flash memory chip. Written to the chip are programs compiled for the ARM architecture ready to be executed. What I need to know how to do is get this data from the external flash onto the ARM processor for execution.



Can I run some sort of copy routine ahead-of-time where the data is copied into executable memory space? I suppose I could, but the ARM processor is running an operating system and I don't have a ton of space left over in flash to work with. I'd also like to be able to schedule the execution of two or even three programs at once, and copying multiple programs into internal flash at one time isn't feasible. The operating system can be used to launch the programs once they're within accessible memory space, so anything that needs to be done beforehand can be.

Solution

From reading the existing answers by @FiddlingBits and @ensc I think that I can offer a different approach.

You said that your Flash chip can not be memory mapped. This is a pretty big limitation but we can work with it.

Yes, you can run a copy routine ahead of time. As long as the code ends up in RAM, you can execute it.
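A minimal sketch of such a copy routine, under stated assumptions: the image header layout, `IMAGE_MAGIC`, `spi_flash_read` and `load_image` are all hypothetical names invented for illustration, and the external chip is simulated with an array so the code can run on a host machine.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical header stored in front of each program in external flash. */
typedef struct {
    uint32_t magic;        /* marks a valid image */
    uint32_t size;         /* payload bytes that follow the header */
    uint32_t entry_offset; /* entry point, relative to the payload start */
} image_header_t;

#define IMAGE_MAGIC 0x41524D31u /* arbitrary marker chosen for this sketch */

/* Stand-in for the external chip. On hardware this is an SPI transaction. */
static uint8_t fake_spi_flash[256];

static void spi_flash_read(uint32_t addr, void *dst, size_t len)
{
    memcpy(dst, &fake_spi_flash[addr], len);
}

/* Copies the image at flash_addr into the RAM window.
 * Returns the entry address inside the window, or NULL if the header is
 * invalid or the payload does not fit. */
static void *load_image(uint32_t flash_addr, uint8_t *window, size_t window_size)
{
    image_header_t hdr;

    spi_flash_read(flash_addr, &hdr, sizeof hdr);
    if (hdr.magic != IMAGE_MAGIC)
        return NULL;
    if (hdr.size > window_size || hdr.entry_offset >= hdr.size)
        return NULL;

    spi_flash_read(flash_addr + (uint32_t)sizeof hdr, window, hdr.size);
    return window + hdr.entry_offset;
}
```

On the target, the returned address would be cast to a function pointer and called. If the program was built as Thumb code, bit 0 of that address has to be set before branching to it.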

DMA to make it faster:

If you have a Peripheral DMA Controller (like the one available on the Atmel SAM3N family) then you can use the DMA Controller to copy out chunks of memory while your processor does actually useful things.

MMU to make it simpler:

If you have an MMU available then you can do this easily by just picking out a region of RAM where you want your code to execute, copying the code into it and on every page fault, reloading the correct code into the very same region. However, this was already proposed by @ensc so I'm not adding anything new yet.

Note: In case it's not clear, an MMU is not the same as an MPU

No MMU solution but an MPU is available:

Without an MMU the task is a little trickier but it is still possible to do. You will need to understand how your compiler generates code and read up about Position Independent Code (PIC). Then you will need to allocate a region in RAM that you will execute your external flash chip code from and copy parts of it in there (making sure that you start executing it from the correct location). The MPU will need to be configured to generate a fault any time that task tries to access memory outside of its assigned region and you will then need to fetch the correct memory (this could become a complicated process), reload and continue execution.

No MMU and no MPU available:

If you have neither an MMU nor an MPU, this task becomes very difficult. In both cases (with or without an MPU) there is a severe restriction on how big the external code can be. Basically, the code stored on the external Flash chip must now fit entirely inside the region of RAM that you allocate to execute it from. If you can split that code up into separate tasks that don't interact with each other then you can do it, but otherwise you cannot.

If you are generating PIC then you can just compile the tasks and place them in memory sequentially. Otherwise, you will need to use the linker script to control the code generation such that each compiled task that will be stored in external flash will execute from the same predefined location in RAM (which will either require you to learn about ld overlays or compile them separately).

Summary:

To answer your question more completely I would need to know what chip and what operating system you are using. How much RAM is available would also help me better understand your constraints. 

However, you asked if it was possible to load more than one task at a time to run. If you use PIC like I suggested, it should be possible to do so. If not, then you would need to decide ahead of time where each of the tasks will run, and that would let you load and run only those combinations whose RAM regions don't collide.
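To illustrate the non-PIC case, here is a small sketch that checks whether a set of tasks, each linked to run from a fixed RAM range, can be resident at the same time. The `task_slot_t` type and both function names are hypothetical; the real addresses would come from each task's linker script.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint32_t ram_start; /* address the task was linked to run from */
    uint32_t ram_size;  /* bytes of code plus data it occupies there */
} task_slot_t;

/* Two half-open ranges overlap when each starts before the other ends. */
static bool slots_overlap(const task_slot_t *a, const task_slot_t *b)
{
    return a->ram_start < b->ram_start + b->ram_size &&
           b->ram_start < a->ram_start + a->ram_size;
}

/* Returns true if every pair in the set can be loaded simultaneously. */
static bool can_coexist(const task_slot_t *tasks, int count)
{
    for (int i = 0; i < count; i++)
        for (int j = i + 1; j < count; j++)
            if (slots_overlap(&tasks[i], &tasks[j]))
                return false;
    return true;
}
```

A scheduler could run this check before loading a task and refuse, or evict a resident task, when the ranges collide.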

And finally, depending on your system and chip this could be easy or hard.

EDIT 1:

Additional information given:

The chip is SAM7S (Atmel)
It does have a Peripheral DMA Controller.
It doesn't have an MMU or MPU.
8K of internal RAM, which is a limitation for us. 
It has roughly 28K of flash left over after the operating system, which is custom-written, has been installed.

Additional questions posed:

Ideally, I'd like to copy the programs over into flash memory space and execute them from there. Theoretically this is possible. Would it be impossible to execute the programs instruction by instruction?

Yes, it is possible to execute a program instruction by instruction (but that approach has limitations too, which I will get to shortly). You would start by allocating a 4-byte-aligned address in memory where your single instruction would go. An ARM instruction is 32 bits (4 bytes) wide, and immediately following it you would place a second instruction that you would never change. This second instruction would be a supervisor call (SVC, named SWI in the ARM7TDMI instruction set) that raises a software interrupt exception, allowing you to fetch the next instruction, place it in memory and start again.
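The two-word slot described above can be sketched as follows. This only models the data layout and the "stage the next instruction" step on a host machine; it does not execute anything. `exec_slot`, `fake_program` and `stage_next_instruction` are hypothetical names, and the program array stands in for the external flash.

```c
#include <stdint.h>

/* ARM-state encoding of "SWI 0" (SVC 0) with the AL (always) condition. */
#define ARM_SWI_0 0xEF000000u

/* Two-word slot: [0] is rewritten for every step, [1] never changes. */
static uint32_t exec_slot[2] = { 0, ARM_SWI_0 };

/* Stand-in for the external chip: a program as 32-bit ARM instructions. */
static const uint32_t fake_program[] = {
    0xE3A00001u, /* mov r0, #1 */
    0xE2800002u, /* add r0, r0, #2 */
};

/* What the SWI handler would do: place the next instruction in the slot.
 * Returns 1 if an instruction was staged, 0 when the program is exhausted. */
static int stage_next_instruction(uint32_t *pc_index)
{
    if (*pc_index >= sizeof fake_program / sizeof fake_program[0])
        return 0;
    exec_slot[0] = fake_program[*pc_index];
    (*pc_index)++;
    return 1;
}
```

On real hardware the handler would then return to `&exec_slot[0]`, and any branch instruction would need special handling because it must never actually run from the slot.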

Though possible, it isn't recommended: you will spend more time context switching than executing code, you can't actually use variables (you need RAM for that), you can't use function calls (unless you manually process branch instructions, ouch!), and your flash will be written so often that it will wear out very quickly. That last point, flash wearing out, applies only if the instruction slot lives in flash, so I will assume you meant to execute instruction by instruction from RAM. On top of all of these restrictions you will still have to use some RAM for your stack, heap and globals (see my Appendix for details). This area can be shared by all the tasks running from external flash, but you will need to write a custom linker script for this, otherwise you will waste your RAM.

What will make this clearer for you is understanding how C code is compiled. Even if you're using C++, start by asking yourself this: where on my device do the variables and instructions end up after compilation?

Basically what you MUST know before attempting this is:


where the code will execute (Flash/RAM)
how this code is linked to its stack, heap and globals (you would allocate a separate stack for this task, and separate space for globals but you can share the heap).
where this external code's stack, heap and globals reside (with this I'm trying to hint at how much control you will need to have over your C code)


Edit 2:

How to utilize the Peripheral DMA Controller:

For the microcontroller I'm working with, the DMA controller is actually not connected to the Embedded Flash for either reading or writing. If this is the case for you too you cannot use it. However, your datasheet is unclear in this regard and I suspect that you will need to run a test using the Serial Port to see if it can actually work.

In addition to this, I am concerned that the write operation when using the DMA controller may be more complicated than you doing it manually because of cached page writes. You will need to ensure that you only do the DMA transfers within pages and that a DMA transfer never crosses the page boundary. Also, I'm not sure what happens when you tell the DMA controller to write from flash back into the same location (which you might need to do to ensure you only overwrite the correct parts).
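The page-boundary rule can be sketched like this: split a transfer into chunks so that no chunk crosses a page boundary. The function names are hypothetical and the page size is a parameter, since the correct value has to come from the datasheet. The loop only counts the chunks; on hardware each iteration would program one DMA transfer.

```c
#include <stdint.h>
#include <stddef.h>

/* Length of the next chunk starting at addr that stays inside one page.
 * page_size must be a power of two. */
static size_t chunk_within_page(uint32_t addr, size_t remaining, uint32_t page_size)
{
    size_t to_page_end = page_size - (addr & (page_size - 1u));
    return remaining < to_page_end ? remaining : to_page_end;
}

/* Walks a transfer page by page and returns how many chunks it needs. */
static unsigned count_page_transfers(uint32_t addr, size_t len, uint32_t page_size)
{
    unsigned chunks = 0;

    while (len > 0) {
        size_t chunk = chunk_within_page(addr, len, page_size);
        /* On hardware: set up one DMA transfer for [addr, addr + chunk). */
        addr += (uint32_t)chunk;
        len -= chunk;
        chunks++;
    }
    return chunks;
}
```

For example, with an assumed 256-byte page, a 0x30-byte transfer starting at 0x1F0 is split into a 0x10-byte chunk and a 0x20-byte chunk.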

Concerns about the available flash and RAM:

I am concerned about your earlier question about executing one instruction at a time. If that is the case, then you might as well write an interpreter.

If you don't have enough memory to contain the entire code of the task you need to execute, then you will need to compile the task as PIC, with the Global Offset Table (GOT) placed in RAM along with all the memory required for that task's globals. That's the only way to get around not having enough space for the whole task. You will also have to allocate enough space for its stack.
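As a rough sketch of what placing the GOT in RAM involves, a loader typically has to rebase the table entries from the address the linker assumed to the address where the data really sits. This assumes every entry holds a link-time address in the same region; a real toolchain's GOT layout depends on the compiler flags used, so treat `relocate_got` and its parameters as hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* Adds (load_base - link_base) to every GOT entry so that each entry
 * points at the data's actual location in RAM. */
static void relocate_got(uint32_t *got, size_t entries,
                         uint32_t link_base, uint32_t load_base)
{
    uint32_t delta = load_base - link_base; /* unsigned wrap handles either direction */

    for (size_t i = 0; i < entries; i++)
        got[i] += delta;
}
```

If some entries refer to code or constants that stay elsewhere, they would need a different offset, which is part of why understanding your compiler's PIC output matters.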

If you don't have enough RAM (which I suspect you won't) you can swap your RAM memory out and dump it into Flash every time you need to change between tasks on the external Flash chip but again I would strongly advise against writing to your flash memory many times. That way you can make the tasks on the external flash share a piece of RAM for their globals.

For all other cases you will be writing an interpreter. I have even done the unthinkable, I have tried to think of a way to use the Abort Status of your microcontroller's memory controller (section 18.3.4 Abort Status in the datasheet) as an MPU but have failed to find even a remotely clever way to use it.
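To give an idea of the interpreter route, here is a minimal sketch of a stack-based bytecode interpreter that fetches one byte at a time from (simulated) external flash. The opcode set, `spi_flash_read_byte` and `run_bytecode` are invented for illustration and are not tied to any real instruction set.

```c
#include <stdint.h>

enum { OP_HALT = 0, OP_PUSH = 1, OP_ADD = 2, OP_MUL = 3 };

/* Stand-in for the external chip, holding the program (2 + 3) * 4. */
static const uint8_t fake_spi_flash[] = {
    OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_HALT
};

static uint8_t spi_flash_read_byte(uint32_t addr)
{
    return addr < sizeof fake_spi_flash ? fake_spi_flash[addr] : OP_HALT;
}

/* Runs the program at flash address start.
 * Returns the top of the stack at OP_HALT, or -1 on any error. */
static int32_t run_bytecode(uint32_t start)
{
    int32_t stack[8];
    int sp = 0;
    uint32_t pc = start;

    for (;;) {
        uint8_t op = spi_flash_read_byte(pc++);
        switch (op) {
        case OP_PUSH:
            if (sp >= 8) return -1;
            stack[sp++] = spi_flash_read_byte(pc++);
            break;
        case OP_ADD:
        case OP_MUL:
            if (sp < 2) return -1;
            sp--;
            stack[sp - 1] = (op == OP_ADD) ? stack[sp - 1] + stack[sp]
                                           : stack[sp - 1] * stack[sp];
            break;
        case OP_HALT:
            return sp > 0 ? stack[sp - 1] : -1;
        default:
            return -1;
        }
    }
}
```

The interpreter itself lives in internal flash, and each task only needs RAM for its own small stack, which is what makes this approach workable with very little memory.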

Edit 3:

I would suggest reading section 40.8.2 Non-volatile Memory (NVM) Bits in the datasheet, which suggests that your flash has a maximum of 10,000 write/erase cycles (it took me a while to find it). That means that once you have swapped tasks through the same flash region 10,000 times, that part of Flash will be rendered useless.
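A back-of-the-envelope calculation shows how quickly that limit is reached. The helper below is hypothetical and assumes one erase/write of the same region per task swap.

```c
#include <stdint.h>

/* Seconds until a flash region reaches its rated erase/write cycles
 * when it is rewritten once per task swap. */
static uint32_t seconds_until_worn(uint32_t rated_cycles, uint32_t swaps_per_second)
{
    return swaps_per_second ? rated_cycles / swaps_per_second : UINT32_MAX;
}
```

At 10 swaps per second, `seconds_until_worn(10000, 10)` gives 1000 seconds, so the region would reach its rated limit in under 17 minutes.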

APPENDIX

Please have a short read of this blog entry before continuing to read my comments below.

Where C variables live on an embedded ARM chip:

I learn best not from abstract concepts but from concrete examples, so I will try to give you code to work with. Basically all the magic happens in your linker script. If you read and understand it you will see what happens to your code. Let's dissect one now:

OUTPUT_FORMAT("elf32-littlearm", "elf32-littlearm", "elf32-littlearm")
OUTPUT_ARCH(arm)
SEARCH_DIR(.)

/* Memory Spaces Definitions */

MEMORY
{
  /* Here we are defining the memory regions that we will be placing
   * different sections into. Different regions have different properties,
   * for example, Flash is read only (because you need special instructions
   * to write to it and writing is slow), while RAM is read write.
   * In the brackets after the region name:
   *   r - denotes that reads are allowed from this memory region.
   *   w - denotes that writes are allowed to this memory region.
   *   x - means that you can execute code in this region.
   */

  /* We will call Flash rom and RAM ram */
  rom (rx)  : ORIGIN = 0x00400000, LENGTH = 0x00040000 /* flash, 256K */
  ram (rwx) : ORIGIN = 0x20000000, LENGTH = 0x00006000 /* sram, 24K */
}

/* The stack size used by the application. NOTE: you need to adjust  */
STACK_SIZE = DEFINED(STACK_SIZE) ? STACK_SIZE : 0x800 ;

/* Section Definitions */
SECTIONS
{
    .text :
    {
        . = ALIGN(4);
        _sfixed = .;
        KEEP(*(.vectors .vectors.*))
        *(.text .text.* .gnu.linkonce.t.*)
        *(.glue_7t) *(.glue_7)
        *(.rodata .rodata* .gnu.linkonce.r.*)  /* This is important, .rodata is in Flash */
        *(.ARM.extab* .gnu.linkonce.armextab.*)

        /* Support C constructors, and C destructors in both user code
           and the C library. This also provides support for C++ code. */
        . = ALIGN(4);
        KEEP(*(.init))
        . = ALIGN(4);
        __preinit_array_start = .;
        KEEP (*(.preinit_array))
        __preinit_array_end = .;

        . = ALIGN(4);
        __init_array_start = .;
        KEEP (*(SORT(.init_array.*)))
        KEEP (*(.init_array))
        __init_array_end = .;

        . = ALIGN(0x4);
        KEEP (*crtbegin.o(.ctors))
        KEEP (*(EXCLUDE_FILE (*crtend.o) .ctors))
        KEEP (*(SORT(.ctors.*)))
        KEEP (*crtend.o(.ctors))

        . = ALIGN(4);
        KEEP(*(.fini))

        . = ALIGN(4);
        __fini_array_start = .;
        KEEP (*(.fini_array))
        KEEP (*(SORT(.fini_array.*)))
        __fini_array_end = .;

        KEEP (*crtbegin.o(.dtors))
        KEEP (*(EXCLUDE_FILE (*crtend.o) .dtors))
        KEEP (*(SORT(.dtors.*)))
        KEEP (*crtend.o(.dtors))

        . = ALIGN(4);
        _efixed = .;            /* End of text section */
    } > rom /* All the sections in the preceding curly braces are going to Flash in the order that they were specified */

    /* .ARM.exidx is sorted, so has to go in its own output section.  */
    PROVIDE_HIDDEN (__exidx_start = .);
    .ARM.exidx :
    {
      *(.ARM.exidx* .gnu.linkonce.armexidx.*)
    } > rom
    PROVIDE_HIDDEN (__exidx_end = .);

    . = ALIGN(4);
    _etext = .;

    /* Here is the .relocate section please pay special attention to it */
    .relocate : AT (_etext)
    {
        . = ALIGN(4);
        _srelocate = .;
        *(.ramfunc .ramfunc.*);
        *(.data .data.*);
        . = ALIGN(4);
        _erelocate = .;
    } > ram  /* All the sections in the preceding curly braces are going to RAM in the order that they were specified */

    /* .bss section which is used for uninitialized but zeroed data */
    /* Please note the NOLOAD flag, this means that when you compile the code this section won't be in your .hex, .bin or .o files but will be just assumed to have been allocated */
    .bss (NOLOAD) :
    {
        . = ALIGN(4);
        _sbss = . ;
        _szero = .;
        *(.bss .bss.*)
        *(COMMON)
        . = ALIGN(4);
        _ebss = . ;
        _ezero = .;
    } > ram

    /* stack section */
    .stack (NOLOAD):
    {
        . = ALIGN(8);
        _sstack = .;
        . = . + STACK_SIZE;
        . = ALIGN(8);
        _estack = .;
    } > ram

    . = ALIGN(4);
    _end = . ;

    /* heap extends from here to end of memory */
}

This is an automatically generated linker script for the SAM3N (your linker script should only differ in the memory region definitions). Now, let's go through what happens when your device boots after being powered off.

The first thing that happens is that the ARM core reads the address stored in the flash memory's vector table that points to your reset handler. The reset handler is just a function, and for me it is also autogenerated by Atmel Studio. Here it is:

void Reset_Handler(void)
{
    uint32_t *pSrc, *pDest;

    /* Initialize the relocate segment */
    pSrc = &_etext;
    pDest = &_srelocate;

    /* This code copies all of the memory for "initialised globals" from Flash to RAM */
    if (pSrc != pDest) {
        for (; pDest < &_erelocate;) {
            *pDest++ = *pSrc++;
        }
    }

    /* Clear the zero segment (.bss). Since it is in RAM it could hold anything after a reset, so zero it. */
    for (pDest = &_szero; pDest < &_ezero;) {
        *pDest++ = 0;
    }

    /* Set the vector table base address */
    pSrc = (uint32_t *) & _sfixed;
    SCB->VTOR = ((uint32_t) pSrc & SCB_VTOR_TBLOFF_Msk);

    if (((uint32_t) pSrc >= IRAM_ADDR) && ((uint32_t) pSrc < IRAM_ADDR + IRAM_SIZE)) {
        SCB->VTOR |= 1 << SCB_VTOR_TBLBASE_Pos;
    }

    /* Initialize the C library */
    __libc_init_array();

    /* Branch to main function */
    main();

    /* Infinite loop */
    while (1);
}

Now, bear with me for a little longer while I explain how the C code that you write fits into all of this.

Consider the following code example:

int UninitializedGlobal; // Goes to the .bss segment (RAM)
int ZeroedGlobal[10] = { 0 }; // Goes to the .bss segment (RAM)
int InitializedGlobal[10] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 11 }; // Goes to the .relocate segment (RAM and FLASH)
const int ConstInitializedGlobal[10] = { 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 }; // Goes to the .rodata segment (FLASH)

void function(int parameter)
{
    static int UninitializedStatic; // Same as UninitializedGlobal above.
    static int ZeroedStatic = 0; // Same as ZeroedGlobal above.
    static int InitializedStatic = 7; // Same as InitializedGlobal above.
    static const int ConstStatic = 18; // Same as ConstInitializedGlobal above. Might get optimized away though, let's assume it isn't.

    int UninitializedLocal; // Stacked. (RAM)
    int ZeroedLocal = 0; // Stacked and then initialized (RAM)
    int InitializedLocal = 7; // Stacked and then initialized (RAM)
    const int ConstLocal = 91; // Not actually sure where this one goes. I assume optimized away.

    // Do something with all those lovely variables...
}


                        