Is accessing the "value" of a linker script variable undefined behavior in C?


Question

Section 3.5.5, "Source Code Reference", of the GNU ld (linker script) manual has some really important information on how to access linker script "variables" (which are actually just integer addresses) from C source code. I have relied on it to use linker script variables extensively, and I wrote this answer: How to get value of variable defined in ld linker script from C.

However, it is easy to get this wrong and try to access a linker script variable's value instead of its address, since this is a bit esoteric. The manual says:

This means that you cannot access the value of a linker script defined symbol - it has no value - all you can do is access the address of a linker script defined symbol.

Hence when you are using a linker script defined symbol in source code you should always take the address of the symbol, and never attempt to use its value.

The question: So, if you do attempt to access a linker script variable's value, is this "undefined behavior"?

Quick refresher:

Imagine that in your linker script (e.g. STM32F103RBTx_FLASH.ld) you have:

/* Specify the memory areas */
MEMORY
{
    FLASH (rx)      : ORIGIN = 0x8000000,  LENGTH = 128K
    RAM (xrw)       : ORIGIN = 0x20000000, LENGTH = 20K
}

/* Some custom variables (addresses) I intend to access from my C source code */
__flash_start__ = ORIGIN(FLASH);
__flash_end__ = ORIGIN(FLASH) + LENGTH(FLASH);
__ram_start__ = ORIGIN(RAM);
__ram_end__ = ORIGIN(RAM) + LENGTH(RAM);

And in your C source code you do:

// 1. correct way A:
extern uint32_t __flash_start__;
printf("__flash_start__ addr = 0x%lX\n", (uint32_t)&__flash_start__);

// OR 2. correct way B (my preferred approach):
extern uint32_t __flash_start__[]; // not a true array; [] is required to access linker script variables (addresses) as though they were normal variables
printf("__flash_start__ addr = 0x%lX\n", (uint32_t)__flash_start__);

// OR 3. COMPLETELY WRONG WAY TO DO IT!
// - IS THIS UNDEFINED BEHAVIOR?
extern uint32_t __flash_start__;
printf("__flash_start__ addr = 0x%lX\n", __flash_start__);

Sample printed output

(this is real output: it was actually compiled, run, and printed by an STM32 mcu):

  1. __flash_start__ addr = 0x8000000
  2. __flash_start__ addr = 0x8000000
  3. __flash_start__ addr = 0x20080000 <== As I said above, this one is wrong if you wanted the address (even though it compiles and runs). Update, Mar. 2020: see my answer; this is actually valid too, it just does something different (it prints the contents stored at that address).
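
For reference, the numeric values that the linker script above assigns to the "end" symbols can be reproduced in plain C on a host machine. This is only a sketch: the function name `region_end_address` is hypothetical, and it relies on the `K` suffix in a GNU ld script scaling a constant by 1024.

```c
#include <stdint.h>

/* Hypothetical helper: mirrors ORIGIN(region) + LENGTH(region) from the
 * linker script, with the length given in units of K (1024 bytes). */
uint32_t region_end_address(uint32_t origin, uint32_t length_k)
{
    return origin + length_k * 1024u;
}
```

With the MEMORY block above, `region_end_address(0x8000000u, 128u)` gives the value of `__flash_end__` (0x8020000) and `region_end_address(0x20000000u, 20u)` gives the value of `__ram_end__` (0x20005000).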


Update:

Response to @Eric Postpischil's 1st comment:

The C standard does not define anything at all about linker script symbols. Any specification of behavior is up to the GNU tools. That said, if a linker script symbol identifies a place in memory where some valid object is stored, I would expect accessing the value of that object to work, if it were accessed with its proper type. Supposing __flash_start__ is normally accessible memory, and except for any requirements of your system about what is at __flash_start__, you could, in theory, put a uint32_t (using appropriate input to the linker) and then access it via __flash_start__.

Yes, but that's not my question. I'm not sure if you're picking up the subtlety of my question. Take a look at the examples I provide. It is true you can access this location just fine, but make sure you understand how you do so, and then my question will become apparent. Look especially at example 3 above, which is wrong even though to a C programmer it looks right. To read a uint32_t at __flash_start__, for example, you'd do this:

extern uint32_t __flash_start__;
uint32_t u32 = *((uint32_t *)&__flash_start__); // correct, even though it *looks like* you're taking the address (&) of an address (__flash_start__)

OR this:

extern uint32_t __flash_start__[];
uint32_t u32 = *((uint32_t *)__flash_start__); // also correct, and my preferred way of doing it because it looks more correct to the trained "C-programmer" eye

But most definitely NOT this:

extern uint32_t __flash_start__;
uint32_t u32 = __flash_start__; // incorrect; <==UPDATE: THIS IS ALSO CORRECT! (and more straight-forward too, actually; see comment discussion under this question)

and NOT this:

extern uint32_t __flash_start__;
uint32_t u32 = *((uint32_t *)__flash_start__); // incorrect, but *looks* right
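
The difference between the two declaration styles can be tried on a host machine, with ordinary objects standing in for the linker symbol. The sketch below is only a model: `scalar_sym` and `array_sym` are hypothetical names for normal C objects (not linker-defined symbols), initialized with the 0x20080000 value from the output above.

```c
#include <stdint.h>

/* Ordinary objects standing in for a linker-defined symbol (hypothetical
 * names). On the real target no storage would be defined in C; the linker
 * script would supply only the address. */
uint32_t scalar_sym = 0x20080000u;        /* models: extern uint32_t sym;   */
uint32_t array_sym[1] = { 0x20080000u };  /* models: extern uint32_t sym[]; */

/* Scalar declaration: '&' yields the address; dereferencing that address
 * (or simply naming the object) yields the contents. */
uintptr_t scalar_address(void)  { return (uintptr_t)&scalar_sym; }
uint32_t  scalar_contents(void) { return *((uint32_t *)&scalar_sym); }

/* Array declaration: the bare name decays to the address; dereferencing
 * it yields the contents. */
uintptr_t array_address(void)   { return (uintptr_t)array_sym; }
uint32_t  array_contents(void)  { return *((uint32_t *)array_sym); }

/* Not modelled here: with the scalar declaration, casting the bare name
 * to a pointer and dereferencing it would treat the contents (0x20080000)
 * as an address, which adds a second, unintended level of indirection. */
```

In this model, `scalar_contents()` and `array_contents()` both return the stored word, while `scalar_address()` and `array_address()` return where each object lives.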


Answer

Shorter answer:

Accessing the "value" of a linker script variable is NOT undefined behavior, and is fine to do, so long as what you want is the actual data stored at that location in memory. It is the wrong thing to do if what you want is the address of that memory, i.e. the "value" you assigned to the linker script variable, which C code sees only as an address in memory and not as a value.

Yeah, that's kind of confusing, so re-read it carefully. Essentially, if you want to access the value of a linker script variable, just make sure your linker script is set up so that nothing unwanted ends up at that memory address and whatever you DO want there is in fact there. That way, reading the value at that memory address gives you the useful data you expect to be there.

BUT, if you're using linker script variables to store some sort of "values" in and of themselves, the way to grab the "values" of these linker script variables in C is to read their addresses, because the "value" you assign to a variable in a linker script IS SEEN BY THE C COMPILER AS THE "ADDRESS" of that linker script variable, since linker scripts are designed to manipulate memory and memory addresses, NOT traditional C variables.
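
A common case of a symbol carrying a number rather than marking stored data is computing a region's size from its start and end symbols, where only the addresses matter. The sketch below models that on a host machine: the function name `region_size_bytes` is hypothetical, and its two pointer arguments stand in for what would be `&__flash_start__` and `&__flash_end__` on the target. It assumes a flat address space in which converting a pointer to `uintptr_t` yields its numeric address.

```c
#include <stddef.h>
#include <stdint.h>

/* Size in bytes between two addresses, in the style of
 * (uintptr_t)&__flash_end__ - (uintptr_t)&__flash_start__ on the target.
 * Nothing is read from memory; only the addresses are used. */
size_t region_size_bytes(const void *start, const void *end)
{
    return (size_t)((uintptr_t)end - (uintptr_t)start);
}
```

On a host, passing the first and one-past-the-last byte of an ordinary buffer returns the buffer's length.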

Here are some really valuable and correct comments from under my question, which I think are worth posting in this answer so they never get lost. Please go upvote his comments under my question above.

The C standard does not define anything at all about linker script symbols. Any specification of behavior is up to the GNU tools. That said, if a linker script symbol identifies a place in memory where some valid object is stored, I would expect accessing the value of that object to work, if it were accessed with its proper type. Supposing __flash_start__ is normally accessible memory, and except for any requirements of your system about what is at __flash_start__, you could, in theory, put a uint32_t (using appropriate input to the linker) and then access it via __flash_start__.
– Eric Postpischil

That documentation is not written very well, and you are taking the first sentence too literally. What is really happening here is that the linker’s notion of the "value" of a symbol and a programming language’s notion of the "value" of an identifier are different things. To the linker, the value of a symbol is simply a number associated with it. In a programming language, the value is a number (or other element in the set of values of some type) stored in the (sometimes notional) storage associated with the identifier. The documentation is advising you that the linker’s value of a symbol appears inside a language like C as the address associated with the identifier, rather than the contents of its storage...

THIS PART IS REALLY IMPORTANT and we should get the GNU linker script manual updated:

It goes too far when it tells you to "never attempt to use its value."

It is correct that merely defining a linker symbol does not reserve the necessary storage for a programming language object, and therefore merely having a linker symbol does not provide you storage you can access. However if you ensure storage is allocated by some other means, then, sure, it can work as a programming language object. There is no general prohibition on using a linker symbol as an identifier in C, including accessing its C value, if you have properly allocated storage and otherwise satisfied the requirements for this. If the linker value of __flash_start__ is a valid memory address, and you have ensured there is storage for a uint32_t at that address, and it is a properly aligned address for a uint32_t, then it is okay to access __flash_start__ in C as if it were a uint32_t. That would not be defined by the C standard, but by the GNU tools.
– Eric Postpischil
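
The alignment condition mentioned in the comment above can be checked before reading through such an address. A minimal sketch, assuming a flat address space where an address is just a number (the C standard does not guarantee this); the function name `is_aligned_for_u32` is hypothetical, and `_Alignof` requires C11.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if addr is suitably aligned for a uint32_t access
 * on the platform this code is compiled for. */
bool is_aligned_for_u32(uintptr_t addr)
{
    return (addr % _Alignof(uint32_t)) == 0u;
}
```

On the target one might call it as `is_aligned_for_u32((uintptr_t)&__flash_start__)` before treating that location as a uint32_t.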

Long answer:

I said in the question:

// 1. correct way A:
extern uint32_t __flash_start__;
printf("__flash_start__ addr = 0x%lX\n", (uint32_t)&__flash_start__);

// OR 2. correct way B (my preferred approach):
extern uint32_t __flash_start__[]; // not a true array; [] is required to access linker script variables (addresses) as though they were normal variables
printf("__flash_start__ addr = 0x%lX\n", (uint32_t)__flash_start__);

// OR 3. COMPLETELY WRONG WAY TO DO IT!
// - IS THIS UNDEFINED BEHAVIOR?
extern uint32_t __flash_start__;
printf("__flash_start__ addr = 0x%lX\n", __flash_start__);

(See discussion under the question for how I came to this).

Looking specifically at #3 above:

Well, actually, if your goal is to read the address of __flash_start__, which is 0x8000000 in this case, then yes, this is completely wrong. But, it is NOT undefined behavior! What it is actually doing, instead, is reading the contents (value) of that address (0x8000000) as a uint32_t type. In other words, it's simply reading the first 4 bytes of the FLASH section, and interpreting them as a uint32_t. The contents (uint32_t value at this address) just so happen to be 0x20080000 in this case.

To further prove this point, the following are exactly identical:

// Read the actual *contents* of the __flash_start__ address as a 4-byte value!
// The 2 techniques should be the same.
extern uint32_t __flash_start__;
uint32_t u32_1 = __flash_start__;
uint32_t u32_2 = *((uint32_t *)&__flash_start__);
printf("u32_1 = 0x%lX\n", u32_1);
printf("u32_2 = 0x%lX\n", u32_2);

The output is:

u32_1 = 0x20080000
u32_2 = 0x20080000

Notice they produce the same result. They each are producing a valid uint32_t-type value which is stored at address 0x8000000.

It just so turns out, however, that the u32_1 technique shown above is simply a more straightforward and direct way of reading the value and, again, is not undefined behavior. Rather, it correctly reads the value (contents) at that address.

I seem to be talking in circles. Anyway, mind blown, but I get it now. I was convinced before I was supposed to use the u32_2 technique shown above only, but it turns out they are both just fine, and again, the u32_1 technique is clearly more straight-forward (there I go talking in circles again). :)

Cheers.


Digging deeper: Where did the 0x20080000 value stored right at the start of my FLASH memory come from?

One more little tidbit. I actually ran this test code on an STM32F777 mcu, which has 512KiB of RAM. Since RAM starts at address 0x20000000, the end of RAM is 0x20000000 + 512K = 0x20080000. This just so happens to also be the value stored at the very start of the Vector Table, because Programming Manual PM0253 Rev 4, pg. 42, "Figure 10. Vector table" shows that the first 4 bytes of the Vector Table contain the "Initial SP [Stack Pointer] value".

I know that the Vector Table sits right at the start of the program memory, which is located in Flash, so that means that 0x20080000 is my initial stack pointer value. This makes sense, because the Reset_Handler is the start of the program (and its vector just so happens to be the 2nd 4-byte value at the start of the Vector Table, by the way), and the first thing it does, as shown in my "startup_stm32f777xx.s" startup assembly file, is set the stack pointer (sp) to _estack:

Reset_Handler:  
  ldr   sp, =_estack      /* set stack pointer */

Furthermore, _estack is defined in my linker script as follows:

/* Highest address of the user mode stack */
_estack = ORIGIN(RAM) + LENGTH(RAM);    /* end of RAM */

So there you have it! The first 4-byte value in my Vector Table, right at the start of Flash, is set to be the initial stack pointer value, which is defined as _estack right in my linker script file, and _estack is the address at the end of my RAM, which is 0x20000000 + 512K = 0x20080000. So, it all makes sense! I've just proven I read the right value!
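
The layout described above can be modelled in a few lines. This is a host-side sketch, not the real startup code: the struct name `vector_table_head`, the helpers `estack_value` and `make_model_table`, and the reset handler address 0x08000101 are made up for illustration, and only the first two entries of the vector table are represented.

```c
#include <stdint.h>

/* First two words of the vector table as described in the text:
 * the initial stack pointer, then the reset vector. */
struct vector_table_head {
    uint32_t initial_sp;
    uint32_t reset_handler;
};

/* Mirrors _estack = ORIGIN(RAM) + LENGTH(RAM), with the RAM length
 * given in units of K (1024 bytes). */
uint32_t estack_value(uint32_t ram_origin, uint32_t ram_length_k)
{
    return ram_origin + ram_length_k * 1024u;
}

/* Builds a model of the table head using the STM32F777 numbers from the text. */
struct vector_table_head make_model_table(void)
{
    struct vector_table_head t;
    t.initial_sp = estack_value(0x20000000u, 512u);
    t.reset_handler = 0x08000101u; /* made-up address, for illustration only */
    return t;
}
```

Reading the `initial_sp` member of the model corresponds to reading the first 4 bytes of Flash on the target, which is why the value printed there equals `_estack`.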
