Is accessing the "value" of a linker script variable undefined behavior in C?

Question

The GNU ld (linker script) manual, Section 3.5.5 "Source Code Reference", has some really important information on how to access linker script "variables" (which are actually just integer addresses) in C source code. I have used this information to make extensive use of linker script variables, and I wrote this answer about it: How to get value of variable defined in ld linker script from C.

However, it is easy to get this wrong and try to access a linker script variable's value instead of its address, since the distinction is a bit esoteric. The manual (linked above) says:

This means that you cannot access the value of a linker script defined symbol - it has no value - all you can do is access the address of a linker script defined symbol.

Hence when you are using a linker script defined symbol in source code you should always take the address of the symbol, and never attempt to use its value.

The question: So, if you do attempt to access a linker script variable's value, is this "undefined behavior"?

Quick refresher:

Imagine that in your linker script (e.g. STM32F103RBTx_FLASH.ld) you have:

/* Specify the memory areas */
MEMORY
{
    FLASH (rx)      : ORIGIN = 0x8000000,  LENGTH = 128K
    RAM (xrw)       : ORIGIN = 0x20000000, LENGTH = 20K
}

/* Some custom variables (addresses) I intend to access from my C source code */
__flash_start__ = ORIGIN(FLASH);
__flash_end__ = ORIGIN(FLASH) + LENGTH(FLASH);
__ram_start__ = ORIGIN(RAM);
__ram_end__ = ORIGIN(RAM) + LENGTH(RAM);
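
For reference, the arithmetic behind the two `..._end__` symbols can be checked on a host machine. This is only a sketch: `region_end` is a hypothetical helper (it is not part of GNU ld or of any STM32 library), and it assumes that `K` means 1024 bytes, as it does in GNU ld.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: mirrors ORIGIN(x) + LENGTH(x) from the linker script.
   Assumes a 32-bit address space, as on the STM32 parts discussed here. */
uint32_t region_end(uint32_t origin, uint32_t length)
{
    /* Guard against wrapping past the top of the 32-bit address space. */
    assert(length <= UINT32_MAX - origin);
    return origin + length;
}
```

With the MEMORY block above, `region_end(0x8000000u, 128u * 1024u)` gives `0x8020000` for `__flash_end__`, and `region_end(0x20000000u, 20u * 1024u)` gives `0x20005000` for `__ram_end__`.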

And in your C source code you do:

// 1. correct way A:
extern uint32_t __flash_start__;
printf("__flash_start__ addr = 0x%lX\n", (uint32_t)&__flash_start__);

// OR 2. correct way B (my preferred approach):
extern uint32_t __flash_start__[]; // not a true array; [] is required to access linker script variables (addresses) as though they were normal variables
printf("__flash_start__ addr = 0x%lX\n", (uint32_t)__flash_start__);

// OR 3. COMPLETELY WRONG WAY TO DO IT!
// - IS THIS UNDEFINED BEHAVIOR?
extern uint32_t __flash_start__;
printf("__flash_start__ addr = 0x%lX\n", __flash_start__);

Sample printed output

(This is real output: the code was compiled, run, and printed on an STM32 MCU.)

  1. __flash_start__ addr = 0x8000000
  2. __flash_start__ addr = 0x8000000
  3. __flash_start__ addr = 0x20080000 <== As I said above, this one is completely wrong (even though it compiles and runs)! <== Update Mar. 2020: actually, see my answer; this is fine too, it just does something different.
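
The difference between outputs 1 and 2 on the one hand and output 3 on the other can be reproduced on a host without any linker script. In this sketch an ordinary variable, `fake_flash_start` (a made-up stand-in, not a real linker symbol), plays the role of `__flash_start__`: its address corresponds to the number assigned in the linker script, and its contents correspond to whatever happens to be stored at that location. The helper names are hypothetical too.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a linker script symbol. On the real target the linker script,
   not the C code, decides where this name lives; here the compiler does. */
uint32_t fake_flash_start = 0x20080000u;

/* Ways 1 and 2: the "linker value" of the symbol is its address. */
uintptr_t symbol_address(const uint32_t *sym)
{
    assert(sym != NULL);
    return (uintptr_t)sym;
}

/* Way 3: using the symbol as an ordinary object reads the uint32_t
   that is stored at that address. */
uint32_t symbol_contents(const uint32_t *sym)
{
    assert(sym != NULL);
    return *sym;
}
```

Calling `symbol_address(&fake_flash_start)` corresponds to what ways 1 and 2 print, while `symbol_contents(&fake_flash_start)` corresponds to what way 3 prints.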


Update:

Response to @Eric Postpischil's 1st comment:

The C standard does not define anything at all about linker script symbols. Any specification of behavior is up to the GNU tools. That said, if a linker script symbol identifies a place in memory where some valid object is stored, I would expect accessing the value of that object to work, if it were accessed with its proper type. Supposing __flash_start__ is normally accessible memory, and except for any requirements of your system about what is at __flash_start__, you could, in theory, put a uint32_t (using appropriate input to the linker) and then access it via __flash_start__.

Yes, but that's not my question. I'm not sure you're picking up on the subtlety of it. Take a look at the examples I provide. It is true that you can access this location just fine, but make sure you understand how you do so, and then my question will become apparent. Look especially at example 3 above, which is wrong even though it looks right to a C programmer. To read a uint32_t at __flash_start__, for example, you'd do this:

extern uint32_t __flash_start__;
uint32_t u32 = *((uint32_t *)&__flash_start__); // correct, even though it *looks like* you're taking the address (&) of an address (__flash_start__)

OR this:

extern uint32_t __flash_start__[];
uint32_t u32 = *((uint32_t *)__flash_start__); // also correct, and my preferred way of doing it because it looks more correct to the trained "C-programmer" eye

But most definitely NOT this:

extern uint32_t __flash_start__;
uint32_t u32 = __flash_start__; // incorrect; <==UPDATE: THIS IS ALSO CORRECT! (and more straight-forward too, actually; see comment discussion under this question)

and NOT this:

extern uint32_t __flash_start__;
uint32_t u32 = *((uint32_t *)__flash_start__); // incorrect, but *looks* right

Related:

  1. Why do STM32 gcc linker scripts automatically discard all input sections from these standard libraries: libc.a, libm.a, libgcc.a?
  2. [My answer] How to get value of variable defined in ld linker script from C

Answer

Shorter answer:

Accessing the "value" of a linker script variable is NOT undefined behavior, and it is fine to do, so long as what you want is the actual data stored at that location in memory. It is the wrong thing to do if what you want is the address of that memory, that is, the "value" assigned to the variable in the linker script, which C code sees only as an address in memory and not as a value.

Yeah, that's kind of confusing, so re-read that 3 times carefully. Essentially, if you want to access the value of a linker script variable, just ensure your linker script is set up so that nothing you don't want ends up at that memory address and whatever you DO want there is in fact there. That way, reading the value at that memory address gives you the useful data you expect to be there.

BUT, if you're using linker script variables to store some sort of "values" in and of themselves, the way to grab those "values" in C is to read the variables' addresses, because the "value" you assign to a variable in a linker script IS SEEN BY THE C COMPILER AS THE "ADDRESS" of that variable. Linker scripts are designed to manipulate memory and memory addresses, NOT traditional C variables.

Here are some really valuable and correct comments from under my question, which I think are worth posting in this answer so they never get lost. Please go upvote Eric Postpischil's comments under my question above.
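
As an illustration of reading "values" through addresses, here is a minimal sketch that derives a region size from two symbol addresses. `region_size` is a hypothetical helper; on the target it would be called with the addresses of two linker script symbols, while the host check below simply feeds it the two numbers from the linker script in the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: the distance in bytes between two symbol addresses.
   On the target the arguments would be the addresses of two linker script
   symbols, e.g. region_size(__flash_start__, __flash_end__) with both
   declared as `extern uint32_t name[];`. */
size_t region_size(const void *start, const void *end)
{
    /* Compare as integers: the two symbols are not elements of one C array,
       so plain pointer subtraction is not something the C standard covers. */
    uintptr_t lo = (uintptr_t)start;
    uintptr_t hi = (uintptr_t)end;
    assert(hi >= lo);
    return (size_t)(hi - lo);
}
```

For the FLASH region in the question (origin 0x8000000, end 0x8020000) this comes out to 128 * 1024 bytes, matching `LENGTH(FLASH)`.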

The C standard does not define anything at all about linker script symbols. Any specification of behavior is up to the GNU tools. That said, if a linker script symbol identifies a place in memory where some valid object is stored, I would expect accessing the value of that object to work, if it were accessed with its proper type. Supposing __flash_start__ is normally accessible memory, and except for any requirements of your system about what is at __flash_start__, you could, in theory, put a uint32_t (using appropriate input to the linker) and then access it via __flash_start__.
– Eric Postpischil

That documentation is not written very well, and you are taking the first sentence too literally. What is really happening here is that the linker’s notion of the "value" of a symbol and a programming language’s notion of the "value" of an identifier are different things. To the linker, the value of a symbol is simply a number associated with it. In a programming language, the value is a number (or other element in the set of values of some type) stored in the (sometimes notional) storage associated with the identifier. The documentation is advising you that the linker’s value of a symbol appears inside a language like C as the address associated with the identifier, rather than the contents of its storage...

THIS PART IS REALLY IMPORTANT and we should get the GNU linker script manual updated:

It goes too far when it tells you to "never attempt to use its value."

It is correct that merely defining a linker symbol does not reserve the necessary storage for a programming language object, and therefore merely having a linker symbol does not provide you storage you can access. However if you ensure storage is allocated by some other means, then, sure, it can work as a programming language object. There is no general prohibition on using a linker symbol as an identifier in C, including accessing its C value, if you have properly allocated storage and otherwise satisfied the requirements for this. If the linker value of __flash_start__ is a valid memory address, and you have ensure there is storage for a uint32_t at that address, and it is a properly aligned address for a uint32_t, then it is okay to access __flash_start__ in C as if it were a uint32_t. That would not be defined by the C standard, but by the GNU tools.
– Eric Postpischil
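
Two of the conditions in that comment (a correctly aligned address, and enough room for a uint32_t) can be expressed as a small check. This is a sketch only: `u32_read_is_plausible` is a hypothetical helper, the region bounds are assumed to come from the linker script, and the check cannot tell whether the linker actually placed meaningful data at the address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: is `addr` a plausible place to read a uint32_t from,
   given a memory region [region_start, region_end)? Tests alignment and
   bounds only. */
bool u32_read_is_plausible(uintptr_t addr, uintptr_t region_start, uintptr_t region_end)
{
    assert(region_start <= region_end);
    if (addr % _Alignof(uint32_t) != 0) {
        return false;               /* misaligned for a uint32_t */
    }
    if (addr < region_start || addr > region_end) {
        return false;               /* outside the region entirely */
    }
    /* All sizeof(uint32_t) bytes must fit before the end of the region. */
    return region_end - addr >= sizeof(uint32_t);
}
```

With the FLASH region from the question, address 0x8000000 passes, while 0x8000001 (misaligned) and 0x8020000 (one past the end) do not.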

Long answer:

I said in the question:

// 1. correct way A:
extern uint32_t __flash_start__;
printf("__flash_start__ addr = 0x%lX\n", (uint32_t)&__flash_start__);

// OR 2. correct way B (my preferred approach):
extern uint32_t __flash_start__[]; // not a true array; [] is required to access linker script variables (addresses) as though they were normal variables
printf("__flash_start__ addr = 0x%lX\n", (uint32_t)__flash_start__);

// OR 3. COMPLETELY WRONG WAY TO DO IT!
// - IS THIS UNDEFINED BEHAVIOR?
extern uint32_t __flash_start__;
printf("__flash_start__ addr = 0x%lX\n", __flash_start__);

(See the discussion under the question for how I came to this conclusion.)

Looking specifically at #3 above:

Well, actually, if your goal is to read the address of __flash_start__, which is 0x8000000 in this case, then yes, this is completely wrong. But it is NOT undefined behavior! What it actually does is read the contents (value) stored at that address (0x8000000) as a uint32_t. In other words, it simply reads the first 4 bytes of the FLASH region and interprets them as a uint32_t. The contents (the uint32_t value at this address) just so happen to be 0x20080000 in this case.

To further prove this point, the following two reads are equivalent:

// Read the actual *contents* of the `__flash_start__` address as a 4-byte value!

// forward declaration to make a variable defined in the linker script
// accessible in the C code
extern uint32_t __flash_start__;

// These 2 read techniques do the exact same thing.
uint32_t u32_1 = __flash_start__;                 // technique 1
uint32_t u32_2 = *((uint32_t *)&__flash_start__); // technique 2
printf("u32_1 = 0x%lX\n", u32_1);
printf("u32_2 = 0x%lX\n", u32_2);

The output is:

u32_1 = 0x20080000
u32_2 = 0x20080000

Notice they produce the same result. Each one yields the valid uint32_t value stored at address 0x8000000.

As it turns out, the u32_1 technique shown above is simply the more straightforward and direct way of reading the value, and again, it is not undefined behavior. Rather, it correctly reads the value stored at (the contents of) that address.

I seem to be talking in circles. Anyway, mind blown, but I get it now. Before, I was convinced I was supposed to use only the u32_2 technique shown above, but it turns out both are just fine, and again, the u32_1 technique is clearly more straightforward (there I go talking in circles again). :)
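
The same equivalence can be checked on a host with ordinary objects standing in for the linker symbol. This is a sketch under that assumption: `stand_in_scalar` and `stand_in_array` are made-up names that mimic the `extern uint32_t name;` and `extern uint32_t name[];` declaration styles, and the stored value is copied from the output above.

```c
#include <assert.h>
#include <stdint.h>

/* Made-up stand-ins for one linker symbol, declared in the two styles
   used in the question. Both hold the value seen at the start of Flash. */
uint32_t stand_in_scalar = 0x20080000u;
uint32_t stand_in_array[1] = { 0x20080000u };

/* Technique 1: name the object directly. */
uint32_t read_technique_1(void)
{
    return stand_in_scalar;
}

/* Technique 2: take the address, then dereference it. */
uint32_t read_technique_2(void)
{
    return *((uint32_t *)&stand_in_scalar);
}

/* Array-style declaration: the name already decays to the address. */
uint32_t read_array_style(void)
{
    return *((uint32_t *)stand_in_array);
}

/* All three reads return the contents stored at the symbol's address. */
void check_techniques_agree(void)
{
    assert(read_technique_1() == read_technique_2());
    assert(read_technique_1() == read_array_style());
}
```

Calling `check_techniques_agree()` passes silently, which is the host-side counterpart of the two identical lines of output above.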

Cheers.


Digging deeper: Where did the 0x20080000 value stored right at the start of my FLASH memory come from?

One more little tidbit. I actually ran this test code on an STM32F777 MCU, which has 512 KiB of RAM. Since RAM starts at address 0x20000000, the end of RAM is 0x20000000 + 512K = 0x20080000. This just so happens to also be the value stored in the first 4 bytes of Flash, because Programming Manual PM0253 Rev 4, pg. 42, "Figure 10. Vector table" shows that the first 4 bytes of the Vector Table contain the "Initial SP [Stack Pointer] value".

I know that the Vector Table sits right at the start of the program memory, which is located in Flash, so that means that 0x20080000 is my initial stack pointer value. This makes sense, because the Reset_Handler is the start of the program (and its vector just so happens to be the 2nd 4-byte value at the start of the Vector Table, by the way), and the first thing it does, as shown in my "startup_stm32f777xx.s" startup assembly file, is set the stack pointer (sp) to _estack:

Reset_Handler:  
  ldr   sp, =_estack      /* set stack pointer */

Furthermore, _estack is defined in my linker script as follows:

/* Highest address of the user mode stack */
_estack = ORIGIN(RAM) + LENGTH(RAM);    /* end of RAM */

So there you have it! The first 4-byte value in my Vector Table, right at the start of Flash, is set to be the initial stack pointer value, which is defined as _estack right in my linker script file, and _estack is the address at the end of my RAM, which is 0x20000000 + 512K = 0x20080000. So, it all makes sense! I've just proven I read the right value!
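
The layout described above can be sketched as a small decoder for the first two words of a vector table. The struct and function names are hypothetical, and `words` is assumed to be a host-side copy of the start of Flash rather than a live pointer into the MCU's memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The first two vector table entries, as described in the text. */
struct vector_table_head {
    uint32_t initial_sp;     /* word 0: initial stack pointer value */
    uint32_t reset_handler;  /* word 1: Reset_Handler vector */
};

/* Decode the first two 32-bit words of a vector table image. */
struct vector_table_head decode_vector_table(const uint32_t *words, size_t count)
{
    assert(words != NULL && count >= 2);
    struct vector_table_head head = { words[0], words[1] };
    return head;
}
```

Given an image whose first word is 0x20080000, the decoded `initial_sp` equals 0x20000000 + 512 * 1024, which is the `_estack` value from the linker script.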

See also:

  1. [my answer] How to get value of variable defined in ld linker script from C
