Explanation of the UB while changing data

Problem description

I was trying to demonstrate to a work pal that you can change the value of a const-qualified variable if you really want to (and know how to) by using some trickery. During my demonstration I discovered that there are two "flavours" of constant values: the ones that you cannot change whatever you do, and the ones that you can change by using dirty tricks.

A constant value is unchangeable when the compiler uses the literal value instead of the value stored on the stack (as I read here). Here is a piece of code that shows what I mean:

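    // TEST 1
    #include <cstring>  // std::memcpy
    #include <iostream> // std::cout
    // (the statements after the macro are meant to live inside main)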
    #define LOG(index, cv, ncv) std::cout \
        << std::dec << index << ".- Address = " \
        << std::hex << &cv << "\tValue = " << cv << '\n' \
        << std::dec << index << ".- Address = " \
        << std::hex << &ncv << "\tValue = " << ncv << '\n'
    
    const unsigned int const_value = 0xcafe01e;
    
    // Try with no-const reference
    unsigned int &no_const_ref = const_cast<unsigned int &>(const_value);
    no_const_ref = 0xfabada;
    LOG(1, const_value, no_const_ref);
    
    // Try with no-const pointer
    unsigned int *no_const_ptr = const_cast<unsigned int *>(&const_value);
    *no_const_ptr = 0xb0bada;
    LOG(2, const_value, (*no_const_ptr));
    
    // Try with c-style cast
    no_const_ptr = (unsigned int *)&const_value;
    *no_const_ptr = 0xdeda1;
    LOG(3, const_value, (*no_const_ptr));
    
    // Try with memcpy
    unsigned int brute_force = 0xba51c;
    std::memcpy(no_const_ptr, &brute_force, sizeof(const_value));
    LOG(4, const_value, (*no_const_ptr));
    
    // Try with union
    union bad_idea
    {
        const unsigned int *const_ptr;
        unsigned int *no_const_ptr;
    } u;
    
    u.const_ptr = &const_value;
    *u.no_const_ptr = 0xbeb1da;
    LOG(5, const_value, (*u.no_const_ptr));
    

This produces the following output:

    1.- Address = 0xbfffbe2c    Value = cafe01e
    1.- Address = 0xbfffbe2c    Value = fabada
    2.- Address = 0xbfffbe2c    Value = cafe01e
    2.- Address = 0xbfffbe2c    Value = b0bada
    3.- Address = 0xbfffbe2c    Value = cafe01e
    3.- Address = 0xbfffbe2c    Value = deda1
    4.- Address = 0xbfffbe2c    Value = cafe01e
    4.- Address = 0xbfffbe2c    Value = ba51c
    5.- Address = 0xbfffbe2c    Value = cafe01e
    5.- Address = 0xbfffbe2c    Value = beb1da
    

Since I'm relying on UB (changing the value of const data), it is expected that the program acts weird; but this weirdness is more than I was expecting.

Let's suppose that the compiler is using the literal value. Then, when the code reaches the instruction that changes the value of the constant (by reference, by pointer or by memcpy), it simply ignores the order as long as the value is a literal (it is undefined behaviour, though). This explains why the value remains unchanged, but:

• Why is the memory address the same for both variables while the contained value differs?

AFAIK the same memory address cannot point to different values, so one of the outputs is lying:

• What's really happening? Which memory address is the fake one (if any)?

Making a few changes to the code above, we can try to avoid the use of the literal value, so that the trickery does its work (source here):

    // TEST 2
    // Try with no-const reference
    void change_with_no_const_ref(const unsigned int &const_value)
    {
        unsigned int &no_const_ref = const_cast<unsigned int &>(const_value);
        no_const_ref = 0xfabada;
        LOG(1, const_value, no_const_ref);    
    }
    
    // Try with no-const pointer
    void change_with_no_const_ptr(const unsigned int &const_value)
    {
        unsigned int *no_const_ptr = const_cast<unsigned int *>(&const_value);
        *no_const_ptr = 0xb0bada;
        LOG(2, const_value, (*no_const_ptr));
    }
    
    // Try with c-style cast
    void change_with_cstyle_cast(const unsigned int &const_value)
    {
        unsigned int *no_const_ptr = (unsigned int *)&const_value;
        *no_const_ptr = 0xdeda1;
        LOG(3, const_value, (*no_const_ptr));
    }
    
    // Try with memcpy
    void change_with_memcpy(const unsigned int &const_value)
    {
        unsigned int *no_const_ptr = const_cast<unsigned int *>(&const_value);
        unsigned int brute_force = 0xba51c;
        std::memcpy(no_const_ptr, &brute_force, sizeof(const_value));
        LOG(4, const_value, (*no_const_ptr));
    }
    
    void change_with_union(const unsigned int &const_value)
    {
        // Try with union
        union bad_idea
        {
            const unsigned int *const_ptr;
            unsigned int *no_const_ptr;
        } u;
    
        u.const_ptr = &const_value;
        *u.no_const_ptr = 0xbeb1da;
        LOG(5, const_value, (*u.no_const_ptr));
    }
    
    int main(int argc, char **argv)
    {
        unsigned int value = 0xcafe01e;
        change_with_no_const_ref(value);
        change_with_no_const_ptr(value);
        change_with_cstyle_cast(value);
        change_with_memcpy(value);
        change_with_union(value);
    
        return 0;
    }
    

Which produces the following output:

    1.- Address = 0xbff0f5dc    Value = fabada
    1.- Address = 0xbff0f5dc    Value = fabada
    2.- Address = 0xbff0f5dc    Value = b0bada
    2.- Address = 0xbff0f5dc    Value = b0bada
    3.- Address = 0xbff0f5dc    Value = deda1
    3.- Address = 0xbff0f5dc    Value = deda1
    4.- Address = 0xbff0f5dc    Value = ba51c
    4.- Address = 0xbff0f5dc    Value = ba51c
    5.- Address = 0xbff0f5dc    Value = beb1da
    5.- Address = 0xbff0f5dc    Value = beb1da
    

As we can see, the const-qualified variable was changed on each change_with_* call, and the behaviour is the same as before except for this fact, so I was tempted to assume that the weird behaviour of the memory address manifests when the const data is used as a literal instead of as a value.

So, in order to confirm this assumption, I made a last test, changing the unsigned int value in main to const unsigned int value:

    // TEST 3
    const unsigned int value = 0xcafe01e;
    change_with_no_const_ref(value);
    change_with_no_const_ptr(value);
    change_with_cstyle_cast(value);
    change_with_memcpy(value);
    change_with_union(value);
    

Surprisingly, the output is the same as in TEST 2 (code here), so I suppose that the data is passed as a variable, not as a literal value, because it is used as a parameter. This makes me wonder:

• What makes the compiler decide to optimize a const value as a literal value?

In brief, my questions are:

• In TEST 1:
  • Why do the const value and the no-const value share the same memory address while their contained values differ?
  • What steps does the program follow to produce this output? Which memory address is the fake one (if any)?
• In TEST 3:
  • What makes the compiler decide to optimize a const value as a literal value?

Solution

In general, it is pointless to analyse Undefined Behaviour, because there is no guarantee that you can transfer the results of your analysis to a different program.

In this case, the behaviour can be explained by assuming the compiler has applied the optimisation technique called constant propagation. In that technique, if you read a const variable whose value the compiler knows, the compiler replaces that read with the value itself (as it is known at compile time). Other uses of the variable, such as taking its address, are not replaced.
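To make that concrete, here is a minimal sketch of what constant propagation plausibly does to the first LOG call in TEST 1. It is a hand-written, well-defined model and not actual compiler output; the names `Line` and `model_test1_step1`, and the idea of passing the storage in as a non-const variable, are assumptions made purely for illustration.

```cpp
#include <utility>

// Hypothetical model of one printed line: an address and the value shown next to it.
struct Line
{
    const void *address;
    unsigned int value;
};

// Models "no_const_ref = 0xfabada; LOG(1, const_value, no_const_ref);"
// as it might look after constant propagation. 'storage' stands in for the
// stack slot behind const_value; it is deliberately NOT const here, so this
// model itself contains no undefined behaviour.
std::pair<Line, Line> model_test1_step1(unsigned int &storage)
{
    storage = 0xfabadau; // the write through no_const_ref lands in memory

    // Read of const_value: replaced by the literal known at compile time.
    // Taking its address is not replaced, so the real address is used.
    Line const_line{&storage, 0xcafe01eu};

    // Read through no_const_ref: an ordinary load from memory.
    Line ref_line{&storage, storage};

    return {const_line, ref_line};
}
```

Both lines carry the same address, yet the first carries the propagated literal and the second the stored value, which is the pattern seen in the TEST 1 output.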

This optimisation is valid, precisely because changing a variable that was defined as const results in Undefined Behaviour and the compiler is allowed to assume a program does not invoke undefined behaviour.
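One way to see that the compiler really does know the value during compilation, without invoking any undefined behaviour, is to use the const variable where the language demands a constant expression. The sketch below assumes a namespace-scope `const_value` like the one in TEST 1; `Buffer` and `buffer_elements` are hypothetical names introduced only for this example.

```cpp
#include <cstddef>

const unsigned int const_value = 0xcafe01e;

// Accepted because const_value is a const integer initialised with a
// constant expression, so its value is available at compile time.
static_assert(const_value == 0xcafe01e, "value is known at compile time");

// The value can even determine a type: 0xcafe01e & 0xf == 0xe, so this is
// an array of 14 ints.
using Buffer = int[const_value & 0xfu];

// Number of elements in Buffer, computed entirely at compile time.
constexpr std::size_t buffer_elements()
{
    return sizeof(Buffer) / sizeof(int);
}
```

Since the language already lets this value shape types and pass static_assert, a program that later rewrote the underlying memory would contradict what the compiler was entitled to rely on.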

So, in TEST 1, the addresses are the same, because it is all the same variable, but the values differ because the first of each pair reflects what the compiler presumes (rightly) to be the value of the variable and the second reflects what is actually stored there. In TEST 2 and TEST 3, the compiler can't make the optimisation, because inside each function the compiler can't be 100% sure that the function argument will refer to a constant value (and in TEST 2, it doesn't).
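The TEST 2 situation can be sketched as follows. Casting away const and writing is well defined when the referenced object was not itself defined const, which is why a function taking `const unsigned int &` has to reload the value from memory. The helper names `overwrite` and `read_after_overwrite` are hypothetical and only mirror `change_with_no_const_ref` from the question.

```cpp
// Hypothetical helper mirroring change_with_no_const_ref from TEST 2.
// Well defined only if the object 'target' refers to was NOT defined const.
void overwrite(const unsigned int &target, unsigned int new_value)
{
    const_cast<unsigned int &>(target) = new_value;
}

// The TEST 2 case: the object is an ordinary, non-const variable, so the
// write is legal and the new value must be observed afterwards.
unsigned int read_after_overwrite()
{
    unsigned int value = 0xcafe01e; // not const
    overwrite(value, 0xfabada);     // legal: value is modifiable
    return value;                   // the updated value
}
```

Passing a variable defined as `const unsigned int` to `overwrite` instead would compile just the same, but the write would then be undefined behaviour, as in TEST 3.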
