How can dereferencing a NULL pointer in C not crash a program?

Question

I need the help of a real C guru to analyze a crash in my code. Not to fix the crash; I can easily fix it, but before doing so I'd like to understand how this crash is even possible, as it seems totally impossible to me.

This crash only happens on a customer's machine and I cannot reproduce it locally (so I cannot step through the code with a debugger), as I cannot obtain a copy of this user's database. My company also won't allow me to change a few lines of code and make a custom build for this customer (so I cannot add some printf lines and have him run the code again), and of course the customer has a build without debug symbols. In other words, my debugging abilities are very limited. Nonetheless, I could nail down the crash and get some debugging information. However, when I look at that information and then at the code, I cannot understand how the program flow could ever reach the line in question. The code should have crashed long before getting to that line. I'm totally lost here.

Let's start with the relevant code. It's very little code:

// ... code above skipped, not relevant ...

if (data == NULL) return -1;

information = parseData(data);

if (information == NULL) return -1;

/* Check if name has been correctly \0 terminated */
if (information->kind.name->data[information->kind.name->length] != '\0') {
    freeParsedData(information);
    return -1;
}

/* Copy the name */
realLength = information->kind.name->length + 1;
*result = malloc(realLength);
if (*result == NULL) {
    freeParsedData(information);
    return -1;
}
strlcpy(*result, (char *)information->kind.name->data, realLength);

// ... code below skipped, not relevant ...

That's already it. It crashes in strlcpy. I can even tell you how strlcpy is really called at runtime. It is actually called with the following parameters:

strlcpy ( 0x341000, 0x0, 0x1 );

Knowing this, it is rather obvious why strlcpy crashes. It tries to read one character from a NULL pointer, and that will of course crash. And since the last parameter has a value of 1, the original length must have been 0. My code clearly has a bug here: it fails to check for the name data being NULL. I can fix this, no problem.
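The missing check mentioned above could look roughly like the sketch below. The `struct name` layout and the `copy_name` helper are hypothetical, since the question does not show the real types, and `memcpy` is swapped in for `strlcpy` only to keep the sketch portable to platforms that lack `strlcpy`. Note that a check like this only helps if the pointer is already NULL when the check runs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the real name struct, which the question does not show. */
struct name {
    size_t length;
    const unsigned char *data;
};

/* Copies the name into a freshly allocated, NUL-terminated string.
 * Returns 0 on success, -1 on invalid input or allocation failure. */
int copy_name(const struct name *name, char **result) {
    assert(result != NULL);
    *result = NULL;

    /* The check the original code lacks: never index through a NULL pointer. */
    if (name == NULL || name->data == NULL) return -1;

    /* Check that the name has been correctly \0 terminated. */
    if (name->data[name->length] != '\0') return -1;

    size_t real_length = name->length + 1;
    char *copy = malloc(real_length);
    if (copy == NULL) return -1;

    /* real_length includes the terminator, so the copy is terminated too. */
    memcpy(copy, name->data, real_length);
    *result = copy;
    return 0;
}
```

A caller would pass the address of the name struct and of a `char *`, and call `free` on the resulting string once it is done with it.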

My question is:

How can this code ever get to the strlcpy?
Why does this code not crash in the if statement?

I tried it locally on my machine:

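#include <stdio.h>
#include <stdlib.h>
#include <string.h>  /* strlcpy is declared here on BSD and OS X */

int main (
    int argc,
    char ** argv
) {
    char * nullString = malloc(10);
    free(nullString);
    nullString = NULL;

    /* Deliberately reads through a NULL pointer; the crash is expected here */
    if (nullString[0] != '\0') {
        printf("Not terminated\n");
        exit(1);
    }
    printf("Can get past the if-clause\n");

    char xxx[10];
    strlcpy(xxx, nullString, 1);
    return 0;
}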

This code never gets past the if statement. It crashes in the if statement, and that is definitely expected.

So can anyone think of any reason why the first code can get past that if statement without crashing if name->data is really NULL? This is totally mysterious to me. It doesn't seem deterministic.

Important extra information:
The code between the two comments is really complete; nothing has been left out. Furthermore, the application is single-threaded, so there is no other thread that could unexpectedly alter any memory in the background. The platform where this happens is a PPC CPU (a G4, in case that plays any role). And in case someone wonders about "kind.": "information" contains a union named "kind", and "name" is again a struct (every possible value of the union is a different type of struct); but none of this should really matter here.

I'm grateful for any idea here. I'm even more grateful if it's not just a theory, but something I can verify really holds true for the customer.

I accepted the right answer already, but just in case anyone finds this question on Google, here's what really happened:

The pointers were pointing to memory that had already been freed. Freeing memory does not zero it or cause the process to give it back to the system at once. So even though the memory had been erroneously freed, it still contained the correct values. The pointer in question was not NULL at the time the if check was performed.

After that check I allocate some new memory by calling malloc. I'm not sure what exactly malloc does here, but every call to malloc or free can have far-reaching consequences for all dynamic memory in the virtual address space of a process. After the malloc call, the pointer is in fact NULL. Somehow malloc (or some system call malloc uses) zeros the already freed memory where the pointer itself is located (not the data it points to; the pointer itself is in dynamic memory). With that memory zeroed, the pointer now has a value of 0x0, which is equal to NULL on my system, and when strlcpy is called, it will of course crash.
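The sequence described above can be imitated deterministically with a toy allocator. Reading freed memory from the real malloc is undefined behavior, so this sketch uses a small static pool instead. The names `toy_alloc`, `toy_free`, `struct name` and `stale_pointer_demo` are all hypothetical, and the assumption that the allocator clears a slot when it reuses it is made up for illustration; it is not a claim about what the system malloc actually does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the real name struct. */
struct name {
    size_t length;
    const char *data;
};

#define SLOT_COUNT 4

/* Toy pool: the slots live in static storage, so a stale pointer into the
 * pool is still valid C, unlike a pointer to memory returned to free(). */
static struct name slots[SLOT_COUNT];
static int in_use[SLOT_COUNT];

/* Hands out the first unused slot and clears it (assumed behavior). */
struct name *toy_alloc(void) {
    for (int i = 0; i < SLOT_COUNT; i++) {
        if (!in_use[i]) {
            in_use[i] = 1;
            slots[i] = (struct name){ 0, NULL };  /* reuse wipes old contents */
            return &slots[i];
        }
    }
    return NULL;
}

/* Marks the slot unused but leaves its contents untouched. */
void toy_free(struct name *p) {
    assert(p != NULL);
    in_use[p - slots] = 0;
}

/* Returns 1 if the check passes on freed memory and the pointer is NULL
 * after the next allocation, mirroring the sequence in the question. */
int stale_pointer_demo(void) {
    struct name *n = toy_alloc();
    n->length = 0;
    n->data = "";

    toy_free(n);  /* the erroneous early free */

    /* The termination check still passes: the old values are intact. */
    int check_passed = (n->data != NULL && n->data[n->length] == '\0');

    struct name *other = toy_alloc();  /* reuses and clears the same slot */

    /* The stale struct now holds a NULL data pointer. */
    int now_null = (n->data == NULL);

    toy_free(other);
    return check_passed && now_null;
}
```

In this toy model the check and the later use see different pointer values even though the code between them never assigns to the pointer, which matches the situation described above.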

So the real bug causing this strange behavior was at a completely different location in my code. Never forget: freed memory keeps its values, but it is beyond your control for how long. To check whether your app has a bug that accesses already freed memory, make sure the memory is always zeroed before it is freed. In OS X you can do this by setting an environment variable at runtime (no need to recompile anything). Of course this slows down the program quite a bit, but you will catch those bugs much earlier.
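Where no such runtime switch is available, the same idea can be approximated by hand. The sketch below is one possible way to do it, not the mechanism OS X uses: `scrub` and `scrubbing_free` are hypothetical helpers, and the caller is assumed to know the size of the block being freed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Overwrites a block with zeros. The volatile pointer keeps the compiler
 * from discarding the stores as dead writes just before free(). */
void scrub(void *p, size_t size) {
    assert(p != NULL || size == 0);
    volatile unsigned char *bytes = p;
    for (size_t i = 0; i < size; i++) {
        bytes[i] = 0;
    }
}

/* Zeroes the block and then frees it, so that any stale pointer stored
 * inside it reads as zero while the memory has not yet been reused. */
void scrubbing_free(void *p, size_t size) {
    if (p == NULL) return;  /* same convention as free(NULL) */
    scrub(p, size);
    free(p);
}
```

Replacing calls to `free` with `scrubbing_free` in a debug build makes a use-after-free more likely to fail at the first stale access rather than at some later, unrelated allocation.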

Recommended answer

It is possible that the structure is located in memory that has been free()'d, or that the heap is corrupted. In that case, malloc() could be modifying the memory, thinking that it is free.

You might try running your program under a memory checker. One memory checker that supports Mac OS X is valgrind, although it supports Mac OS X only on Intel, not on PowerPC.
