What is the advantage of using memset() in C?


Question

I was curious whether there is any efficiency advantage to using memset() in a situation like the one below.

Given the following buffer declarations...

struct More_Buffer_Info
{
    unsigned char a[10];
    unsigned char b[10];
    unsigned char c[10];
};

struct My_Buffer_Type
{
    struct More_Buffer_Info buffer_info[100];
};

struct My_Buffer_Type my_buffer[5];

unsigned char *p;
p = (unsigned char *)my_buffer;

Besides having fewer lines of code, is there an advantage to using this:

memset((void *)p, 0, sizeof(my_buffer));

over this:

for (i = 0; i < sizeof(my_buffer); i++)
{
    *p++ = 0;
}

Answer

This applies to both memset() and memcpy():

  1. Less code: As you have already mentioned, it's shorter - fewer lines of code.
  2. More readable: Shorter usually means more readable as well. (memset() is more readable than that loop.)
  3. It can be faster: It can sometimes allow more aggressive compiler optimizations, so it may be faster.
  4. Misalignment: In some cases, when you're dealing with misaligned data on a processor that doesn't support misaligned accesses, memset() and memcpy() may be the only clean solution.
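To illustrate the fourth point, here is a minimal sketch of reading and writing a 32-bit value at an address that may not be 4-byte aligned. The helper names load_u32 and store_u32 are hypothetical and not part of the original answer.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 32-bit value from an address that may not be
   4-byte aligned. Casting p to uint32_t * and dereferencing it would be
   undefined behavior and can trap on strict-alignment processors. */
static uint32_t load_u32(const unsigned char *p)
{
    uint32_t value;
    memcpy(&value, p, sizeof value);   /* byte-wise copy, no alignment needed */
    return value;
}

/* Hypothetical helper: the matching store to a possibly misaligned address. */
static void store_u32(unsigned char *p, uint32_t value)
{
    memcpy(p, &value, sizeof value);
}
```

The value is stored in the host's native byte order, so a round trip through store_u32 and load_u32 at the same offset returns the original value.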

To expand on the third point, memset() can be heavily optimized by the compiler using SIMD and similar techniques. If you write a loop instead, the compiler first needs to "figure out" what the loop does before it can attempt to optimize it.

The basic idea here is that memset() and similar library functions, in some sense, "tell" the compiler your intent.
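As a rough sketch of that difference, the two functions below clear the question's buffer type in both ways. The function names are hypothetical. Both leave the same bytes in memory; whether the generated code differs depends on the compiler and its optimization level.

```c
#include <stddef.h>
#include <string.h>

struct More_Buffer_Info {
    unsigned char a[10];
    unsigned char b[10];
    unsigned char c[10];
};

struct My_Buffer_Type {
    struct More_Buffer_Info buffer_info[100];
};

/* States the intent directly: "zero this many bytes". */
static void clear_with_memset(struct My_Buffer_Type *buf, size_t count)
{
    memset(buf, 0, count * sizeof *buf);
}

/* Same effect, but the compiler has to recognize the pattern by itself. */
static void clear_with_loop(struct My_Buffer_Type *buf, size_t count)
{
    unsigned char *p = (unsigned char *)buf;
    size_t n = count * sizeof *buf;
    for (size_t i = 0; i < n; i++) {
        p[i] = 0;
    }
}
```

Passing the element count instead of relying on sizeof(my_buffer) keeps both versions correct when the buffer is reached through a pointer rather than the array name.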

As mentioned by @Oli in the comments, there are some downsides. I'll expand on them here:

  1. You need to make sure that memset() actually does what you want. The standard doesn't guarantee that a zero value of every datatype is represented by all-zero bytes in memory (null pointers and floating-point zero are the usual caveats).
  2. For non-zero data, memset() is restricted to 1-byte content. So you can't use memset() to set an array of ints to anything other than zero (or a value whose bytes are all identical, such as 0x01010101).
  3. Although rare, there are some corner cases where it's actually possible to beat the compiler in performance with your own loop.*
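The second downside is easy to demonstrate. The sketch below uses uint32_t so the element width is fixed; fill_u32 and first_after_memset_one are hypothetical names introduced only for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fill an array with an arbitrary 32-bit value.
   memset() cannot do this because it only repeats a single byte. */
static void fill_u32(uint32_t *dst, size_t count, uint32_t value)
{
    for (size_t i = 0; i < count; i++) {
        dst[i] = value;
    }
}

/* Shows what memset(..., 1, ...) really stores: every byte becomes 0x01,
   so each 32-bit element ends up as 0x01010101, not 1. */
static uint32_t first_after_memset_one(void)
{
    uint32_t a[4];
    memset(a, 1, sizeof a);
    return a[0];
}
```

So memset(a, 1, sizeof a) is not a way to set every element to 1; a loop such as fill_u32(a, 4, 1) is needed for that.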

*I'll give one example of this from my experience:

Although memset() and memcpy() are usually compiler intrinsics that get special handling from the compiler, they are still generic functions. They say nothing about the datatype, including the alignment of the data.

So in a few (albeit rare) cases, the compiler cannot determine the alignment of the memory region and must produce extra code to handle misalignment. If you, the programmer, are 100% sure of the alignment, a hand-written loop might actually be faster.

A common example is when using SSE/AVX intrinsics, such as copying a 16/32-byte aligned array of floats. If the compiler can't determine the 16/32-byte alignment, it will need to use misaligned loads/stores and/or handling code. If you simply write a loop using the SSE/AVX aligned load/store intrinsics, you can probably do better.

float *ptrA = ...  //  some unknown source, guaranteed to be 32-byte aligned
float *ptrB = ...  //  some unknown source, guaranteed to be 32-byte aligned
int length = ...   //  some unknown source, guaranteed to be multiple of 8

//  memcpy() - Compiler can't read comments. It doesn't know the data is 32-byte
//  aligned. So it may generate unnecessary misalignment handling code.
memcpy(ptrA, ptrB, length * sizeof(float));

//  This loop could potentially be faster because it "uses" the fact that
//  the pointers are aligned. The compiler can also further optimize this.
for (int c = 0; c < length; c += 8){
    _mm256_store_ps(ptrA + c, _mm256_load_ps(ptrB + c));
}
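A different way to pass the same alignment knowledge to the compiler, not mentioned in the original answer, is a compiler-specific hint. The sketch below assumes GCC or Clang, where __builtin_assume_aligned is available as an extension, and falls back to a plain memcpy() elsewhere. The function name copy_floats_aligned32 is hypothetical, and whether the hint produces faster code depends on the compiler and target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sketch only: __builtin_assume_aligned is a GCC/Clang extension, not
   standard C. The caller must really guarantee 32-byte alignment;
   if the promise is false, the behavior is undefined. */
static void copy_floats_aligned32(float *dst, const float *src, size_t count)
{
    /* Debug-build check of the caller's promise. */
    assert(((uintptr_t)dst % 32) == 0);
    assert(((uintptr_t)src % 32) == 0);

#if defined(__GNUC__) || defined(__clang__)
    float *d = __builtin_assume_aligned(dst, 32);
    const float *s = __builtin_assume_aligned(src, 32);
#else
    float *d = dst;            /* no hint available: plain memcpy() */
    const float *s = src;
#endif
    memcpy(d, s, count * sizeof(float));
}
```

Unlike the intrinsic loop, this version does not require the length to be a multiple of 8, because memcpy() still handles any tail bytes.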
