What is the advantage of using memset() in C


Problem Description

I was curious whether there is any advantage, in terms of efficiency, to using memset() in a situation similar to the one below.

Given the following buffer declarations...

struct More_Buffer_Info
{
    unsigned char a[10];
    unsigned char b[10];
    unsigned char c[10];
};

struct My_Buffer_Type
{
    struct More_Buffer_Info buffer_info[100];
};

struct My_Buffer_Type my_buffer[5];

unsigned char *p;
p = (unsigned char *)my_buffer;

Besides having fewer lines of code, is there an advantage to using this:

memset((void *)p, 0, sizeof(my_buffer));

Over this:

for (i = 0; i < sizeof(my_buffer); i++)
{
    *p++ = 0;
}

Solution

This applies to both memset() and memcpy():

  1. Less Code: As you have already mentioned, it's shorter - fewer lines of code.
  2. More Readable: Shorter usually makes it more readable as well. (memset() is more readable than that loop)
  3. It can be faster: It can sometimes allow more aggressive compiler optimizations. (so it may be faster)
  4. Misalignment: In some cases, when you're dealing with misaligned data on a processor that doesn't support misaligned accesses, memset() and memcpy() may be the only clean solution.
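
As a quick illustration of the memcpy() side (other_buffer here is a hypothetical second array, not part of the question's code):

#include <string.h>

struct My_Buffer_Type other_buffer[5];

/* One call replaces an element-by-element copy loop over the whole array. */
memcpy(other_buffer, my_buffer, sizeof(my_buffer));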

To expand on the 3rd point, memset() can be heavily optimized by the compiler using SIMD and such. If you write a loop instead, the compiler will first need to "figure out" what it does before it can attempt to optimize it.

The basic idea here is that memset() and similar library functions, in some sense, "tell" the compiler your intent.
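
As a rough sketch of that idea (the helper functions below are illustrative, not from the original answer): compiled with optimizations, GCC and Clang will typically recognize the byte-zeroing loop and lower it to the same code as the explicit call, but the call states the intent up front so nothing has to be inferred.

#include <string.h>

/* Both functions usually compile to the same optimized memset at -O2;
   the explicit call simply states the intent directly. */
void clear_loop(unsigned char *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = 0;
}

void clear_call(unsigned char *buf, size_t n)
{
    memset(buf, 0, n);
}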


As mentioned by @Oli in the comments, there are some downsides. I'll expand on them here:

  1. You need to make sure that memset() actually does what you want. The standard doesn't guarantee that a value of zero for every datatype is represented by all-zero bytes in memory (a null pointer, for example, need not be all-zero bits).
  2. For non-zero data, memset() can only fill memory with a single repeated byte. So you can't use memset() to set an array of ints to anything other than zero (or 0x01010101 or similar patterns...); see the sketch just after this list.
  3. Although rare, there are some corner cases where it's actually possible to beat the compiler's performance with your own loop.*
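
A minimal sketch of the second downside (assuming the usual 8-bit char and 32-bit int):

#include <stdio.h>
#include <string.h>

int main(void)
{
    int values[4];

    /* memset() repeats a single byte; on a 32-bit int this fills each
       element with the pattern 0x01010101 (16843009), not with 1. */
    memset(values, 1, sizeof(values));
    printf("%d\n", values[0]);    /* likely prints 16843009 */

    /* For any other fill value, a plain loop is the portable option. */
    for (size_t i = 0; i < sizeof(values) / sizeof(values[0]); i++)
        values[i] = 1;
    printf("%d\n", values[0]);    /* prints 1 */

    return 0;
}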

*I'll give one example of this from my experience:

Although memset() and memcpy() are usually compiler intrinsics with special handling by the compiler, they are still generic functions. They know nothing about the datatype, including the alignment of the data.

So in a few (albeit rare) cases, the compiler isn't able to determine the alignment of the memory region, and thus must produce extra code to handle misalignment. Whereas if you, the programmer, are 100% sure of the alignment, using a loop might actually be faster.

A common example is when using SSE/AVX intrinsics (such as copying a 16/32-byte aligned array of floats). If the compiler can't determine the 16/32-byte alignment, it will need to use unaligned loads/stores and/or extra handling code. If you simply write a loop using the SSE/AVX aligned load/store intrinsics, you can probably do better.

float *ptrA = ...  //  some unknown source, guaranteed to be 32-byte aligned
float *ptrB = ...  //  some unknown source, guaranteed to be 32-byte aligned
int length = ...   //  some unknown source, guaranteed to be multiple of 8

//  memcpy() - Compiler can't read comments. It doesn't know the data is 32-byte
//  aligned. So it may generate unnecessary misalignment handling code.
memcpy(ptrA, ptrB, length * sizeof(float));

//  This loop could potentially be faster because it "uses" the fact that
//  the pointers are aligned. The compiler can also further optimize this.
for (int c = 0; c < length; c += 8){
    _mm256_store_ps(ptrA + c, _mm256_load_ps(ptrB + c));
}
