C ++性能:检查一块内存在特定单元格中具有特定的值 [英] C++ performance: checking a block of memory for having specific values in specific cells

查看:82
本文介绍了C ++性能:检查一块内存在特定单元格中具有特定的值的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在做一个关于二维bin包装算法的研究。我曾询问关于PHP性能的类似问题,它的打包太慢了,现在代码是转换为C ++。

I'm doing a research on 2D Bin Packing algorithms. I've asked similar question regarding PHP's performance - it was too slow to pack - and now the code is converted to C++.

仍然很慢。我的程序所做的是分配动态内存块,并用字符'o'填充它们。

It's still pretty slow. What my program does is consequently allocating blocks of dynamic memory and populating them with a character 'o'

char* bin;
bin = new (nothrow) char[area];
if (bin == 0) {
    cout << "Error: " << area << " bytes could not be allocated";
    return false;
}
for (int i=0; i<area; i++) {
    bin[i]='o';
}

(对于我的数据集,它们的大小介于1kb和30kb之间)

(their size is between 1kb and 30kb for my datasets)

然后,程序检查当前内存块中x字符的不同组合。

Then the program checks different combinations of 'x' characters inside of current memory block.

void place(char* bin, int* best, int width)
{   
    for (int i=best[0]; i<best[0]+best[1]; i++)
        for (int j=best[2]; j<best[2]+best[3]; j++)
            bin[i*width+j] = 'x';
}

检查非重叠的函数中的一个调用百万次

One of the functions that checks the non-overlapping gets called millions of times during a runtime.

bool fits(char* bin, int* pos, int width)
{   
    for (int i=pos[0]; i<pos[0]+pos[1]; i++)
        for (int j=pos[2]; j<pos[2]+pos[3]; j++)
            if (bin[i*width+j] == 'x')
                return false;
    return true;
}

所有其他东西只占运行时的一部分,所以我需要这两个家伙(适合和地方)更快。

All other stuff takes only a percent of the runtime, so I need to make these two guys (fits and place) faster. Who's the culprit?

由于我只有两个选项'x'和'o',我可以尝试使用一个位而不是整个字节的char。但我更关心速度,你认为这会使事情更快吗?

Since I only have two options 'x' and 'o', I could try to use just one bit instead of the whole byte the char takes. But I'm more concerned with the speed, you think it would make the things faster?

谢谢!

更新:我用 rect pos 替换了 int * pos code>),正如MSalters建议的。起初我看到了改进,但我测试了更大的数据集,似乎回到正常的运行时。

Update: I replaced int* pos with rect pos (the same for best), as MSalters suggested. At first I saw improvement, but I tested more with bigger datasets and it seems to be back to normal runtimes. I'll try other techniques suggested and will keep you posted.

更新:使用 memset memchr 加速了两次。将x和o替换为\1和\0没有显示任何改进。 __ restrict 也没有帮助。总的来说,我对程序的性能感到满意,因为我也对算法本身做了一些改进。我还要尝试使用位图和编译-02(-03)...再次感谢大家。

Update: using memset and memchr sped up things about twice. Replacing 'x' and 'o' with '\1' and '\0' didn't show any improvement. __restrict wasn't helpful either. Overall, I'm satisfied with the performance of the program now since I also made some improvements to the algorithm itself. I'm yet to try using a bitmap and compiling with -02 (-03)... Thanks again everybody.

推荐答案

[当然:profile it!]

[ Of course: profile it!]

使用位而不是字节在第一个实例中不会更快。

Using a bit rather than a byte will not be faster in the first instance.

但是,考虑到使用字符,可以将4或8个字节的块转换为无符号的32位或64位整数(确保处理对齐),并比较到块中的oooo或oooooooo的值。这允许一个非常快速的比较。

However, consider that with characters, you can cast blocks of 4 or 8 bytes to unsigned 32 bit or 64 bit integers (making sure you handle alignment), and compare that to the value for 'oooo' or 'oooooooo' in the block. That allows a very fast compare.

现在已经走了整数方法,你可以看到,你可以做同样的位方法和处理说64位在单比较。这肯定会提高真正的速度。

Now having gone down the integer approach, you can see that you could do that same with the bit approach and handle say 64 bits in a single compare. That should surely give a real speed up.

这篇关于C ++性能:检查一块内存在特定单元格中具有特定的值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆