Why is this version of strcmp slower?

Question

I have been trying to experiment with improving the performance of strcmp under certain conditions. However, I unfortunately cannot even get an implementation of plain vanilla strcmp to perform as well as the library implementation.

I saw a similar question, but the answers say the difference was from the compiler optimizing away the comparison on string literals. My test does not use string literals.

Here's the implementation (comparisons.cpp):

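int strcmp_custom(const char* a, const char* b) {
    while (*b == *a) {
        if (*a == '\0') return 0;
        a++;
        b++;
    }
    // strcmp's sign convention is "a minus b", with bytes compared as unsigned char.
    return (unsigned char)*a - (unsigned char)*b;
}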

And here's the test driver (driver.cpp):

#include "comparisons.h"

#include <array>
#include <chrono>
#include <cstdlib>   // rand, srand
#include <cstring>   // strcmp
#include <iostream>

void init_string(char* str, int nChars) {
    // 10% of strings will be equal, and 90% of strings will have one char different.
    // This way, many strings will share long prefixes so strcmp has to exercise a bit.
    // Using random strings still shows the custom implementation as slower (just less so).
    str[nChars - 1] = '\0';
    for (int i = 0; i < nChars - 1; i++)
        str[i] = (i % 94) + 32;

    if (rand() % 10 != 0)
        str[rand() % (nChars - 1)] = 'x';
}

int main(int argc, char** argv) {
    srand(1234);

    // Pre-generate some strings to compare.
    const int kSampleSize = 100;
    std::array<char[1024], kSampleSize> strings;
    for (int i = 0; i < kSampleSize; i++)
        init_string(strings[i], kSampleSize);

    auto start = std::chrono::high_resolution_clock::now();

    for (int i = 0; i < kSampleSize; i++)
        for (int j = 0; j < kSampleSize; j++)
            strcmp(strings[i], strings[j]);

    auto end = std::chrono::high_resolution_clock::now();
    std::cout << "strcmp        - " << (end - start).count() << std::endl;

    start = std::chrono::high_resolution_clock::now();

    for (int i = 0; i < kSampleSize; i++)
        for (int j = 0; j < kSampleSize; j++)
            strcmp_custom(strings[i], strings[j]);

    end = std::chrono::high_resolution_clock::now();
    std::cout << "strcmp_custom - " << (end - start).count() << std::endl;
}
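One way the measurement could be tightened, offered as a sketch and not as the cause of the gap: the driver discards every comparison result, runs each loop only once, and prints `count()` in whatever tick unit `high_resolution_clock` happens to use. The helper below (the name `time_all_pairs` and its parameters are hypothetical, not part of the original code) sums the results into a checksum so the calls stay observable, repeats the pass to reduce noise, and converts the elapsed time to nanoseconds explicitly.

```cpp
#include <chrono>
#include <cstring>

// Hypothetical helper: runs `reps` passes of `cmp` over every pair of strings
// and returns the elapsed time in nanoseconds.
// `strings` is anything indexable that yields C strings, e.g. the driver's
// std::array<char[1024], N> or a plain char[N][1024].
// Each result feeds `checksum`, so the calls cannot be dropped as dead code
// even if the driver is later built with optimizations on.
template <typename Cmp, typename Strings>
long long time_all_pairs(Cmp cmp, const Strings& strings, int count, int reps,
                         long long& checksum) {
    // steady_clock is monotonic, which is what interval timing needs.
    auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; r++)
        for (int i = 0; i < count; i++)
            for (int j = 0; j < count; j++)
                checksum += (cmp(strings[i], strings[j]) != 0);
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
}
```

With the driver's data this could be called as `time_all_pairs(strcmp_custom, strings, kSampleSize, 100, checksum)`, printing `checksum` afterwards so it counts as used.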

And my makefile:

CC=clang++

test: driver.o comparisons.o
    $(CC) -o test driver.o comparisons.o

# Compile the test driver with optimizations off.
driver.o: driver.cpp comparisons.h
    $(CC) -c -o driver.o -std=c++11 -O0 driver.cpp

# Compile the code being tested separately with optimizations on.
comparisons.o: comparisons.cpp comparisons.h
    $(CC) -c -o comparisons.o -std=c++11 -O3 comparisons.cpp

clean:
    rm comparisons.o driver.o test

On the advice of this answer, I compiled my comparison function in a separate compilation unit with optimizations and compiled the driver with optimizations turned off, but I still get a slowdown of about 5x.

strcmp        - 154519
strcmp_custom - 506282

I also tried copying the FreeBSD implementation but got similar results.

I'm wondering if my performance measurement is overlooking something. Or is the standard library implementation doing something fancier?

Recommended answer

I don't know which standard library you have, but just to give you an idea of how serious C library maintainers are about optimizing the string primitives, the default strcmp used by GNU libc on x86-64 is two thousand lines of hand-optimized assembly language, as of version 2.24. There are separate, also hand-optimized, versions for when the SSSE3 and SSE4.2 instruction set extensions are available. (A fair bit of the complexity in that file appears to be because the same source code is used to generate several other functions; the machine code winds up being "only" 1120 instructions.) 2.24 was released roughly a year ago, and even more work has gone into it since.

They go to this much trouble because it's common for one of the string primitives to be the single hottest function in a profile.
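To give a flavor of why a byte-at-a-time loop has trouble keeping up, here is a minimal sketch of one basic idea that optimized string routines build on: examining several bytes per iteration. It is not glibc's code (which is SIMD assembly); it is a portable C++ illustration that handles eight bytes at a time using a well-known zero-byte bit trick. The function name `strcmp_swar_padded` is hypothetical, and the sketch assumes both strings start at the beginning of buffers whose size is a multiple of 8 bytes (as with the `char[1024]` arrays in the test driver), so a full 8-byte read never leaves the buffer.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical illustration, not a library function.
// ASSUMPTION: a and b each point to the start of a buffer whose size is a
// multiple of 8 bytes and that contains a NUL terminator.
int strcmp_swar_padded(const char* a, const char* b) {
    const std::uint64_t kOnes  = 0x0101010101010101ULL;
    const std::uint64_t kHighs = 0x8080808080808080ULL;
    for (;;) {
        std::uint64_t wa, wb;
        std::memcpy(&wa, a, sizeof wa);  // memcpy avoids alignment and aliasing issues
        std::memcpy(&wb, b, sizeof wb);
        // Nonzero exactly when at least one byte of wa is zero.
        bool has_nul = ((wa - kOnes) & ~wa & kHighs) != 0;
        if (wa != wb || has_nul) break;  // the answer lies inside this block
        a += 8;
        b += 8;
    }
    // Resolve the final block byte by byte; this ends within 8 bytes and
    // keeps the result independent of endianness.
    for (;;) {
        unsigned char ca = static_cast<unsigned char>(*a++);
        unsigned char cb = static_cast<unsigned char>(*b++);
        if (ca != cb) return ca - cb;
        if (ca == '\0') return 0;
    }
}
```

A production implementation also has to cope with arbitrary alignment and with strings that end right before an unmapped page, which is part of where the extra complexity the answer mentions comes from.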
