Regex speed: Python 6 times faster than C++11 under VS2013?

Question

Could it be that Python's C regex implementation is 6 times faster, or am I missing something?

Python version:

import re
r=re.compile(r'(HELLO).+?(\d+)', re.I)
s=r"prefixdfadfadf adf adf adf adf he asdf dHello Regex 123"

%timeit r.search(s)

1000000 loops, best of 3: 1.3 µs per loop (769,000 per sec)
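`%timeit` is an IPython magic, so the snippet above only runs in IPython or Jupyter. A minimal sketch of the same measurement for a plain Python script uses the standard-library `timeit` module; the helper name `searches_per_second` and its default loop counts are illustrative choices, not part of the question.

```python
import re
import timeit

# Same pattern and subject string as the question.
r = re.compile(r'(HELLO).+?(\d+)', re.I)
s = "prefixdfadfadf adf adf adf adf he asdf dHello Regex 123"

def searches_per_second(number=100_000, repeat=3):
    """Best-of-`repeat` timing, similar in spirit to what %timeit reports."""
    best = min(timeit.repeat(lambda: r.search(s), number=number, repeat=repeat))
    return number / best

if __name__ == "__main__":
    print(f"{searches_per_second():,.0f} searches per second")
```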

C++11 version:

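#include <regex>
#include <string>
#include <chrono>
// bench_utils (see the Edit below) must be declared before main.

int main(int argc, char * argv[])
{
    std::string s = "prefixdfadfadf adf adf adf adf he asdf dHello Regex 123";
    std::regex my(R"((HELLO).+?(\d+))", std::regex_constants::icase);

    // Call regex_search repeatedly for 10 seconds, printing searches per second.
    bench_utils::run(std::chrono::seconds(10),
        [&]{
        std::smatch match;
        bool found = std::regex_search(s, match, my);
    });
    return 0;
}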

This results in about 125,000 searches per second.
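The two reported figures are consistent with the roughly sixfold gap in the title: converting Python's per-loop time into a rate and dividing by the C++ rate gives

```latex
\frac{1}{1.3\,\mu\text{s}} \approx 769{,}000\ \text{searches/s},
\qquad
\frac{769{,}000}{125{,}000} \approx 6.2
```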

Edit: Here is the code for bench_utils:

#include <chrono>
#include <cstddef>
#include <functional>
#include <iostream>
#include <locale>
#include <sstream>
#include <string>

namespace bench_utils
{
    // Format a number using the user's locale (e.g. with thousands separators).
    template<typename T>
    inline std::string formatNum(const T& value)
    {
        static std::locale loc("");
        std::stringstream ss;
        ss.imbue(loc);
        ss << value;
        return ss.str();
    }

    // Call fn repeatedly until `duration` has elapsed, printing the number
    // of calls completed in each one-second interval.
    inline void run(const std::chrono::milliseconds &duration,
        const std::function<void() >& fn)
    {
        using namespace std::chrono;
        typedef steady_clock the_clock;
        std::size_t counter = 0;
        seconds printInterval(1);
        auto startTime = the_clock::now();
        auto lastPrintTime = startTime;
        while (true)
        {
            fn();
            counter++;
            auto now = the_clock::now();
            if (now - startTime >= duration)
                break;
            if (now - lastPrintTime >= printInterval)
            {
                std::cout << formatNum<std::size_t>(counter) << " ops per second" << std::endl;
                counter = 0;
                lastPrintTime = the_clock::now();
            }
        }
    }
}
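For comparison on the Python side, the same time-boxed measurement loop can be sketched in Python. This is a hypothetical port of `bench_utils::run` (the name `run` and the `print_interval_s` parameter are illustrative), using `time.monotonic()` as the counterpart of `steady_clock`. Like the C++ version, it reads the clock on every iteration, so a small share of each measured "op" is timer overhead rather than regex work.

```python
import re
import time

def run(duration_s, fn, print_interval_s=1.0):
    """Call fn repeatedly for duration_s seconds.

    Hypothetical Python port of bench_utils::run: prints how many calls
    completed in each print interval and returns those per-interval counts.
    """
    counts = []
    counter = 0
    start = last_print = time.monotonic()  # monotonic clock, like steady_clock
    while True:
        fn()
        counter += 1
        now = time.monotonic()
        if now - start >= duration_s:
            break
        if now - last_print >= print_interval_s:
            print(f"{counter:,} ops per {print_interval_s:g} s")
            counts.append(counter)
            counter = 0
            last_print = time.monotonic()
    return counts

if __name__ == "__main__":
    r = re.compile(r'(HELLO).+?(\d+)', re.I)
    s = "prefixdfadfadf adf adf adf adf he asdf dHello Regex 123"
    run(10, lambda: r.search(s))
```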


Answer

The first thing to note is that in Python, regex matching (whether using the re or the regex module) runs "at the speed of C": the actual heavy-lifting code is cold, hard C, and thus, at least for longer strings, performance will depend on the C regex implementation.
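One way to see that the regex engine, not the interpreter, does the heavy lifting is to pad the subject string and watch the per-search time grow with its length while the Python call overhead stays fixed. A rough sketch; the padding character, the lengths, and the helper name `time_per_search` are arbitrary choices, and absolute numbers will vary by machine.

```python
import re
import timeit

r = re.compile(r'(HELLO).+?(\d+)', re.I)
TAIL = "dHello Regex 123"

def time_per_search(prefix_len, number=1000):
    """Average seconds per r.search() on a subject with prefix_len junk characters."""
    s = "x" * prefix_len + TAIL
    return timeit.timeit(lambda: r.search(s), number=number) / number

if __name__ == "__main__":
    # The scan over the padding happens inside the C engine.
    for n in (10, 1_000, 100_000):
        print(f"prefix {n:>7,}: {time_per_search(n) * 1e6:10.2f} µs per search")
```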

Python is reasonably quick in its own right: it has no trouble performing in the vicinity of tens of millions of operations per second, and it can create millions of objects per second. That is roughly a thousand times slower than C, but if we're talking about something that takes microseconds to begin with, the Python overhead may not really matter; it adds only about 0.1 microseconds to each function call.
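The claim about per-call overhead is easy to check on your own machine: time a Python call that does nothing and compare it with a full `r.search(s)` on the question's string. A minimal sketch; `noop` and `overhead_fraction` are hypothetical names, and the ratio you get depends on the interpreter version and hardware.

```python
import re
import timeit

r = re.compile(r'(HELLO).+?(\d+)', re.I)
s = "prefixdfadfadf adf adf adf adf he asdf dHello Regex 123"

def noop():
    """An empty function: timing it measures pure call overhead."""
    pass

def overhead_fraction(number=200_000):
    """Return (seconds per empty call, seconds per search, their ratio)."""
    call = timeit.timeit(noop, number=number) / number
    search = timeit.timeit(lambda: r.search(s), number=number) / number
    return call, search, call / search

if __name__ == "__main__":
    call, search, frac = overhead_fraction()
    print(f"empty call: {call * 1e6:.3f} µs, search: {search * 1e6:.3f} µs "
          f"(empty call is about {frac:.0%} of one search)")
```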

So in this case the relative slowness of Python doesn't matter. It's fast enough in absolute terms that what matters is how fast the regular expression functions do their thing.

I rewrote the C++ case so that it is not subject to any criticism (I hope; feel free to point out any). In fact, it doesn't even need to create a match object, because this overload of regex_search simply returns a bool (true/false):

#include <regex>
#include <iostream>
#include <string>

int main(int argc, char * argv[])
{
    std::string s = "prefixdfadfadf adf adf adf adf he asdf dHello Regex 123";
    std::regex my(R"((HELLO).+?(\d+))", std::regex_constants::icase);

    // This overload of regex_search returns only a bool; no match_results is filled in.
    int matches = 0;
    for (int i = 0; i < 1000000; ++i)
        matches += std::regex_search(s, my);

    // Printing the count keeps the loop from being optimized away.
    std::cout << matches << std::endl;
    return 0;
}

I wrote a comparable Python program (although Python did create and return a match object), and my results were essentially the same as yours:


C++   : 6.661 s
Python: 1.039 s
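The answer doesn't include its Python program. A sketch of what a comparable one might look like, mirroring the C++ loop (1,000,000 searches, counting hits); the function name `main` and its `iterations` parameter are illustrative, not the answerer's code. Unlike the bool-returning C++ overload, `search` still builds a match object on every hit.

```python
import re
import time

def main(iterations=1_000_000):
    s = "prefixdfadfadf adf adf adf adf he asdf dHello Regex 123"
    my = re.compile(r'(HELLO).+?(\d+)', re.I)

    matches = 0
    start = time.perf_counter()
    for _ in range(iterations):
        # search() returns a match object or None; count successful searches.
        matches += my.search(s) is not None
    elapsed = time.perf_counter() - start

    print(matches)
    print(f"{elapsed:.3f}s")
    return matches, elapsed

if __name__ == "__main__":
    main()
```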

I think the basic conclusion here is that Python's regex implementation simply thrashes the C++ standard library's.

A while back, just for fun, I compared Python's regex performance with Go's, and Python was at least twice as fast.

The conclusion is that Python's regex implementation is very good, and you should certainly not look outside Python to get better regex performance. The work regular expressions do is fundamentally time-consuming enough that Python's overhead doesn't matter in the slightest, and Python has a great implementation (the newer third-party regex module is often even faster than re).
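Trying the regex module usually needs only a changed import, since it aims to be backward-compatible with re. A minimal sketch, assuming the third-party package is installed (`pip install regex`); it falls back to `re` when it isn't and reports which engine was actually timed. Whether `regex` is faster for this particular pattern is something to measure, not assume.

```python
import timeit

try:
    import regex as engine  # third-party package: pip install regex
except ImportError:
    import re as engine  # fall back to the standard library

pattern = engine.compile(r'(HELLO).+?(\d+)', engine.I)
s = "prefixdfadfadf adf adf adf adf he asdf dHello Regex 123"

if __name__ == "__main__":
    n = 100_000
    secs = timeit.timeit(lambda: pattern.search(s), number=n)
    print(f"{engine.__name__}: {secs / n * 1e6:.2f} µs per search")
```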
