在字符串Python上进行计数操作的计算成本是多少? [英] What's the computational cost of count operation on strings Python?

查看:72
本文介绍了在字符串Python上进行计数操作的计算成本是多少?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

例如:

'hello'.count('e')

这是O(n)吗?我猜它的工作方式是它扫描'hello'并在每次看到字母'e'时增加一个计数器.我怎么能不知道就知道这一点?我尝试在此处中阅读源代码,但在发现时陷入了困境这个:

Is this O(n)? I'm guessing the way it works is it scans 'hello' and increments a counter each time the letter 'e' is seen. How can I know this without guessing? I tried reading the source code here, but got stuck upon finding this:

def count(s, *args):
    """count(s, sub[, start[,end]]) -> int

    Return the number of occurrences of substring sub in string
    s[start:end].  Optional arguments start and end are
    interpreted as in slice notation.

    """
    return s.count(*args)

在哪里可以了解 s.count(* args)中执行的内容?

Where can I read about what's executed in s.count(*args)?

我了解 * args 在Python函数上下文中的作用.

edit: I understand what *args does in the context of Python functions.

推荐答案

str.count fastsearch 来搜索字符串中子字符串的出现并计数.

str.count is implemented in native code, in the stringobject.c file, which delegates to either stringlib_count, or PyUnicode_Count which itself delegates to stringlib_count again. stringlib_count ultimately uses fastsearch to search for occurrences of the substring in the string and counting those.

对于一个字符的字符串(例如您的'e'),它会短路到以下代码路径:

For one-character strings (e.g. your 'e'), it is short-circuited to the following code path:

for (i = 0; i < n; i++)
    if (s[i] == p[0]) {
        count++;
        if (count == maxcount)
            return maxcount;
    }
return count;

是的,这正好是您假设对字符串序列进行简单迭代并计算子字符串的出现次数.

So yes, this is exactly as you assumed a simple iteration over the string sequence and counting the occurences of the substring.

对于长于单个字符的搜索字符串,由于处理重叠等原因,它会变得更加复杂,并且逻辑被深深地植入了 fastsearch 实现中.但这本质上是相同的:在字符串中进行线性搜索.

For search strings longer than a single character it gets a bit more complicated, due to handling overlaps etc., and the logic is buried deeper in the fastsearch implementation. But it’s essentially the same: a linear search through the string.

是的, str.count 是线性时间O(n).而且,如果您考虑一下,这很有道理:为了知道子字符串在字符串中出现的频率,您需要查看相同长度的每个可能的子字符串.因此,对于长度为1的子字符串,您必须查看字符串中的每个字符,这给您带来了线性复杂度.

So yes, str.count is in linear time, O(n). And if you think about it, it makes a lot of sense: In order to know how often a substring appears in a string, you need to look at every possible substring of the same length. So for a substring length of 1, you have to look at every character in the string, giving you a linear complexity.

顺便说一句.有关基础快速搜索算法的更多信息,请参见有关effbot.org的文章.

Btw. for more information about the underlying fastsearch algorithm, see this article on effbot.org.

对于只有单一Unicode字符串类型的Python 3,实现的链接为: fastsearch .

For Python 3, which only has a single Unicode string type, the links to the implementations are: unicode_count which uses stringlib_count which uses fastsearch.

这篇关于在字符串Python上进行计数操作的计算成本是多少?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆