在字符串Python上进行计数操作的计算成本是多少? [英] What's the computational cost of count operation on strings Python?
问题描述
例如:
'hello'.count('e')
这是O(n)吗?我猜它的工作方式是它扫描'hello'
并在每次看到字母'e'
时增加一个计数器.我怎么能不知道就知道这一点?我尝试在此处中阅读源代码,但在发现时陷入了困境这个:
Is this O(n)? I'm guessing the way it works is it scans 'hello'
and increments a counter each time the letter 'e'
is seen. How can I know this without guessing? I tried reading the source code here, but got stuck upon finding this:
def count(s, *args):
"""count(s, sub[, start[,end]]) -> int
Return the number of occurrences of substring sub in string
s[start:end]. Optional arguments start and end are
interpreted as in slice notation.
"""
return s.count(*args)
在哪里可以了解 s.count(* args)
中执行的内容?
Where can I read about what's executed in s.count(*args)
?
我了解 * args
在Python函数上下文中的作用.
edit: I understand what *args
does in the context of Python functions.
推荐答案
str.count
在 stringlib_count
或 fastsearch
来搜索字符串中子字符串的出现并计数.
str.count
is implemented in native code, in the stringobject.c
file, which delegates to either stringlib_count
, or PyUnicode_Count
which itself delegates to stringlib_count
again. stringlib_count
ultimately uses fastsearch
to search for occurrences of the substring in the string and counting those.
对于一个字符的字符串(例如您的'e'
),它会短路到以下代码路径:
For one-character strings (e.g. your 'e'
), it is short-circuited to the following code path:
for (i = 0; i < n; i++)
if (s[i] == p[0]) {
count++;
if (count == maxcount)
return maxcount;
}
return count;
是的,这正好是您假设对字符串序列进行简单迭代并计算子字符串的出现次数.
So yes, this is exactly as you assumed a simple iteration over the string sequence and counting the occurences of the substring.
对于长于单个字符的搜索字符串,由于处理重叠等原因,它会变得更加复杂,并且逻辑被深深地植入了 fastsearch
实现中.但这本质上是相同的:在字符串中进行线性搜索.
For search strings longer than a single character it gets a bit more complicated, due to handling overlaps etc., and the logic is buried deeper in the fastsearch
implementation. But it’s essentially the same: a linear search through the string.
是的, str.count
是线性时间O(n).而且,如果您考虑一下,这很有道理:为了知道子字符串在字符串中出现的频率,您需要查看相同长度的每个可能的子字符串.因此,对于长度为1的子字符串,您必须查看字符串中的每个字符,这给您带来了线性复杂度.
So yes, str.count
is in linear time, O(n). And if you think about it, it makes a lot of sense: In order to know how often a substring appears in a string, you need to look at every possible substring of the same length. So for a substring length of 1, you have to look at every character in the string, giving you a linear complexity.
顺便说一句.有关基础快速搜索算法的更多信息,请参见有关effbot.org的文章.
Btw. for more information about the underlying fastsearch algorithm, see this article on effbot.org.
对于只有单一Unicode字符串类型的Python 3,实现的链接为: stringlib_count
,它使用 fastsearch
.
For Python 3, which only has a single Unicode string type, the links to the implementations are: unicode_count
which uses stringlib_count
which uses fastsearch
.
这篇关于在字符串Python上进行计数操作的计算成本是多少?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!