Testing whether a string has repeated characters
Question
I am trying to find the most lightweight way to determine whether a string has any repeated characters. I have tried searching for similar questions but can't find any. It also needs to be as short as possible, as I will be checking quite a few strings (I can handle putting this into a loop, etc.).
For example:
a = "12348546478"
#code to check multiple characters
print(result)
Result: 8 was repeated, 4 was repeated
The code should check which characters are repeated and print them. I don't need to know how many times each one was repeated, just whether it was repeated or not.
Recommended answer
You can use collections.Counter:
>>> from collections import Counter
>>> [i for i,j in Counter(a).items() if j>1]
['4', '8']
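As a minimal sketch of how the Counter approach might be wrapped for checking many strings in a loop, assuming Python 3. The function name `repeated_chars` and the sample list are hypothetical and not part of the original answer; the result is sorted so the output does not depend on dictionary ordering.

```python
from collections import Counter


def repeated_chars(s):
    """Return the characters that occur more than once in s, sorted."""
    counts = Counter(s)  # maps each character to its number of occurrences
    return sorted(ch for ch, n in counts.items() if n > 1)


# Hypothetical sample inputs, checked in a loop as the question describes
samples = ["12348546478", "abc", "hello"]
for text in samples:
    print(text, repeated_chars(text))
```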
Or you can use a custom function:
>>> def finder(s):
...     seen, yields = set(), set()
...     for i in s:
...         if i in seen:
...             if i not in yields:
...                 yield i
...                 yields.add(i)
...             else:
...                 yields.add(i)
...         else:
...             seen.add(i)
...
>>> list(finder(a))
['4', '8']
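The same single-pass idea can probably be written a little more compactly. The sketch below assumes Python 3; the name `first_repeats` is hypothetical. It yields each repeated character once, at the moment its second occurrence is seen.

```python
def first_repeats(s):
    """Yield each repeated character once, at its second occurrence."""
    seen = set()      # characters encountered at least once
    reported = set()  # characters already yielded as repeated
    for ch in s:
        if ch in seen and ch not in reported:
            reported.add(ch)
            yield ch
        seen.add(ch)


print(list(first_repeats("12348546478")))  # ['4', '8']
```

Because it is a generator, a caller that only needs to know whether any repeat exists can stop at the first yielded value instead of consuming the whole string.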
Or use the str.count method in a set comprehension:
>>> set(i for i in a if a.count(i)>1)
set(['8', '4'])
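One possible variation, sketched here under the assumption of Python 3: iterating over `set(s)` rather than `s` itself calls `str.count` once per distinct character instead of once per position. The name `repeated_by_count` is hypothetical.

```python
def repeated_by_count(s):
    """Return the set of characters that occur more than once in s."""
    # Iterating over set(s) means str.count runs once per distinct character.
    return {ch for ch in set(s) if s.count(ch) > 1}


print(sorted(repeated_by_count("12348546478")))  # ['4', '8']
```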
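A benchmark of the approaches follows. In the reported run the custom function was much faster than Counter. Note that in the run that produced the figures below, the third timing call was given s2 rather than s3, so the "3rd" figures repeat the custom function and do not measure the str.count version: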
from timeit import timeit

# Counter-based list comprehension
s1 = """
a = "12348546478"
[i for i,j in Counter(a).items() if j>1]
"""

# Custom generator function
s2 = """
def finder(s):
    seen,yields=set(),set()
    for i in s:
        if i in seen:
            if i not in yields:
                yield i
                yields.add(i)
            else :
                yields.add(i)
        else:
            seen.add(i)
a = "12348546478"
list(finder(a))
"""

# str.count inside set()
s3 = """
a = "12348546478"
set(i for i in a if a.count(i)>1)
"""

print('1st: %s' % timeit(stmt=s1, number=100000, setup="from collections import Counter"))
print('2nd : %s' % timeit(stmt=s2, number=100000))
# The third call must time s3; passing s2 again would only repeat the second measurement.
print('3rd : %s' % timeit(stmt=s3, number=100000))
Results:
1st: 0.726881027222
2nd : 0.265578985214
3rd : 0.26243185997
I also tried this with a long string (a = "12348546478"*10000) and got the same ranking:
1st: 25.5780302721341
2nd : 11.8482989001177
3rd : 11.926538944245
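For anyone wanting to rerun the comparison, here is a rough Python 3 sketch that times each approach as a callable, so every candidate is measured under its own name. The helper names (`with_counter`, `with_single_pass`, `with_count`, `run_benchmark`) and the small `number` value are assumptions chosen for illustration; absolute timings will differ by machine and interpreter, so no expected figures are given.

```python
from collections import Counter
from timeit import timeit


def with_counter(s):
    return [ch for ch, n in Counter(s).items() if n > 1]


def with_single_pass(s):
    seen, reported, out = set(), set(), []
    for ch in s:
        if ch in seen and ch not in reported:
            reported.add(ch)
            out.append(ch)
        seen.add(ch)
    return out


def with_count(s):
    return {ch for ch in s if s.count(ch) > 1}


def run_benchmark(text, number=1000):
    """Return a dict mapping each approach's name to its total time in seconds."""
    candidates = {
        "counter": with_counter,
        "single_pass": with_single_pass,
        "count": with_count,
    }
    # timeit accepts a zero-argument callable as the statement to time
    return {
        name: timeit(lambda func=func: func(text), number=number)
        for name, func in candidates.items()
    }


timings = run_benchmark("12348546478")
for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f}s")
```

Passing a longer `text` (and a smaller `number`) to `run_benchmark` is one way to see how each approach scales with input length.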
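In any case, my suggestion is the set comprehension, which is the more Pythonic option for short strings like these (a.count(i) rescans the whole string for every character, so its cost grows quadratically with the string's length):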
set(i for i in a if a.count(i)>1)
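The question's last sentence suggests that a plain yes/no answer may be enough. For that case, a minimal sketch (assuming Python 3; `has_repeats` is a hypothetical name, and this is a different technique from the ones benchmarked above) compares the string's length with the size of its character set:

```python
def has_repeats(s):
    """Return True if any character occurs more than once in s."""
    # A set keeps one copy of each character, so it is smaller than s
    # exactly when some character occurs more than once.
    return len(set(s)) < len(s)


print(has_repeats("12348546478"))  # True
print(has_repeats("12345"))        # False
```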