Python finding most common pattern in list of strings

Problem description

I have a large list of API calls stored as strings, which have been stripped of all common syntax ('http://', '.com', '.', etc.).

I would like to return a dictionary of the most common patterns which have a length > 3, where the keys are the patterns found and the values are the number of occurrences of each pattern. I've tried this:

>>> calls = ['admobapioauthcert', 'admobapinewsession', 'admobendusercampaign']
>>> from itertools import takewhile
>>> # zip replaces Python 2's itertools.izip, which does not exist in Python 3
>>> ''.join(c[0] for c in takewhile(lambda x: all(x[0] == y for y in x), zip(*calls)))

This returns:

'admob'

I would like it to return:

{'obap': 2, 'dmob': 3, 'admo': 3, 'admobap': 2, 'bap': 2, 'dmobap': 2, 'admobapi': 2, 'moba': 2, 'bapi': 2, 'dmo': 3, 'obapi': 2, 'mobapi': 2, 'admob': 3, 'api': 2, 'dmobapi': 2, 'dmoba': 2, 'mobap': 2, 'mob': 3, 'adm': 3, 'admoba': 2, 'oba': 2}

My current method only identifies common prefixes, but I need it to operate on all characters, regardless of their position in the string, and again I would like to store the number of occurrences of each pattern as dict values. (I've tried other methods to accomplish this, but they are quite ugly.)

Recommended answer

Use collections.Counter: split the calls on the dots, then build the result with a dict comprehension.

>>> from collections import Counter
>>> calls = ['admob.api.oauthcert', 'admob.api.newsession', 'admob.endusercampaign']
>>> l = '.'.join(calls).split('.')
>>> d = Counter(l)
>>> {k: v for k, v in d.most_common(2)}
{'admob': 3, 'api': 2}
>>> {k: v for k, v in d.most_common(4)}
{'admob': 3, 'api': 2, 'oauthcert': 1, 'newsession': 1}

>>> import re
>>> from collections import Counter
>>> d = re.findall(r'\w+', "['admob.api.oauthcert', 'admob.api.newsession', 'admob.endusercampaign']")
>>> {k: v for k, v in Counter(d).most_common(2)}
{'admob': 3, 'api': 2}

>>> from collections import Counter
>>> import re
>>> s = "['admobapioauthcert', 'admobapinewsession', 'admobendusercampaign']"
>>> # Change (mob)|(api)|(admob) to the patterns you want to count
>>> w = [i for sb in re.findall(r'(?=(mob)|(api)|(admob))', s) for i in sb]
>>> {k: v for k, v in Counter(filter(bool, w)).most_common()}
{'admob': 3, 'mob': 3, 'api': 2}
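
The regex approach needs the patterns to be known in advance, whereas the question asks for every shared substring to be discovered. The following is a minimal sketch of one way to do that, not part of the original answer: it enumerates all substrings of each call and counts how many calls contain each one. The function name common_substrings and the parameters min_len and min_count are hypothetical. The minimum length is assumed to be 3 because the expected output in the question includes three-character keys such as 'api', and only substrings found in at least two calls are kept.

```python
from collections import Counter


def common_substrings(strings, min_len=3, min_count=2):
    """Map each substring of at least min_len characters to the number
    of input strings that contain it, keeping only those found in at
    least min_count strings."""
    counts = Counter()
    for s in strings:
        # Collect the substrings in a set so that a substring repeated
        # inside a single string is counted only once for that string.
        seen = {
            s[i:j]
            for i in range(len(s))
            for j in range(i + min_len, len(s) + 1)
        }
        counts.update(seen)
    return {sub: n for sub, n in counts.items() if n >= min_count}


calls = ['admobapioauthcert', 'admobapinewsession', 'admobendusercampaign']
result = common_substrings(calls)
print(result)
```

For these three calls the result has the same keys and values as the dictionary requested in the question, although the key order may differ. Enumerating every substring is quadratic in the length of each string, so for a very large list it may be worth capping the substring length as well.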
