查找字符串的最公共前缀-更好的方法？ [英] Find most* common prefix of strings - a better way?*

查看：93 发布时间：2020/6/3 21:05:36 python algorithm python-2.7 prefix robust

本文介绍了查找字符串的*最*公共前缀-更好的方法？的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我有一个键列表 ['foo_a'，'foo_b'，'foo_c'，'fnord']

此处所有类似的解决方案都假定您的文本中没有 fnord 。

All of the similar solutions here assume that you have no fnord's in your text.

我有执行此工作的这段代码：

I have this code that does the job:

def detect_prefix(keys):
    PCT = 0.70 # cutof 
    pre = ''
    l = len(keys)
    for i in range(0, len(max(keys, key=len))):
        keys = filter(lambda k: k.startswith(pre), keys)
        cnt = dict()
        for k in map(lambda k: k[i], keys):
            cnt.setdefault(k,0)
            cnt[k] +=1
        if cnt[max(cnt)] / float(l) >= PCT:
            pre += max(cnt)
        else:
            break
    return pre

我有一个强怀疑可以更优雅地完成此操作，但是我的python-fu目前不够强大。

I have a strong suspicion that this could be done much more elegantly, but my python-fu is not strong enough right now.

我很想听听一些建议。

修改。
其他背景和说明。

这些是其他开发人员在应用程序中放入的用于翻译的键。
他们应该有一个通用的前缀，但是人们忘记了，他们从其他代码中剪切并粘贴了代码。作为前缀分隔符的 _只是一个约定。最好不要假设甚至使用了分隔符。 70％是完全任意的阈值。

是的，这是python 2.7，引号内的空间只是一个视觉人工制品。

Edit. Additional background and clarification.
These are keys that other developers have put in an application to use for translation. They should have a common prefix, but people forget, and they cut and paste from other code. "_" as a prefix separator is just a convention. It would be best not to assume a separator is even used. 70% is a totally arbitrary threshold. "most prevalent" or "predominant" would have worked too.
And yes, this is python 2.7, and the space inside the quotes is just a visual artefact.

推荐答案

如果您知道通用前缀所需的阈值频率：

If you know the desired threshold frequency for the common prefix:

#!/usr/bin/env python
from collections import Counter
from itertools import izip_longest

strings = ['foo_a','foo_b','foo_c','fnord']
threshold = .7 * len(strings)
prefix = []
for chars in izip_longest(*strings, fillvalue=''):
    char, count = Counter(chars).most_common(1)[0]
    if count < threshold:
        break
    prefix.append(char)
print(''.join(prefix))
# -> foo_

或者您可以收集所有通用前缀及其频率并稍后决定：

Or you could collect all common prefixes and their frequencies and decide later:

#!/usr/bin/env python
from collections import Counter
from itertools import izip_longest

strings = ['foo_a', 'foo_b','foo_c','fnord']
assert len(strings) > 1
threshold = len(strings)
prefix = []
prefixes = []
for chars in izip_longest(*strings, fillvalue=''):
    char, count = Counter(chars).most_common(1)[0]
    if count == 1:
        break
    elif count < threshold:
        if prefix:
            prefixes.append((''.join(prefix), threshold))
        threshold = count
    prefix.append(char)
if prefix:
    prefixes.append((''.join(prefix), threshold))
print(prefixes)
# -> [('f', 4), ('foo_', 3)]

两个代码示例均假定主要前缀存在，即每个位置上最常见的字符都属于最常见前缀。

Both code examples assume that the predominant prefix exists i.e., the most common character at each position belongs to the most common prefix.

这篇关于查找字符串的*最*公共前缀-更好的方法？的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

查找字符串的最公共前缀-更好的方法？ [英] Find most* common prefix of strings - a better way?*

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录关闭

查找字符串的*最*公共前缀-更好的方法？ [英] Find *most* common prefix of strings - a better way?

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录 关闭

查找字符串的最公共前缀-更好的方法？ [英] Find most* common prefix of strings - a better way?*

登录关闭