Checking if a word is a subset of another word with same amount of letters

Problem description

I am making a word game program where I get a list of ~80,000 words from a text file, then use those words as a lexicon to choose from. The user requests a word of a certain length, which is then given to them scrambled. They then guess words that are the same length or shorter and that use the same letters, each no more often than it appears in the scrambled word. I have this list comprehension to get all the words from the lexicon that are subsets of the scrambled word. However, it allows more occurrences of a letter than appear in the original word. For example: if the scrambled word was 'minute', then 'in' should be a correct answer but 'inn' should not. The way I have it written now allows that, though. Here is the list comprehension:

correct_answers = [
    word for word in word_list
    if set(word).issubset(random_length_word)
    and word in word_list
    and len(word) <= len(random_length_word)]

So I'm looking for something like issubset, but that only allows each letter the same number of times or fewer. Hopefully that makes sense. Thanks in advance.

Solution

I wrote a function that does this for playing the Countdown letters game. I called the desired input a "subset-anagram", but there's probably a better technical term for it.

Essentially, what you're looking for is a multiset (from word) that is a subset of another multiset (from random_length_word). You can do this with collections.Counter, but I actually found it much faster to do it a different way: make a list out of random_length_word, then remove each character of word. It's probably faster due to the overhead of creating new Counter objects.

def is_subset_anagram(str1, str2):
    """
    Check if str1 is a subset-anagram of str2.

    Return True if str2 contains at least as many of each char as str1.

    >>> is_subset_anagram('bottle', 'belott')  # Just enough
    True
    >>> is_subset_anagram('bottle', 'belot')  # Less
    False
    >>> is_subset_anagram('bottle', 'bbeelloott')  # More
    True
    """
    list2 = list(str2)
    try:
        for char in str1:
            list2.remove(char)
    except ValueError:
        return False
    return True

>>> [w for w in ['in', 'inn', 'minute'] if is_subset_anagram(w, 'minute')]
['in', 'minute']
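Plugged back into the list comprehension from the question, this might look roughly like the sketch below. The short word_list is a made-up stand-in for the real ~80,000-word lexicon, and the function is repeated only so the snippet runs on its own.

```python
def is_subset_anagram(str1, str2):
    """Return True if str2 contains at least as many of each char as str1."""
    list2 = list(str2)
    try:
        for char in str1:
            list2.remove(char)  # raises ValueError once a letter runs out
    except ValueError:
        return False
    return True


# Hypothetical sample data standing in for the lexicon and the scrambled word.
word_list = ['in', 'inn', 'minute', 'mint', 'tune', 'mitten']
random_length_word = 'minute'

correct_answers = [
    word for word in word_list
    if is_subset_anagram(word, random_length_word)]

print(correct_answers)  # ['in', 'minute', 'mint', 'tune']
```

The separate length check from the original comprehension is no longer needed here: a word longer than random_length_word runs out of letters to remove and is rejected anyway.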


For what it's worth, here's the Counter implementation:

from collections import Counter

def is_subset_anagram(str1, str2):
    delta = Counter(str1) - Counter(str2)
    return not delta

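This works because Counter.__sub__() produces a multiset, that is, counts less than 1 are removed. So delta only holds the letters that str1 needs more of than str2 has, and it's empty (falsy) exactly when str1 is a subset-anagram of str2.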
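To see the subtraction at work, here is a small sketch using the words from the question; the variable names are only for illustration.

```python
from collections import Counter

# 'inn' needs two n's but 'minute' has only one, so one 'n' is left over.
leftover_inn = Counter('inn') - Counter('minute')
print(leftover_inn)  # Counter({'n': 1})

# Every letter of 'in' is covered by 'minute', so nothing is left over.
leftover_in = Counter('in') - Counter('minute')
print(leftover_in)  # Counter()

print(not leftover_inn, not leftover_in)  # False True
```

On Python 3.10 and later, Counter also supports rich comparisons for multiset inclusion, so Counter(str1) <= Counter(str2) should express the same test without the intermediate subtraction.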
