Python: find all possible word combinations with a sequence of characters (word segmentation)

Question

I'm doing some word segmentation experiments like the following.

lst is a sequence of characters, and output is the list of all possible ways to split it into words.

lst = ['a', 'b', 'c', 'd']

def foo(lst):
    ...
    return output

output = [['a', 'b', 'c', 'd'],
          ['ab', 'c', 'd'],
          ['a', 'bc', 'd'],
          ['a', 'b', 'cd'],
          ['ab', 'cd'],
          ['abc', 'd'],
          ['a', 'bcd'],
          ['abcd']]

I've checked combinations and permutations in the itertools library, and also tried combinatorics.
However, it seems I'm looking in the wrong place, because this is not a pure permutation or combination problem.

It seems I could achieve this with lots of loops, but that would probably be inefficient.

Edit

Word order matters, so combinations like ['ba', 'dc'] or ['cd', 'ab'] are not valid.

The order should always be left to right.

Edit

@Stuart's solution doesn't work in Python 2.7.6.

Edit

@Stuart's solution does work in Python 2.7.6; see the comments below.

Recommended answer

itertools.product should indeed be able to help you.

The idea is this: consider A1, A2, ..., AN, with a possible slab (separator) in each gap between neighbouring elements. There are N-1 such gaps. If a gap holds a slab, the sequence is split there; if it does not, the two neighbours are joined. Each gap is an independent yes/no choice, so a sequence of length N has 2^(N-1) segmentations. For the four-character example that is 2^3 = 8, which matches the expected output.

Like the following:

import itertools
lst = ['a', 'b', 'c', 'd']
combinatorics = itertools.product([True, False], repeat=len(lst) - 1)

solution = []
for combination in combinatorics:
    i = 0
    one_such_combination = [lst[i]]
    for slab in combination:
        i += 1
        if not slab: # there is a join
            one_such_combination[-1] += lst[i]
        else:
            one_such_combination += [lst[i]]
    solution.append(one_such_combination)

print(solution)  # parenthesized form runs on both Python 2.7 and Python 3
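
The script above can also be wrapped in a reusable function, closer to the foo(lst) shape asked for in the question. The following is only a sketch of the same slab idea, not a different algorithm: the name segmentations is hypothetical, it is written as a generator so that all 2^(N-1) results need not be held in memory at once, and returning nothing for an empty input is an assumption rather than something the question specifies.

```python
import itertools


def segmentations(seq):
    """Yield every left-to-right split of seq into contiguous words.

    `segmentations` is a hypothetical name used for illustration only.
    """
    if not seq:
        return  # assumption: an empty input yields no segmentations
    gaps = len(seq) - 1  # one possible slab between each pair of neighbours
    for cuts in itertools.product([True, False], repeat=gaps):
        words = [seq[0]]
        for item, cut in zip(seq[1:], cuts):
            if cut:
                words.append(item)   # slab present: start a new word
            else:
                words[-1] += item    # no slab: join onto the current word
        yield words


result = list(segmentations(['a', 'b', 'c', 'd']))
print(len(result))  # 8, i.e. 2 ** (4 - 1)
```

The results come out in the order itertools.product generates the slab choices, which is not the same order as the listing in the question; sort or compare as sets if the order matters.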
