Split a string into all possible ordered phrases


Problem description

I am trying to explore the functionality of Python's built-in functions. I'm currently trying to work up something that takes a string such as:

'the fast dog'

and break the string down into all possible ordered phrases, as lists. The example above would output as the following:

[['the', 'fast dog'], ['the fast', 'dog'], ['the', 'fast', 'dog']]

The key thing is that the original ordering of the words in the string needs to be preserved when generating the possible phrases.

I've been able to get a function to work that can do this, but it is fairly cumbersome and ugly. However, I was wondering if some of the built-in functionality in Python might be of use. I was thinking that it might be possible to split the string at various white spaces, and then apply that recursively to each split. Might anyone have some suggestions?

Recommended answer

Use itertools.combinations to choose the positions at which the word list is cut:

import itertools

def break_down(text):
    words = text.split()
    ns = range(1, len(words)) # possible cut positions: 1..len(words)-1
    for n in ns: # n cuts give n + 1 parts, so this covers 2, 3, ..., len(words) parts
        for idxs in itertools.combinations(ns, n):
            yield [' '.join(words[i:j]) for i, j in zip((0,) + idxs, idxs + (None,))]
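
To see how the zip call turns a combination of cut positions into phrase boundaries, the sketch below traces one combination by hand. The names starts, ends, and pairs are illustrative only and do not appear in the answer's function.

```python
words = 'the really fast dog'.split()
idxs = (1, 3)  # one combination: cut before word 1 and before word 3

starts = (0,) + idxs    # (0, 1, 3): where each phrase begins
ends = idxs + (None,)   # (1, 3, None): where each phrase ends; None means "to the end"
pairs = list(zip(starts, ends))

# Each (start, end) pair is used as a slice of the word list.
phrases = [' '.join(words[i:j]) for i, j in pairs]
print(pairs)    # [(0, 1), (1, 3), (3, None)]
print(phrases)  # ['the', 'really fast', 'dog']
```

Because every combination is a strictly increasing tuple of positions, the slices never overlap and the original word order is preserved.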

Example:

>>> for x in break_down('the fast dog'):
...     print(x)
...
['the', 'fast dog']
['the fast', 'dog']
['the', 'fast', 'dog']

>>> for x in break_down('the really fast dog'):
...     print(x)
...
['the', 'really fast dog']
['the really', 'fast dog']
['the really fast', 'dog']
['the', 'really', 'fast dog']
['the', 'really fast', 'dog']
['the really', 'fast', 'dog']
['the', 'really', 'fast', 'dog']
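
The recursive idea mentioned in the question (split at one space, then apply the same step to the remainder) can also work. The following is a minimal sketch under that assumption, not part of the original answer; split_recursively and ordered_phrases are hypothetical names chosen for illustration.

```python
def split_recursively(words):
    """Yield every way to group `words` into consecutive phrases.

    This includes the trivial grouping that keeps all words in one phrase.
    """
    if not words:
        yield []
        return
    # Choose where the first phrase ends, then recurse on the remaining words.
    for i in range(1, len(words) + 1):
        head = ' '.join(words[:i])
        for rest in split_recursively(words[i:]):
            yield [head] + rest


def ordered_phrases(text):
    """Return all splits of `text` into two or more ordered phrases."""
    words = text.split()
    # Drop the unsplit result so the output matches what the question asks for.
    return [parts for parts in split_recursively(words) if len(parts) > 1]


for parts in ordered_phrases('the fast dog'):
    print(parts)
```

This produces the same set of splits as break_down, though in a different order: splits with the shortest first phrase come first. For a string of n words, both approaches yield 2**(n - 1) - 1 results, since each of the n - 1 gaps between words is either cut or not, and the case with no cuts is excluded.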
