Splitting a Python string

Question

I have a string in Python that I want to split in a very particular manner. I want to split it into a list containing each separate word, except when a group of words is bordered by a particular character. For example, the following string would be split like this:

'Jimmy threw his ball through the window.'

becomes

['Jimmy', 'threw', 'his', 'ball', 'through', 'the', 'window.']

However, with a border character I'd want

'Jimmy |threw his ball| through the window.'

to become

['Jimmy', 'threw his ball', 'through', 'the', 'window.']

As an additional requirement, a - that appears immediately before a grouped phrase (outside the border characters) should end up attached to the front of that phrase after splitting, i.e.,

'Jimmy |threw his| ball -|through the| window.'

would become

['Jimmy', 'threw his', 'ball', '-through the', 'window.']

I cannot find a simple, pythonic way to do this without a lot of complicated for loops and if statements. Is there a simple way to handle something like this?

Recommended answer

This isn't something with an out-of-the-box solution, but here's a reasonably Pythonic function that handles the cases in your examples. Note that the pattern only recognizes groups made up of word characters and spaces between the | borders.

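import re

def extract_groups(s):
    # The capturing parentheses make re.split keep each |bordered group|
    # (with its optional leading '-') as its own item in the result.
    separator = re.compile(r"(-?\|[\w ]+\|)")
    components = separator.split(s)
    groups = []
    for component in components:
        component = component.strip()
        if len(component) == 0:
            continue
        elif component[0] in ['-', '|']:
            # Bordered group: drop the '|' characters, keep any leading '-'.
            groups.append(component.replace('|', ''))
        else:
            # Plain text: split on whitespace into individual words.
            groups.extend(component.split())

    return groups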
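
To see why the loop in the function above works, it may help to look at what re.split returns when the pattern contains a capturing group: the matched delimiters are kept in the result list rather than discarded. The snippet below is only an illustration of that intermediate step; the variable name parts is not from the original answer.

```python
import re

# Same pattern as in extract_groups, written as a raw string.
separator = re.compile(r"(-?\|[\w ]+\|)")

# Because the pattern is wrapped in a capturing group, the bordered
# phrases are kept as separate items between the plain-text pieces.
parts = separator.split('Jimmy |threw his| ball -|through the| window.')
print(parts)
# ['Jimmy ', '|threw his|', ' ball ', '-|through the|', ' window.']
```

Each item is then either a bordered group (starting with - or |) or a stretch of ordinary text that still needs to be split into words.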

Using your examples:

>>> extract_groups('Jimmy threw his ball through the window.')
['Jimmy', 'threw', 'his', 'ball', 'through', 'the', 'window.']
>>> extract_groups('Jimmy |threw his ball| through the window.')
['Jimmy', 'threw his ball', 'through', 'the', 'window.']
>>> extract_groups('Jimmy |threw his| ball -|through the| window.')
['Jimmy', 'threw his', 'ball', '-through the', 'window.']
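
As a possible alternative, the same result can be sketched by matching tokens directly with re.findall instead of splitting and post-processing. This is not part of the original answer: the names TOKEN and split_with_groups are hypothetical, and the sketch assumes a group is any run of non-| characters between two | borders, so unlike the pattern above it also accepts punctuation inside a group.

```python
import re

# Hypothetical alternative: a token is either an optional '-' followed by
# a |bordered group|, or a run of non-whitespace characters.
TOKEN = re.compile(r"(-?)\|([^|]+)\||(\S+)")

def split_with_groups(s):
    tokens = []
    for dash, group, word in TOKEN.findall(s):
        if word:
            # Ordinary word outside any border characters.
            tokens.append(word)
        else:
            # Bordered group: re-attach the leading '-' if there was one.
            tokens.append(dash + group)
    return tokens

print(split_with_groups('Jimmy |threw his| ball -|through the| window.'))
# ['Jimmy', 'threw his', 'ball', '-through the', 'window.']
```

A lone | without a closing partner is simply left inside an ordinary word in this sketch; whether that is the desired behaviour depends on the input you expect.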
