Splitting a Python string
Question
I have a string in Python that I want to split in a very particular manner. I want to split it into a list containing each separate word, except when a group of words is bordered by a particular character. For example, the following string
'Jimmy threw his ball through the window.'
becomes
['Jimmy', 'threw', 'his', 'ball', 'through', 'the', 'window.']
However, with a border character, I'd want
'Jimmy |threw his ball| through the window.'
to become
['Jimmy', 'threw his ball', 'through', 'the', 'window.']
As an additional requirement, a - that appears immediately outside a grouping phrase should end up inside it after splitting, i.e.,
'Jimmy |threw his| ball -|through the| window.'
would become
['Jimmy', 'threw his', 'ball', '-through the', 'window.']
I cannot find a simple, Pythonic way to do this without a lot of complicated for loops and if statements. Is there a simple way to handle something like this?
Answer
There is no out-of-the-box solution for this, but here's a reasonably Pythonic function that handles the examples above. It splits the string on the |...| groups (with an optional leading -), keeps those groups whole, and splits everything else on whitespace.
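import re

def extract_groups(s):
    # The capturing group makes re.split keep each "|...|" or "-|...|" match.
    separator = re.compile(r"(-?\|[\w ]+\|)")
    components = separator.split(s)
    groups = []
    for component in components:
        component = component.strip()
        if not component:
            continue
        elif separator.fullmatch(component):
            # A delimited group: drop the bars, keep any leading '-'.
            groups.append(component.replace('|', ''))
        else:
            # Plain text: split into individual words.
            groups.extend(component.split())
    return groups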
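The function depends on one detail of re.split: when the pattern is wrapped in a capturing group, the matched delimiters are returned along with the text between them. Here is a minimal sketch of just that step, using the third sample string from the question (the variable name parts is only for illustration):

```python
import re

# Same pattern as in extract_groups: an optional '-' followed by a |...| group.
separator = re.compile(r"(-?\|[\w ]+\|)")

# With a capturing group, the matched groups stay in the result list.
parts = separator.split("Jimmy |threw his| ball -|through the| window.")
print(parts)
# ['Jimmy ', '|threw his|', ' ball ', '-|through the|', ' window.']
```

The remaining work is only cleanup: strip the whitespace, remove the bars from the delimited pieces, and split the plain pieces into words.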
Using your examples:
>>> extract_groups('Jimmy threw his ball through the window.')
['Jimmy', 'threw', 'his', 'ball', 'through', 'the', 'window.']
>>> extract_groups('Jimmy |threw his ball| through the window.')
['Jimmy', 'threw his ball', 'through', 'the', 'window.']
>>> extract_groups('Jimmy |threw his| ball -|through the| window.')
['Jimmy', 'threw his', 'ball', '-through the', 'window.']
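If a shorter version is preferred, roughly the same result can be had by matching tokens directly with re.findall instead of splitting. This is only a sketch and not part of the original answer. The name split_with_groups is hypothetical, and it assumes the | characters are balanced and that groups are separated from neighbouring words by whitespace. Unlike extract_groups, it allows any character except | inside a group.

```python
import re

def split_with_groups(s):
    # A token is either an optional '-' followed by a |...| group,
    # or a run of non-whitespace characters (an ordinary word).
    tokens = re.findall(r"-?\|[^|]+\||\S+", s)
    # Remove the border characters from the grouped tokens.
    return [token.replace("|", "") for token in tokens]

print(split_with_groups('Jimmy |threw his| ball -|through the| window.'))
# ['Jimmy', 'threw his', 'ball', '-through the', 'window.']
```

The group alternative is listed first in the pattern, so a |...| phrase is consumed whole before the \S+ fallback gets a chance to break it into words.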