Split string on ". ", "! " or "? " keeping the punctuation mark

Question

Possible Duplicate:
Python split() without removing the delimiter

I wish to split a string as follows:

text = " T?e  qu!ck ' brown 1 fox!     jumps-.ver. the 'lazy' doG?  !"
result -> (" T?e  qu!ck ' brown 1 fox!", "jumps-.ver.", "the 'lazy' doG?", "!")

So basically I want to split at ". ", "! " or "? ", removing the spaces at the split points but keeping the dot, exclamation mark or question mark.

How can I do this in an efficient way?

The str.split method takes only one separator. I wonder whether the best solution is to split on all spaces and then, when building the result, find the pieces that end with a dot, exclamation mark or question mark.
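For comparison, the approach suggested in the question (split on single spaces, then close a piece whenever a token ends in ., ! or ?) can be sketched as a plain loop. This is a minimal sketch, not the accepted answer, and the function name `split_on_sentence_ends` is hypothetical. Splitting on `" "` instead of calling `split()` with no argument keeps runs of spaces inside a piece intact, and the empty tokens that appear right after a split point are skipped so the next piece does not start with spaces.

```python
def split_on_sentence_ends(text: str) -> list[str]:
    """Hypothetical helper: split `text` after '.', '!' or '?' followed by spaces.

    Loop-based version of the idea in the question: split on single spaces,
    then close the current piece whenever a token ends with punctuation.
    """
    pieces: list[str] = []
    current: list[str] = []
    for token in text.split(" "):
        # Extra spaces right after a split point show up as empty tokens at
        # the start of a new piece; skip them. Empty tokens inside a piece
        # are kept, so double spaces there survive the join below.
        if not token and pieces and not current:
            continue
        current.append(token)
        if token.endswith((".", "!", "?")):
            pieces.append(" ".join(current))
            current = []
    if current:
        # Text that does not end with punctuation forms the last piece.
        pieces.append(" ".join(current))
    return pieces


if __name__ == "__main__":
    text = " T?e  qu!ck ' brown 1 fox! jumps-.ver. the 'lazy' doG?  !"
    print(split_on_sentence_ends(text))
    # [" T?e  qu!ck ' brown 1 fox!", 'jumps-.ver.', "the 'lazy' doG?", '!']
```

It reproduces the expected result for the sample text, but it needs noticeably more bookkeeping than a single regular-expression split. One difference: when the text ends with spaces after the final punctuation mark, `re.split` returns a trailing empty string, while this loop does not.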

Solution

You can achieve this using a regular expression split:

>>> import re
>>> text = " T?e  qu!ck ' brown 1 fox! jumps-.ver. the 'lazy' doG?  !"
>>> re.split('(?<=[.!?]) +',text)
[" T?e  qu!ck ' brown 1 fox!", 'jumps-.ver.', "the 'lazy' doG?", '!']

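The regular expression '(?<=[.!?]) +' matches a run of one or more spaces (' +') only when it is immediately preceded by a ., ! or ? character (the lookbehind '(?<=[.!?])'). Because a lookbehind does not consume any characters, the punctuation mark is not part of the match: it stays at the end of each piece, while the spaces are removed.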
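The same regular expression can be wrapped in a small reusable function. This is a minimal sketch; the names `SENTENCE_BREAK` and `split_sentences` are hypothetical, and precompiling with `re.compile` and using a raw string are stylistic choices rather than requirements. The second call shows, for contrast, what happens when the punctuation is matched directly instead of through a lookbehind.

```python
import re

# One or more spaces that come right after '.', '!' or '?'.
# (?<=[.!?]) is a lookbehind: it checks the preceding character without
# consuming it, so the punctuation stays at the end of each piece.
SENTENCE_BREAK = re.compile(r"(?<=[.!?]) +")


def split_sentences(text: str) -> list[str]:
    """Split `text` at the spaces following '.', '!' or '?', keeping the punctuation."""
    return SENTENCE_BREAK.split(text)


if __name__ == "__main__":
    text = " T?e  qu!ck ' brown 1 fox! jumps-.ver. the 'lazy' doG?  !"
    print(split_sentences(text))
    # [" T?e  qu!ck ' brown 1 fox!", 'jumps-.ver.', "the 'lazy' doG?", '!']

    # Without the lookbehind the punctuation is part of the match and is lost.
    print(re.split(r"[.!?] +", text))
    # [" T?e  qu!ck ' brown 1 fox", 'jumps-.ver', "the 'lazy' doG", '!']
```

If tabs or newlines can also follow the punctuation, replacing ` +` with `\s+` is a possible variation, though it goes beyond what the question asked.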
