Splitting on a group of capital letters in Python


Problem description


I'm trying to tokenize a number of strings using a capital letter as a delimiter. I have landed on the following code:

import re

token = [a for a in re.split(r'([A-Z][a-z]*)', "ABCowDog") if a]
print(token)

And I get this, as expected, in return:

['A', 'B', 'Cow', 'Dog']

Now, this is just an example string to make life easier, but in my case I want to go through this list, find the single-character tokens (easy enough by checking len()), and join adjacent single letters together, provided they meet a prior definition. In the example above, 'AB', 'Cow', and 'Dog' are the strings I actually want to form (consecutive capitals are part of an acronym). For whatever reason, once I have my token list, I am unable to figure out how to walk it. Sorry if this has a simple answer, but I'm fairly new to Python and am sick of banging my head against the wall.

Solution

re.split isn't always easy to use and can be limiting. You can try a different approach with re.findall:

>>> import re
>>> s = 'ABCowDog'
>>> re.findall(r'[A-Z](?:[A-Z]*(?![a-z])|[a-z]*)', s)
['AB', 'Cow', 'Dog']
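
To make the pattern easier to read, here is a rough sketch that spells out the same regex with re.VERBOSE and also shows one way to walk the original token list, which is what the question asked about. The names CAPS_GROUP, split_caps and merge_single_letters are made up for this illustration, and the merge rule (join adjacent single uppercase letters) is an assumption about what the "prior definition" in the question means.

```python
import re

# Same pattern as in the answer, written with re.VERBOSE so each part
# can carry a comment. CAPS_GROUP is a hypothetical name.
CAPS_GROUP = re.compile(r"""
    [A-Z]                  # every token starts with a capital letter
    (?:
        [A-Z]* (?![a-z])   # more capitals, but leave out a capital that starts a word
      |
        [a-z]*             # or the lowercase rest of a capitalized word
    )
""", re.VERBOSE)


def split_caps(text):
    """Return acronyms and capitalized words found in text."""
    return CAPS_GROUP.findall(text)


def merge_single_letters(tokens):
    """Walk a token list and join runs of adjacent single capitals.

    Assumption: a "single letter" is a one-character uppercase token.
    """
    merged = []
    run = ""  # single capitals collected so far
    for tok in tokens:
        if len(tok) == 1 and tok.isupper():
            run += tok
        else:
            if run:
                merged.append(run)
                run = ""
            merged.append(tok)
    if run:  # flush a run that ends the list
        merged.append(run)
    return merged


print(split_caps("ABCowDog"))
print(merge_single_letters(["A", "B", "Cow", "Dog"]))
```

Both calls print ['AB', 'Cow', 'Dog'] for this input. Note that re.findall only returns what the pattern matches, so any characters that are not part of a capital-led token (for example a lowercase prefix) are dropped from the result.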
