How to do CamelCase split in Python

Question

What I am trying to achieve is something like this:

>>> camel_case_split("CamelCaseXYZ")
['Camel', 'Case', 'XYZ']
>>> camel_case_split("XYZCamelCase")
['XYZ', 'Camel', 'Case']

So I searched and found this perfect regular expression:

(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])
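The pattern consists of two zero-width alternatives, so it matches positions between characters rather than characters themselves. The following is only an illustrative sketch (the names BOUNDARY and boundary_positions are made up here and are not part of the question) that spells out both alternatives and lists where they match:

```python
import re

# The same pattern, written in verbose mode so each alternative can be annotated.
BOUNDARY = re.compile(r"""
    (?<=[a-z])(?=[A-Z])         # after a lowercase letter, before an uppercase one: 'camel|Case'
  | (?<=[A-Z])(?=[A-Z][a-z])    # after an uppercase letter, before upper+lower: 'XYZ|Camel'
""", re.VERBOSE)

def boundary_positions(identifier):
    # Hypothetical helper: indices at which the zero-width pattern matches.
    return [m.start() for m in BOUNDARY.finditer(identifier)]

print(boundary_positions("CamelCaseXYZ"))
print(boundary_positions("XYZCamelCase"))
```

Every match is empty, which matters for how re.split() treats the pattern.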

As the next logical step I tried:

>>> re.split("(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])", "CamelCaseXYZ")
['CamelCaseXYZ']

Why does this not work, and how do I achieve the result from the linked question in Python?

Edit: Solution summary

I tested all provided solutions with a few test cases:

string:                 ''
AplusKminus:            ['']
casimir_et_hippolyte:   []
two_hundred_success:    []
kalefranz:              string index out of range # with modification: either [] or ['']

string:                 ' '
AplusKminus:            [' ']
casimir_et_hippolyte:   []
two_hundred_success:    [' ']
kalefranz:              [' ']

string:                 'lower'
all algorithms:         ['lower']

string:                 'UPPER'
all algorithms:         ['UPPER']

string:                 'Initial'
all algorithms:         ['Initial']

string:                 'dromedaryCase'
AplusKminus:            ['dromedary', 'Case']
casimir_et_hippolyte:   ['dromedary', 'Case']
two_hundred_success:    ['dromedary', 'Case']
kalefranz:              ['Dromedary', 'Case'] # with modification: ['dromedary', 'Case']

string:                 'CamelCase'
all algorithms:         ['Camel', 'Case']

string:                 'ABCWordDEF'
AplusKminus:            ['ABC', 'Word', 'DEF']
casimir_et_hippolyte:   ['ABC', 'Word', 'DEF']
two_hundred_success:    ['ABC', 'Word', 'DEF']
kalefranz:              ['ABCWord', 'DEF']

In summary, the solution by @kalefranz does not match the question (see the last case), and the solution by @casimir et hippolyte eats a single space, thereby violating the idea that a split should not change the individual parts. The only difference between the remaining two alternatives is that my solution returns a list containing the empty string for an empty input, whereas the solution by @200_success returns an empty list. I don't know where the Python community stands on that issue, so I am fine with either one. Since 200_success's solution is simpler, I accepted it as the correct answer.

Solution

As @AplusKminus has explained, re.split() never splits on an empty pattern match (in Python versions before 3.7). Therefore, instead of splitting, you should try finding the components you are interested in.
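As a side note, from Python 3.7 on, re.split() does split on empty matches, so the attempt from the question works unchanged there. A minimal sketch, assuming Python 3.7 or later (the helper name camel_case_split_37 is hypothetical):

```python
import re

# Zero-width boundary pattern from the question.
BOUNDARY = r"(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])"

def camel_case_split_37(identifier):
    # Relies on re.split() splitting on empty matches,
    # which requires Python 3.7 or later.
    return re.split(BOUNDARY, identifier)

print(camel_case_split_37("CamelCaseXYZ"))
print(camel_case_split_37("XYZCamelCase"))
```

Note that this variant returns [''] for an empty input, because re.split() always yields at least one element.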

Here is a solution using re.finditer() that emulates splitting:

from re import finditer

def camel_case_split(identifier):
    # Lazily match up to the next case boundary or the end of the string.
    matches = finditer(r'.+?(?:(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])|$)', identifier)
    return [m.group(0) for m in matches]
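To check the function against the test cases from the summary in the question, something like the following could be used. It is a sketch only: the expected values are copied from the two_hundred_success rows of that table, on the assumption that those rows describe this function.

```python
from re import finditer

def camel_case_split(identifier):
    matches = finditer(r'.+?(?:(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])|$)', identifier)
    return [m.group(0) for m in matches]

# Inputs and expected outputs taken from the summary table in the question.
expected = {
    '': [],
    ' ': [' '],
    'lower': ['lower'],
    'UPPER': ['UPPER'],
    'Initial': ['Initial'],
    'dromedaryCase': ['dromedary', 'Case'],
    'CamelCase': ['Camel', 'Case'],
    'ABCWordDEF': ['ABC', 'Word', 'DEF'],
}

for text, parts in expected.items():
    result = camel_case_split(text)
    assert result == parts, (text, result)
    print(repr(text), '->', result)
```

The empty string yields an empty list because `.+?` needs at least one character to match.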
