Python/Regex - Match .#,#. in String


Problem Description

What regex can I use to match ".#,#." within a string, where each # stands for one or more digits? It may or may not exist in the string. Some examples with expected outputs might be:

Test1.0,0.csv      -> ('Test1', '0,0', 'csv')         (Basic Example)
Test2.wma          -> ('Test2', 'wma')                (No Match)
Test3.1100,456.jpg -> ('Test3', '1100,456', 'jpg')    (Basic with Large Number)
T.E.S.T.4.5,6.png  -> ('T.E.S.T.4', '5,6', 'png')     (Doesn't strip all periods)
Test5,7,8.sss      -> ('Test5,7,8', 'sss')            (No Match)
Test6.2,3,4.png    -> ('Test6.2,3,4', 'png')          (No Match, too many commas)
Test7.5,6.7,8.test -> ('Test7', '5,6', '7,8', 'test') (Double Match?)

The last one isn't too important; I would only expect .#,#. to appear once. I expect most of the files I'm processing to fall into the first through fourth examples, so I'm most interested in those.

Thanks for the help!

Solution

To allow for multiple consecutive matches, use lookahead/lookbehind:

r'(?<=\.)\d+,\d+(?=\.)'

Example:

>>> re.findall(r'(?<=\.)\d+,\d+(?=\.)', 'Test7.5,6.7,8.test')
['5,6', '7,8']
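
To see why the lookarounds matter here, it may help to compare against a pattern that consumes the periods. The sketch below is only an illustration: the `consuming` pattern is not from the answer, it is a hypothetical alternative introduced for contrast. Because it swallows the trailing period of the first match, the second pair no longer has a leading period left to match against.

```python
import re

name = 'Test7.5,6.7,8.test'

# Hypothetical consuming variant: each match swallows both surrounding periods,
# so the period shared by '5,6' and '7,8' is used up by the first match.
consuming = re.findall(r'\.(\d+,\d+)\.', name)

# Lookaround variant from the answer: the periods are checked but not consumed.
lookaround = re.findall(r'(?<=\.)\d+,\d+(?=\.)', name)

print(consuming)   # ['5,6']
print(lookaround)  # ['5,6', '7,8']
```

If .#,#. really only appears once per name, either pattern finds it; the lookaround form just keeps the double-match case working.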

We can also use lookahead to perform the split as you want it:

import re

def split_it(s):
    # Split on every period that is immediately followed by "digits,digits."
    pieces = re.split(r'\.(?=\d+,\d+\.)', s)
    # Split the extension off the last piece
    pieces[-1:] = pieces[-1].rsplit('.', 1)
    return pieces

Testing:

>>> print(split_it('Test1.0,0.csv'))
['Test1', '0,0', 'csv']
>>> print(split_it('Test2.wma'))
['Test2', 'wma']
>>> print(split_it('Test3.1100,456.jpg'))
['Test3', '1100,456', 'jpg']
>>> print(split_it('T.E.S.T.4.5,6.png'))
['T.E.S.T.4', '5,6', 'png']
>>> print(split_it('Test5,7,8.sss'))
['Test5,7,8', 'sss']
>>> print(split_it('Test6.2,3,4.png'))
['Test6.2,3,4', 'png']
>>> print(split_it('Test7.5,6.7,8.test'))
['Test7', '5,6', '7,8', 'test']
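
The question lists its expected outputs as tuples, so one possible variation is to return a tuple and check it against that table directly. This is a minimal sketch rather than part of the original answer: the names `NUMBER_PAIR_SPLIT`, `split_name`, and `EXPECTED` are made up for illustration, and the regex is the same one used in `split_it`.

```python
import re

# Same split pattern as split_it, precompiled (the name is illustrative).
NUMBER_PAIR_SPLIT = re.compile(r'\.(?=\d+,\d+\.)')

def split_name(name):
    """Return the pieces of a file name as a tuple, as in the question."""
    pieces = NUMBER_PAIR_SPLIT.split(name)
    # Split the extension off the last piece, if there is one
    pieces[-1:] = pieces[-1].rsplit('.', 1)
    return tuple(pieces)

# Expected outputs copied from the question's examples.
EXPECTED = {
    'Test1.0,0.csv': ('Test1', '0,0', 'csv'),
    'Test2.wma': ('Test2', 'wma'),
    'Test3.1100,456.jpg': ('Test3', '1100,456', 'jpg'),
    'T.E.S.T.4.5,6.png': ('T.E.S.T.4', '5,6', 'png'),
    'Test5,7,8.sss': ('Test5,7,8', 'sss'),
    'Test6.2,3,4.png': ('Test6.2,3,4', 'png'),
    'Test7.5,6.7,8.test': ('Test7', '5,6', '7,8', 'test'),
}

for name, want in EXPECTED.items():
    assert split_name(name) == want, (name, split_name(name))
```

A name with no period at all comes back as a one-element tuple, since neither the regex split nor `rsplit` finds anything to split on.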
