Python: Split string by list of separators

Problem description

In Python, I'd like to split a string using a list of separators. The separators could be either commas or semicolons. Whitespace should be removed unless it is in the middle of non-whitespace, non-separator characters, in which case it should be preserved.

Test case 1: ABC,DEF123,GHI_JKL,MN OP
Test case 2: ABC;DEF123;GHI_JKL;MN OP
Test case 3: ABC ; DEF123,GHI_JKL ; MN OP

Sounds like a case for regular expressions, which is fine, but if it's easier or cleaner to do it another way that would be even better.

Thanks!

Solution

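This should be faster than a regex, and you can pass the separators as a list (or tuple) as you wanted. It replaces every other separator with the first one, splits on that single separator, and strips the whitespace around each piece: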

def split(txt, seps):
    default_sep = seps[0]

    # we skip seps[0] because that's the default separator
    for sep in seps[1:]:
        txt = txt.replace(sep, default_sep)
    return [i.strip() for i in txt.split(default_sep)]

How to use it:

>>> split('ABC ; DEF123,GHI_JKL ; MN OP', (',', ';'))
['ABC', 'DEF123', 'GHI_JKL', 'MN OP']
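
As a quick sanity check, the three test cases from the question can be run through the same function. This is only a sketch: `split` is repeated here so the snippet runs on its own, and the names `cases` and `results` are made up for the illustration.

```python
def split(txt, seps):
    # Same function as in the answer, repeated so this snippet is self-contained.
    default_sep = seps[0]
    for sep in seps[1:]:
        txt = txt.replace(sep, default_sep)
    return [i.strip() for i in txt.split(default_sep)]


SEPS = (',', ';')

# The three test cases from the question.
cases = [
    'ABC,DEF123,GHI_JKL,MN OP',
    'ABC;DEF123;GHI_JKL;MN OP',
    'ABC ; DEF123,GHI_JKL ; MN OP',
]

results = [split(case, SEPS) for case in cases]
for case, result in zip(cases, results):
    # Each case prints ['ABC', 'DEF123', 'GHI_JKL', 'MN OP']
    print(repr(case), '->', result)
```

Note that a trailing separator produces an empty string at the end of the result, for example `split('A,B,', SEPS)` returns `['A', 'B', '']`, so filter out empty items if that matters for your input.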

Performance test:

import timeit
import re


TEST = 'ABC ; DEF123,GHI_JKL ; MN OP'
SEPS = (',', ';')


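# Note: the separators are joined unescaped, which is fine for ',' and ';'
# but would need re.escape() for regex metacharacters such as '|' or '.'.
rsplit = re.compile("|".join(SEPS)).split
print(timeit.timeit(lambda: [s.strip() for s in rsplit(TEST)]))
# 1.6242462980007986 (seconds for timeit's default of 1,000,000 calls)

print(timeit.timeit(lambda: split(TEST, SEPS)))
# 1.3588597209964064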

And with a much longer input string:

TEST = 100 * 'ABC ; DEF123,GHI_JKL ; MN OP , '

print(timeit.timeit(lambda: [s.strip() for s in rsplit(TEST)]))
# 130.67168392999884

print(timeit.timeit(lambda: split(TEST, SEPS)))
# 50.31940778599528
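
The absolute numbers above depend on the machine and Python version, so it may be worth re-running the comparison yourself. Below is a minimal sketch of one way to do that. The helper `make_regex_split`, and the `number=1000` and `repeat=3` settings, are my own choices for a quick run and are not part of the measurements above. The regex variant here also escapes the separators with `re.escape`, which the timing code above does not need for `,` and `;`.

```python
import re
import timeit


def split(txt, seps):
    # Same function as in the answer, repeated so this snippet is self-contained.
    default_sep = seps[0]
    for sep in seps[1:]:
        txt = txt.replace(sep, default_sep)
    return [i.strip() for i in txt.split(default_sep)]


def make_regex_split(seps):
    # Hypothetical helper: re.escape protects separators that are regex
    # metacharacters, e.g. '|' or '.'.
    pattern = re.compile("|".join(re.escape(s) for s in seps))
    return lambda txt: [s.strip() for s in pattern.split(txt)]


SEPS = (',', ';')
TEST = 100 * 'ABC ; DEF123,GHI_JKL ; MN OP , '
regex_split = make_regex_split(SEPS)

# Both approaches should agree before their timings are compared.
same_output = regex_split(TEST) == split(TEST, SEPS)
print("same output:", same_output)

# Take the best of a few short runs instead of one run of 1,000,000 calls.
for name, func in (("regex", regex_split),
                   ("replace", lambda t: split(t, SEPS))):
    best = min(timeit.repeat(lambda: func(TEST), number=1000, repeat=3))
    print(f"{name}: {best:.4f} s per 1000 calls")
```

Taking the minimum of several repeats reduces noise from other processes. The relative ordering is what matters here, not the exact figures.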
