Grab a line's whitespace/indention with Python

Question

Basically, if I have a line of text that starts with indention, what's the best way to grab that indention and put it into a variable in Python? For example, if the line is:

\t\tthis line has two tabs of indention

Then it would return '\t\t'. Or, if the line was:

    this line has four spaces of indention

Then it would return four spaces.

So I guess you could say that I just need to strip everything from the string, from the first non-whitespace character to the end. Thoughts?

Recommended answer

import re

s = "\t\tthis line has two tabs of indention"
re.match(r"\s*", s).group()
# '\t\t'
s = "    this line has four spaces of indention"
re.match(r"\s*", s).group()
# '    '

To strip the leading whitespace instead, use lstrip.
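As a minimal sketch of how the regex above could be wrapped for reuse, the helper below returns both the indentation and the remainder of the line. The names `_LEADING_WS` and `split_indent` are hypothetical and not part of the original answer.

```python
import re

# Hypothetical module-level name: the pattern is compiled once and reused.
_LEADING_WS = re.compile(r"\s*")


def split_indent(line):
    """Return (indent, rest), where indent is the leading whitespace of line."""
    # \s* can match the empty string, so match() never returns None here.
    indent = _LEADING_WS.match(line).group()
    return indent, line[len(indent):]


print(split_indent("\t\tthis line has two tabs of indention"))
# ('\t\t', 'this line has two tabs of indention')
```

Note that `\s` also matches newline characters, so for a blank line such as "\n" the newline is counted as indentation. If only spaces and tabs should count, a pattern like `[ \t]*` is one alternative.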

Since the downvotes are probably questioning the efficiency of regex, I did some profiling to compare the approaches in each case. In the rankings below, ">" means "faster than".

Very long string, very short leading space

RegEx > Itertools >> lstrip

>>> import timeit
>>> timeit.timeit('r.match(s).group()', 'import re;r=re.compile(r"\s*");s="          hello world!"*10000', number=100000)
0.10037684440612793
>>> timeit.timeit('"".join(itertools.takewhile(lambda x:x.isspace(),s))', 'import itertools;s="          hello world!"*10000', number=100000)
0.7092740535736084
>>> timeit.timeit('"".join(itertools.takewhile(str.isspace,s))', 'import itertools;s="          hello world!"*10000', number=100000)
0.51730513572692871
>>> timeit.timeit('s[:-len(s.lstrip())]', 's="          hello world!"*10000', number=100000)
2.6478431224822998
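For readability, the three timed statements can also be written as ordinary functions. This is only a sketch: the function names are hypothetical, and the lstrip variant computes the cut point as `len(s) - len(s.lstrip())` instead of using the negative slice from the timings, so that it also handles strings with nothing after the whitespace.

```python
import itertools
import re

_WS = re.compile(r"\s*")  # hypothetical name for the compiled pattern


def indent_regex(s):
    # Match zero or more whitespace characters at the start of s.
    return _WS.match(s).group()


def indent_takewhile(s):
    # Collect characters one by one until the first non-whitespace character.
    return "".join(itertools.takewhile(str.isspace, s))


def indent_lstrip(s):
    # lstrip() removes the leading whitespace; the length difference
    # is the number of characters that were removed.
    return s[:len(s) - len(s.lstrip())]


for sample in ["\t\tthis line has two tabs", "    four spaces", "no indent"]:
    assert indent_regex(sample) == indent_takewhile(sample) == indent_lstrip(sample)
```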

Very short string, very short leading space

lstrip > RegEx > Itertools

If you can limit the string's length to a few thousand characters or less, the lstrip trick may be the better choice.

>>> timeit.timeit('r.match(s).group()', 'import re;r=re.compile(r"\s*");s="          hello world!"*100', number=100000)
0.099548101425170898
>>> timeit.timeit('"".join(itertools.takewhile(str.isspace,s))', 'import itertools;s="          hello world!"*100', number=100000)
0.53602385520935059
>>> timeit.timeit('s[:-len(s.lstrip())]', 's="          hello world!"*100', number=100000)
0.064291000366210938

This shows that the cost of the lstrip trick grows with the total string length n (roughly O(√n) in these timings; lstrip has to copy everything after the whitespace), while the RegEx and itertools methods are O(1) in the string length as long as there is not much leading whitespace.

Very short string, very long leading space

lstrip >> RegEx >>> Itertools

If there are a lot of leading spaces, don't use RegEx.

>>> timeit.timeit('s[:-len(s.lstrip())]', 's=" "*2000', number=10000)
0.047424077987670898
>>> timeit.timeit('r.match(s).group()', 'import re;r=re.compile(r"\s*");s=" "*2000', number=10000)
0.2433168888092041
>>> timeit.timeit('"".join(itertools.takewhile(str.isspace,s))', 'import itertools;s=" "*2000', number=10000)
3.9949162006378174
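One thing to keep in mind about the lstrip expression when the string is entirely whitespace, as it is in these timings: `s.lstrip()` is then empty, so `-len(s.lstrip())` is `-0`, and the slice `s[:-0]` is the same as `s[:0]`. The timings still measure the cost of the lstrip call, but the returned value is empty. The sketch below shows the difference; the length-based slice is a suggested alternative, not part of the original answer.

```python
s = " " * 2000  # whitespace only, like the test string in the timings above

# Negative slice as used in the timings: s[:-0] is s[:0], the empty string.
negative_slice = s[:-len(s.lstrip())]

# Length-based slice (an alternative, not from the original answer).
length_based = s[:len(s) - len(s.lstrip())]

print(len(negative_slice), len(length_based))  # 0 2000
```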

Very long string, very long leading space

lstrip >>> RegEx >>>>>>>> Itertools

>>> timeit.timeit('s[:-len(s.lstrip())]', 's=" "*200000', number=10000)
4.2374031543731689
>>> timeit.timeit('r.match(s).group()', 'import re;r=re.compile(r"\s*");s=" "*200000', number=10000)
23.877214908599854
>>> timeit.timeit('"".join(itertools.takewhile(str.isspace,s))', 'import itertools;s=" "*200000', number=100)*100
415.72158336639404

This shows that all methods scale roughly as O(m), where m is the number of leading whitespace characters, when the non-space part is short.
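The absolute numbers above depend on the machine and Python version, so it may be worth rerunning the comparison locally. The following is a rough sketch of such a harness; `SETUP`, `STATEMENTS`, and `run_benchmark` are hypothetical names, and the lstrip statement uses a length-based slice rather than the negative slice from the timings. No expected timings are shown because they will differ between environments.

```python
import timeit

# Setup code executed once per measurement; edit s to try other inputs.
SETUP = r'''
import re, itertools
r = re.compile(r"\s*")
s = "          hello world!" * 100
'''

# The three approaches compared in the answer.
STATEMENTS = {
    "regex": 'r.match(s).group()',
    "itertools": '"".join(itertools.takewhile(str.isspace, s))',
    "lstrip": 's[:len(s) - len(s.lstrip())]',
}


def run_benchmark(setup=SETUP, number=1000):
    """Return a dict mapping each approach to its total time in seconds."""
    return {
        name: timeit.timeit(stmt, setup, number=number)
        for name, stmt in STATEMENTS.items()
    }


# Print the approaches from fastest to slowest.
for name, seconds in sorted(run_benchmark().items(), key=lambda kv: kv[1]):
    print(f"{name:10s} {seconds:.6f}")
```

Passing a different `setup` string, for example one that sets `s = " " * 2000`, reproduces the other scenarios.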
